                                   mechanize

   Stateful programmatic web browsing in Python, after Andy Lester's Perl
   module [2]WWW::Mechanize.
     * mechanize.Browser is a subclass of mechanize.UserAgent, which is,
       in turn, a subclass of urllib2.OpenerDirector
       (ClientCookie.OpenerDirector for pre-2.4 versions of Python), so
       any URL can be opened, not just http:. mechanize.UserAgent offers
       easy dynamic configuration of user-agent features like protocol,
       cookie, redirection and robots.txt handling, without having to
       make a new OpenerDirector each time, e.g. by calling
       build_opener() (the UserAgent interface is not stable yet,
       though).
     * Easy HTML form filling, using the [3]ClientForm interface.
     * Convenient link parsing and following.
     * Browser history (.back() and .reload() methods).
     * The Referer HTTP header is added properly (optional).
     * Automatic observance of [4]robots.txt.
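
   To illustrate the first point above: with the standard library alone,
   changing a policy such as cookie handling means building a new opener.
   The following is a minimal sketch of that situation, assuming Python 3,
   where urllib2 is named urllib.request. It does not use mechanize, and
   make_opener() and handler_names() are hypothetical helpers written for
   this illustration.

```python
# Sketch only: standard-library urllib.request (the Python 3 name for
# urllib2), not mechanize. make_opener() is a hypothetical helper.
import http.cookiejar
import urllib.request


def make_opener(handle_cookies=True):
    """Build a new OpenerDirector for the requested cookie policy."""
    extra_handlers = []
    if handle_cookies:
        jar = http.cookiejar.CookieJar()
        extra_handlers.append(urllib.request.HTTPCookieProcessor(jar))
    # build_opener() adds the default protocol handlers itself.
    return urllib.request.build_opener(*extra_handlers)


def handler_names(opener):
    """Names of the handler classes installed in an opener."""
    return sorted(type(handler).__name__ for handler in opener.handlers)


with_cookies = make_opener(handle_cookies=True)
without_cookies = make_opener(handle_cookies=False)
# Changing the policy meant constructing a second, separate opener.
print("HTTPCookieProcessor" in handler_names(with_cookies))
print("HTTPCookieProcessor" in handler_names(without_cookies))
```

   mechanize.UserAgent aims to make this kind of change on a single,
   existing object instead.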

   An example:
import re
from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response = br.follow_link(text_regex=re.compile(r"cheese\s*shop"), nr=1)
assert br.viewing_html()
print br.title()
print response.geturl()
print response.info()  # headers
print response.read()  # body
response.close()

br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
response2 = br.submit()  # submit current form

response3 = br.back()  # back to cheese shop
# the history mechanism uses cached requests and responses
assert response3 is response
# we can still use the response, even though we closed it:
response3.seek(0)
response3.read()
response4 = br.reload()
assert response4 is not response3

for form in br.forms():
    print form
# .links() optionally accepts the keyword args of .follow_/.find_link()
for link in br.links(url_regex=re.compile(r"python\.org")):
    print link
    br.follow_link(link)  # takes EITHER Link instance OR keyword args
    br.back()
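
   The meaning of text_regex and nr in the example above (nr counts
   matching links from zero, so nr=1 is the second match) can be sketched
   with the standard library. This is an illustration under stated
   assumptions, not mechanize's implementation: it assumes Python 3, and
   LinkCollector and find_link() are hypothetical names.

```python
# Sketch only: picks the nr-th link whose text matches a regular
# expression. LinkCollector and find_link() are hypothetical names.
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (url, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only text inside an open <a> element belongs to a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def find_link(html, text_regex, nr=0):
    """Return the nr-th (zero-based) link whose text matches text_regex."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    matches = [link for link in collector.links if text_regex.search(link[1])]
    if nr >= len(matches):
        raise LookupError("no matching link")
    return matches[nr]


page = """
<a href="/about">about us</a>
<a href="/shop1">cheese shop</a>
<a href="/shop2">the other cheeseshop</a>
"""
url, text = find_link(page, re.compile(r"cheese\s*shop"), nr=1)
print(url, text)
```

   Here nr=1 skips the first matching link and selects the second one.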

   You may control the browser's policy by using the methods of
   mechanize.Browser's base class, mechanize.UserAgent. For example:
br = Browser()
# Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML).
br.set_handle_equiv(False)
# Ignore robots.txt.  Do not do this without thought and consideration.
br.set_handle_robots(False)
# Don't handle cookies
br.set_cookiejar()
# Supply your own ClientCookie.CookieJar (NOTE: cookie handling is ON by
# default: no need to do this unless you have some reason to use a
# particular cookiejar)
import ClientCookie
cj = ClientCookie.CookieJar()
br.set_cookiejar(cj)
# Print information about HTTP redirects and Refreshes.
br.set_debug_redirects(True)
# Print HTTP response bodies (i.e. the HTML, most of the time).
br.set_debug_responses(True)
# Print HTTP headers.
br.set_debug_http(True)
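
   For a sense of what observing robots.txt involves, the check can be
   sketched with the standard library's robots.txt parser. This assumes
   Python 3 (urllib.robotparser) and is not how mechanize itself is
   implemented; the rules, the agent name and the allowed() helper are
   made up for the illustration.

```python
# Sketch only: a robots.txt check using the standard library.
# The rules below and the allowed() helper are hypothetical.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
# parse() takes the lines of a robots.txt file; no network access needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="example-agent"):
    """True if the parsed rules permit this agent to fetch url."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/private/data.html"))
```

   A browser that observes robots.txt would refuse to open URLs for which
   such a check fails.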

   Full documentation is in the docstrings.

   Thanks to Ian Bicking, for persuading me that a UserAgent class would
   be useful.

Todo

     * Fix .response() method (each call should return an independent
       pointer to the same data). Want to be able to clone responses,
       too, so that HTML can be processed. Needs some careful thought:
       want to clean up the multiple layers of response objects in
       ClientCookie and the standard library.
     * Stabilise mechanize.UserAgent.
     * Test with non-http URLs.
     * Remove dependency on pullparser: it's broken.
     * History cache expiration.
     * Add Browser.load_response() method.
     * Add Browser.form_as_string() and Browser.__str__() methods.
     * Combine ClientForm, ClientCookie and mechanize in a single
       download.
     * Would be nice to add an implementation of ClientForm interface
       built on something like BeautifulSoup, which would allow easy
       "escape" to the lower-level BeautifulSoup API in cases where the
       higher-level mechanize.Browser / ClientForm API is not sufficient.
       (DOMForm is similar: implementation of ClientForm interface on top
       of HTML DOM, but it's buggy and unmaintained, and the DOM is not
       as nice an API as BeautifulSoup).
     * Add some utilities useful for testing (e.g. fetch images and
       stylesheets in page, easy assertion of things like: cookies sent
       by server, redirections, HTTP error codes etc.).
     * Do auth and proxies properly (ClientCookie probably needs some
       work here, too -- and maybe urllib2 also). Need to configure local
       squid and apache, yawn...
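
   The first item in the list (independent pointers to the same response
   data) might look roughly like the sketch below. It is one possible
   shape for the idea, not a planned design: it assumes Python 3, and
   CachedResponse and its view() method are hypothetical.

```python
# Sketch only: several independent read positions over one cached body.
# CachedResponse and view() are hypothetical names.
import io


class CachedResponse:
    """Holds a fully read response body."""

    def __init__(self, url, body):
        self.url = url
        self._body = bytes(body)

    def view(self):
        # Each view has its own file position over the same data.
        return io.BytesIO(self._body)


cached = CachedResponse("http://www.example.com/", b"<html>cheese</html>")
first = cached.view()
second = cached.view()
first.read(6)  # advances only the first view
print(first.tell(), second.tell())
```

   Reading from one view leaves the position of the other untouched,
   which is the property the Todo item asks for.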

Download

   All documentation (including this web page) is included in the
   distribution.

   This is an alpha release: interfaces may change, and there will be
   bugs.

   Development release.
     * [5]mechanize-0.0.11a.tar.gz
     * [6]mechanize-0_0_11a.zip
     * [7]Change Log (included in distribution)
     * [8]Older versions.

   For installation instructions, see the INSTALL file included in the
   distribution.

Subversion

   The [9]Subversion (SVN) trunk is
   [10]http://codespeak.net/svn/wwwsearch/mechanize/trunk, so to check
   out the source:
svn co http://codespeak.net/svn/wwwsearch/mechanize/trunk mechanize

See also

   Richard Jones' [11]webunit (this is not the same as Steven Purcell's
   [12]code of the same name). webunit and mechanize are quite similar.
   On the minus side, webunit is missing things like browser history,
   high-level forms and links handling, thorough cookie handling, refresh
   redirection, adding of the Referer header, observance of robots.txt
   and easy extensibility. On the plus side, webunit has a bunch of
   utility functions bound up in its WebFetcher class, which look useful
   for writing tests (though they'd be easy to duplicate using
   mechanize). In general, webunit has more of a frameworky emphasis,
   with aims limited to writing tests, where mechanize and the modules it
   depends on try hard to be general-purpose libraries.

   There are many related links in the [13]General FAQ page, too.

FAQs

     * Which version of Python do I need?
       2.2 or above.
     * What else do I need?
       [14]ClientCookie, [15]ClientForm and [16]pullparser.
       The versions of those required modules are listed in the setup.py
       for mechanize (included with the download). The dependencies are
       automatically fetched by easy_install when you run python setup.py
       install.
     * Which license?
       The [17]BSD license (included in distribution).

   I prefer questions and comments to be sent to the [18]mailing list
   rather than direct to me.

   [19]John J. Lee, November 2005.

References

   1. http://sourceforge.net/
   2. http://search.cpan.org/dist/WWW-Mechanize/
   3. http://wwwsearch.sourceforge.net/ClientForm/
   4. http://www.robotstxt.org/wc/norobots.html
   5. http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0.0.11a.tar.gz
   6. http://wwwsearch.sourceforge.net/mechanize/src/mechanize-0_0_11a.zip
   7. http://wwwsearch.sourceforge.net/mechanize/src/ChangeLog.txt
   8. http://wwwsearch.sourceforge.net/mechanize/src/
   9. http://subversion.tigris.org/
  10. http://codespeak.net/svn/wwwsearch/mechanize/trunk#egg=mechanize-dev
  11. http://mechanicalcat.net/tech/webunit/
  12. http://webunit.sourceforge.net/
  13. http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
  14. http://wwwsearch.sourceforge.net/ClientCookie/
  15. http://wwwsearch.sourceforge.net/ClientForm/
  16. http://wwwsearch.sourceforge.net/pullparser/
  17. http://www.opensource.org/licenses/bsd-license.php
  18. http://lists.sourceforge.net/lists/listinfo/wwwsearch-general
  19. mailto:jjl@pobox.com
  20. http://wwwsearch.sourceforge.net/
  21. http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
  22. http://wwwsearch.sourceforge.net/pullparser/
  23. http://wwwsearch.sourceforge.net/ClientCookie/
  24. http://wwwsearch.sourceforge.net/ClientCookie/doc.html
  25. http://wwwsearch.sourceforge.net/ClientForm/
  26. http://wwwsearch.sourceforge.net/DOMForm/
  27. http://wwwsearch.sourceforge.net/python-spidermonkey/
  28. http://wwwsearch.sourceforge.net/ClientTable/
  29. http://wwwsearch.sourceforge.net/bits/urllib2_152.py
  30. http://wwwsearch.sourceforge.net/bits/urllib_152.py
  31. http://wwwsearch.sourceforge.net/mechanize/#download
  32. http://wwwsearch.sourceforge.net/mechanize/#faq
