

Mechanize cheat sheet

Use mechanize when navigating through web forms. The library keeps track of all the form fields (including the hidden ones) and any cookies for you.

Create a browser object and give it some optional settings (mostly these need to be applied for aspx pages).

```python
import mechanize

br = mechanize.Browser()
br.set_all_readonly(False)    # allow everything to be written to
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(False)  # can sometimes hang without this
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
```

Open a webpage and inspect its contents.

```python
response = br.open(url)     # url: the address of the page you want
print(response.read())      # the text of the page
response1 = br.response()   # get the current response again
print(response1.read())     # the text can be passed to lxml.html.fromstring
```

List the forms that are in the page.

```python
for form in br.forms():
    print("Form name:", form.name)
    print(form)
```

Before you can go on, the browser object must have a form selected.

```python
br.select_form("form1")   # works when the form has a name
br.select_form(nr=0)      # use when the form is unnamed: select by position (0 = first form)
```

Iterate through the controls in the form.

```python
for control in br.form.controls:
    print(control)
    # control.value also works for unnamed controls, where br[control.name] would fail
    print("type=%s, name=%s value=%s" % (control.type, control.name, control.value))
```

Controls can be found by name.

```python
control = br.form.find_control("controlname")
```

A select control can tell you which values can be selected.

```python
if control.type == "select":  # a SelectControl (ClientForm.SelectControl in older versions)
    for item in control.items:
        print(" name=%s values=%s" % (item.name, str([label.text for label in item.get_labels()])))
```

Because select controls can have multiple selections, they must be set with a list, even if it has only one element.

```python
print(control.value)
print(control)                    # the selected value is starred
control.value = ["ItemName"]
print(control)
br[control.name] = ["ItemName"]   # equivalent and more usual
```
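To see why a list is the natural type, it helps to look at what a multiple selection becomes on the wire: one `name=value` pair per selected option. The sketch below uses the standard library's `urllib.parse.urlencode` rather than mechanize itself, and the field name `colour` and its options are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical multi-select field with two options chosen.
selected = {"colour": ["red", "blue"]}

# doseq=True expands a list into one name=value pair per element,
# which is how a multiple-selection <select> is submitted.
encoded = urlencode(selected, doseq=True)
print(encoded)  # colour=red&colour=blue

# A single choice is still a one-element list and is encoded the same way.
single = urlencode({"colour": ["red"]}, doseq=True)
print(single)   # colour=red

# Decoding gives back one list entry per selected option.
print(parse_qs(encoded))  # {'colour': ['red', 'blue']}
```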

Text controls can be set with a string.

```python
if control.type == "text":  # a TextControl
    control.value = "stuff here"
    br["controlname"] = "stuff here"  # equivalent
```

A control's readonly and disabled flags can be changed. Disabling is sometimes necessary for superfluous submit buttons, because disabled controls are left out of the submitted data.

```python
control.readonly = False
control.disabled = True

# OR disable all the submit buttons like so
for control in br.form.controls:
    if control.type == "submit":
        control.disabled = True
```

When your form is complete you can submit it.

```python
response = br.submit()
print(response.read())
br.back()  # go back to the previous page
```

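To find out what mechanize is going to send out, try the following.

```python
from urllib.parse import parse_qsl, urlparse  # cgi.parse_qsl no longer exists in Python 3

request = br.click()  # builds the request without sending it
# POST forms carry their fields in the body; GET forms put them in the query string
data = request.get_data() or urlparse(request.get_full_url()).query
for k, v in parse_qsl(data, keep_blank_values=True):  # keep empty fields such as __EVENTTARGET
    print(k, v)
```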
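One detail matters when reading the request data this way: `parse_qsl` silently drops fields whose value is empty unless `keep_blank_values=True` is passed. On aspx pages that can hide exactly the fields you are debugging, such as an empty `__EVENTTARGET`. A standard-library sketch with a made-up request body:

```python
from urllib.parse import parse_qsl

# Made-up POST body of the kind an aspx form sends (values are illustrative only).
body = "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=dDwtMTA%3D&q=stuff+here"

default_pairs = parse_qsl(body)                      # empty values are dropped
all_pairs = parse_qsl(body, keep_blank_values=True)  # empty values are kept as ''

print(default_pairs)  # [('__VIEWSTATE', 'dDwtMTA='), ('q', 'stuff here')]
for k, v in all_pairs:
    print(k, repr(v))
```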

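Inspect the values that are being posted back, then send the request. You can verify the values against what you see in the Net tab in Firebug (the Network tab in current browser developer tools).

```python
from urllib.parse import parse_qsl

request = br.click()
for k, v in parse_qsl(request.get_data() or "", keep_blank_values=True):
    print(k, v)
response = br.open(request)  # now actually send it
html = response.read()
print(html)
```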

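You can get a request object and open it with another browser, though the two browsers will probably need to share a cookie jar. This helps if you are going down numerous links from the same page.

```python
cj = mechanize.CookieJar()
br.set_cookiejar(cj)    # attach the same jar to br too; do it before br starts browsing,
                        # or the cookies it already holds stay in its old jar
request = br.click()    # creates the request object
br2 = mechanize.Browser()
br2.set_cookiejar(cj)   # br2 now sends the same cookies as br
br2.open(request)
```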
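What sharing a jar buys you can be checked offline: any cookie stored in the jar while one browser logs in is added to requests made by anything holding the same jar object, while a fresh jar adds nothing. This sketch swaps in the standard library's `http.cookiejar.CookieJar` for mechanize's, on the assumption that both behave the same way for this purpose; `FakeResponse` is a hypothetical stand-in for a server response and `example.com` is a placeholder host.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a server response; the jar only needs .info()."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


def cookie_header_for(jar, url):
    """Return the Cookie header a browser using `jar` would send to `url` (None if no cookies)."""
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")


shared_jar = http.cookiejar.CookieJar()

# The first browser logs in and receives a session cookie, stored in the shared jar.
login = urllib.request.Request("http://example.com/login")
shared_jar.extract_cookies(FakeResponse("session=abc123; Path=/"), login)

# A second browser holding the same jar sends the cookie; one with a fresh jar does not.
print(cookie_header_for(shared_jar, "http://example.com/page2"))                  # session=abc123
print(cookie_header_for(http.cookiejar.CookieJar(), "http://example.com/page2"))  # None
```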

Following links in mechanize is a hassle because you need to have the link object. Sometimes it is easier to get them all and find the link you want from its text.

```python
for link in br.links():
    print(link.text, link.url)
```
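The collect-everything-and-pick-by-text approach can be sketched without mechanize using the standard library's `html.parser`; the page fragment and link texts are made up. Within mechanize itself, `br.find_link(text=...)` and `br.follow_link(text_regex=...)` perform this search for you.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text that sits inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def find_link_by_text(html, text):
    """Return the href of the first link whose text contains `text`, or None."""
    collector = LinkCollector()
    collector.feed(html)
    for link_text, href in collector.links:
        if text in link_text:
            return href
    return None


# Made-up page fragment for illustration.
page = '<p><a href="/page/1">Previous</a> <a href="/page/3">Next <b>page</b></a></p>'
print(find_link_by_text(page, "Next"))  # /page/3
```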

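follow_link and click_link are the link counterparts of submit and click: click_link returns a request without sending it, while follow_link opens the link and returns the response.

```python
request = br.click_link(link)    # request object only, nothing is sent
response = br.follow_link(link)  # opens the link
print(response.geturl())
```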

Hard aspx problem: in cases where the __EVENTTARGET and __EVENTARGUMENT controls are missing, you will have to add them yourself. (When a page is missing these controls, though, it is usually because the server does not recognise your User-agent, so first try setting it to something the server understands.)

```python
br.form.new_control('hidden', '__EVENTTARGET', {'value': ''})
br.form.new_control('hidden', '__EVENTARGUMENT', {'value': ''})
br.form.fixup()  # tidy the form up after adding controls
```
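For background: on ASP.NET WebForms pages, LinkButtons and AutoPostBack controls call a JavaScript `__doPostBack(eventTarget, eventArgument)` function that copies its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the form, which is why a scraper has to supply them. The standard-library sketch below imitates that step on a made-up form: it collects the hidden inputs, fills in the two event fields, and encodes the POST body. The control name `ctl00$Pager$Next` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFields(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def postback_body(html, event_target, event_argument=""):
    """Build the body __doPostBack(event_target, event_argument) would post.

    Only hidden inputs are collected here; a real postback also carries the
    visible fields (like q below), which mechanize's form object handles for you.
    """
    parser = HiddenFields()
    parser.feed(html)
    fields = dict(parser.fields)
    # Add or overwrite the two event fields, as the page's JavaScript would.
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Made-up form in which __EVENTTARGET and __EVENTARGUMENT are missing.
page = """
<form method="post" action="results.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA=">
  <input type="text" name="q" value="stuff here">
</form>
"""
body = postback_body(page, "ctl00$Pager$Next")
print(body)  # __VIEWSTATE=dDwtMTA%3D&__EVENTTARGET=ctl00%24Pager%24Next&__EVENTARGUMENT=
```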