Ryan Leavengood
3/29/2006 1:00:00 AM
On 3/26/06, Jeff Pritchard <jp@jeffpritchard.com> wrote:
>
> I was wondering if anyone could point me to some example code that is
> using RubyfulSoup to parse a sitemap to get links to all the pages on
> that site and request each page and grab things from it.
WWW::Mechanize makes this easy. The HTML parsing has been pretty
robust in my experience. So far I've used it to scrape my library's
web site to see when books are due and automatically renew them, and
to log into Cingular.com and get my mobile phone minutes. The
library web site has weird redirects and some other quirks that
Mechanize handles well, and the Cingular site has a multi-step login
system that I got working without too much trouble.
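
For the sitemap case you asked about, here is a minimal sketch of how
it might look. It is an illustration, not code I have run against your
site. The method names (sitemap_links, scrape_site) and the example URL
are made up, and it assumes the sitemap is an ordinary HTML page of
links. The only Mechanize calls it relies on are agent.get, page.links,
link.href and page.title.

```ruby
require 'uri'

# Return the absolute URLs of all same-host links found on the sitemap page.
# `agent` is anything with a Mechanize-style #get that returns a page
# responding to #links (each link responding to #href).
def sitemap_links(agent, sitemap_url)
  base_host = URI.parse(sitemap_url).host
  hrefs = agent.get(sitemap_url).links.map { |link| link.href }.compact

  urls = hrefs.map do |href|
    begin
      uri = URI.join(sitemap_url, href) # resolve relative links
      uri.fragment = nil                # "#top" and friends point at the same page
      uri
    rescue URI::Error
      nil                               # skip hrefs that are not valid URIs
    end
  end

  urls.compact.select { |uri| uri.host == base_host }.map(&:to_s).uniq
end

# Request every page listed on the sitemap and grab something from it.
# Here the "something" is just the page title; non-HTML responses get nil.
def scrape_site(agent, sitemap_url)
  sitemap_links(agent, sitemap_url).each_with_object({}) do |url, titles|
    page = agent.get(url)
    titles[url] = page.respond_to?(:title) ? page.title : nil
  end
end

# Usage with the real library (needs the mechanize gem and network access):
#   require 'mechanize'
#   agent = Mechanize.new   # the class is WWW::Mechanize in older releases
#   scrape_site(agent, 'http://example.com/sitemap.html').each do |url, title|
#     puts "#{url}: #{title}"
#   end
```

Passing the agent in as an argument keeps the crawling logic separate
from the library setup, so you can swap the title lookup for whatever
you actually want to pull out of each page.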
When I needed support for check boxes in the form on the library
web site, the author of WWW::Mechanize, Michael Neumann, added it in
less than 24 hours.
So anyhow, this is a slick library, and very useful.
Ryan