Erik Hollensbe
6/1/2007 6:11:00 AM
On 2007-05-31 02:36:57 -0700, "Richard Conroy" <richard.conroy@gmail.com> said:
> On 5/31/07, Dick Davies <rasputnik@gmail.com> wrote:
>> Hpricot is a good starting point.
>
> Yeah, Hpricot is good, but in general the quality of the Ruby web scraping
> choices is pretty impressive. There are variants that are just built on top
> of Hpricot but provide an even simpler API.
>
> However, your second problem, where you encounter alternate encodings, is a
> bit trickier. To do any kind of real work with multiple code pages, you want
> to convert everything to Unicode (UTF-8) at fetch time.
>
I've had great success with this. Just make sure you're using Ruby 1.8.5
or later (which includes the NKF library) and you should be fine.
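
For what it's worth, here is a minimal sketch of the convert-at-fetch-time
idea. It is not the NKF route: it assumes a newer Ruby (1.9 or later) where
String#encode is available, and it assumes you already know the source
encoding (for example from the page's Content-Type header). The helper name
to_utf8 is made up for illustration.

```ruby
# Hypothetical helper (not part of Hpricot or NKF): tag the raw bytes with
# the encoding they arrived in, then transcode them to UTF-8.
# Assumes Ruby 1.9+ (String#encode) and a known source encoding.
def to_utf8(raw, source_encoding)
  raw.dup
     .force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Stand-in for a fetched page body: "cafe" with an accented e, as Latin-1 bytes.
latin1_body = "caf\xE9".b

utf8_body = to_utf8(latin1_body, "ISO-8859-1")
puts utf8_body.encoding
puts utf8_body.valid_encoding?
```

Doing the conversion once, right after the fetch, means everything you hand
to the parser afterwards is in a single known encoding.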