Dan Diebolt
1/9/2009 7:36:00 PM
>Any advice for me folks?
I use Hpricot extensively for various data mining tasks.
It has been my repeated experience that the more difficult
task is devising a harvesting strategy, which depends on the
structure of the target web page. I rarely have to devise a
workaround because Hpricot does not support a selector or
some other feature. In practice, when you start parsing a
lot of web pages for information, things like invalid HTML,
character entities, whitespace and comments, CSS interactions,
etc. become more of an issue than the features of your parser.
Others may have different experiences, but I find determining
a harvesting strategy more difficult than manipulating
a particular gem such as Hpricot.
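
To give a rough idea of the cleanup I mean, here is a minimal sketch that
tidies a harvested fragment: it drops comments, decodes character entities
and collapses whitespace. It uses only the Ruby standard library (CGI), not
Hpricot, and clean_fragment is a made-up helper name for illustration. The
regex-based tag stripping is a deliberate shortcut and would not hold up
against badly broken markup.

```ruby
require 'cgi'

# Hypothetical helper (not part of Hpricot): reduce a raw HTML
# fragment to plain, normalized text.
def clean_fragment(html)
  text = html.gsub(/<!--.*?-->/m, '')   # drop HTML comments, even multi-line ones
  text = text.gsub(/<[^>]+>/, ' ')      # crude tag removal, good enough for a sketch
  text = text.gsub('&nbsp;', ' ')       # CGI.unescapeHTML does not decode &nbsp;
  text = CGI.unescapeHTML(text)         # decode &amp; &lt; &gt; &quot; and numeric entities
  text.gsub(/\s+/, ' ').strip           # collapse runs of whitespace and trim the ends
end
```

For example, clean_fragment("<p>Fish &amp; chips<!-- note -->\n   &nbsp; today</p>")
should give "Fish & chips today". On real pages most of the effort still goes
into deciding which fragments to harvest in the first place.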