Mark Thomas
10/9/2008 6:51:00 PM
On Oct 8, 3:53 pm, Li Chen <chen_...@yahoo.com> wrote:
> I try to get"Slang " and "A close companion or comrade." ONLY out of
> the following a webpage(part of it) with hpricot. There are so many
> javascripts there. I don't think I know path/tag for target.
There's not a whole lot of HTML structure there. If you can
definitively target the <td> with Hpricot, you can use regular
expressions to find the appropriate comments and grab the following
text.
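
As a rough sketch of the regular-expression route: the snippet below uses plain Ruby on a made-up stand-in for the cell's inner HTML. The cell_html string and its layout are assumptions for illustration, not your actual page. Only the SUBHEAD and BOF_DEF comment markers are taken from your sample, so the patterns will likely need adjusting.

```ruby
# Hypothetical stand-in for the inner HTML of the target <td>;
# the real page's markup will differ.
cell_html = <<~HTML
  <td><!-- SUBHEAD --><i>Slang </i><br>
  <!-- BOF_DEF -->A close companion or comrade.</td>
HTML

# Capture the text of the <i> element that directly follows the SUBHEAD comment.
subhead = cell_html[/<!--[^>]*SUBHEAD[^>]*-->\s*<i>([^<]*)<\/i>/m, 1]

# Capture the text that runs from the BOF_DEF comment up to the next tag.
definition = cell_html[/<!--[^>]*BOF_DEF[^>]*-->([^<]*)/m, 1]

# Either capture is nil when its marker is missing, so guard before stripping.
subhead = subhead.strip if subhead
definition = definition.strip if definition

puts subhead
puts definition
```

String#[] with a regexp and a capture index returns nil when nothing matches, which is why the guards are there. This stays brittle if the markup between the comment and the text changes.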
You can get a little more specific with XPath expressions. The
following sample code (requires libxml-ruby) extracts the two values
from your sample HTML:
require 'xml'
html = %Q(your_html_here)
doc = XML::HTMLParser.string(html).parse
puts doc.find('//comment()[contains(.,"SUBHEAD")]/following::i/text()').first
puts doc.find('//comment()[contains(.,"BOF_DEF")]/following::text()').first