Asp Forum - Hpricot and path of an element

Li Chen

8/10/2008 6:37:00 PM

Hi all,

I use Hpricot to load a page. Then I try to find the path of a "font"
element (<font face="courier" color="black">) in the page. Here is the
example from the tutorial
(http://code.whytheluckystiff.net/hpricot/wiki/Hpr...):

doc.at("#header").xpath
#=> "//div[@id='header']"

here is my code:
puts doc.at("#font").xpath

When I run the code, Ruby complains about an undefined method `xpath'.
I wonder if I am misunderstanding the tutorial.

Thanks,

Li
--
Posted via http://www.ruby-....

2 Answers

David Masover

8/10/2008 7:03:00 PM

On Sunday 10 August 2008 13:36:42 Li Chen wrote:

> I use hpricot to load a page. Then I try to find the path for an
> element "font"(<font face="courier" color="black">) in the page.

So, you probably want:

(doc / 'font')

> doc.at("#header").xpath
> #=> "//div[@id='header']"

Right, that's searching for the element whose id is "header", which in the
tutorial's page is a tag that looks like this: <div id="header">

> here is my code:
> puts doc.at("#font").xpath

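And that's searching for an element whose id is "font", such as
<div id="font">. Your page most likely has no such element, so at returns
nil, and calling xpath on nil raises the undefined method error.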

If you're following that example, you probably want:

puts doc.at('font').xpath
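
To make the difference between the two selectors concrete, here is a minimal sketch. It uses REXML from Ruby's standard distribution as a stand-in for Hpricot (an assumption, so that it runs without the Hpricot gem). The names simple_selector_to_xpath and find_first are hypothetical helpers, not part of either library, and the sample markup is made up.

```ruby
require 'rexml/document'

# Made-up sample page containing one element with an id and one <font> tag.
SAMPLE_MARKUP = <<~XML
  <html>
    <body>
      <div id="header">Title</div>
      <font face="courier" color="black">ATGC</font>
    </body>
  </html>
XML

# Hypothetical helper: translate two very simple selector forms into XPath.
#   "#name" -> any element whose id attribute equals "name"
#   "name"  -> any element whose tag name is "name"
def simple_selector_to_xpath(selector)
  if selector.start_with?('#')
    "//*[@id='#{selector[1..-1]}']"
  else
    "//#{selector}"
  end
end

# Hypothetical helper: first matching element, or nil if nothing matches.
def find_first(doc, selector)
  REXML::XPath.first(doc, simple_selector_to_xpath(selector))
end

sample_doc = REXML::Document.new(SAMPLE_MARKUP)

# "#font" looks for id="font"; the sample page has none, so the result is nil.
puts find_first(sample_doc, '#font').inspect

# "font" looks for the <font> tag itself, which does exist.
font = find_first(sample_doc, 'font')
puts font.text
puts font.xpath
```

Calling a method on the nil returned by the first lookup is what produces an "undefined method" error, which matches the symptom described above.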

Now, first question: Why do you need the xpath? Usually, the idea is to try to
find that element, and then do something with it. So, for example:

# To return all text:
(doc / 'font').text

# To loop over each font element:
(doc / 'font').each { |tag|
  puts tag.inner_text
}

Second question: Why is there a font tag on this page? If you had any hand in
creating the page, shame on you -- go learn some CSS.

In fact, go learn some CSS anyway. Hpricot supports both CSS selectors and
XPath, and it's usually much easier to use the selectors. Years later, I
still remember, roughly, how selectors work -- but only a few months later,
I've almost completely forgotten XPath.

There are things XPath can do that selectors can't. But until you encounter
them, XPath is overkill.
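
As one example of the kind of query where XPath helps, the sketch below selects table cells based on what they contain. It uses REXML from Ruby's standard distribution rather than Hpricot (an assumption, so it runs without the gem). The markup and the helper name cells_with_font_ids are invented for illustration.

```ruby
require 'rexml/document'

# Made-up table: only the first cell wraps its text in a <font> tag.
TABLE_MARKUP = <<~XML
  <table>
    <tr>
      <td id="a"><font face="courier">ATG</font></td>
      <td id="b">plain text</td>
    </tr>
  </table>
XML

# Hypothetical helper: ids of every <td> that has a <font> child.
# The predicate [font] filters the cells themselves. A descendant
# selector such as "td font" would return the <font> elements instead.
def cells_with_font_ids(doc)
  REXML::XPath.match(doc, '//td[font]').map { |td| td.attributes['id'] }
end

puts cells_with_font_ids(REXML::Document.new(TABLE_MARKUP)).inspect
```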

Li Chen

8/11/2008 12:59:00 PM

David Masover wrote:
> Now, first question: Why do you need the xpath? Usually, the idea is to
> try to find that element, and then do something with it. So, for example:

> # To return all text:
> (doc / 'font').text
>
> # To loop over each font element:
> (doc / 'font').each { |tag|
>   puts tag.inner_text
> }

I need to extract the text within this tag. I followed your code and found:
1) (doc/'font').text and (doc/'font').html return the same results
2) when I run (doc / 'font').each { |tag| puts tag.inner_text }
Ruby complains:
undefined method `inner_text' for #<Hpricot::Elem:0x2e9f9c4> (NoMethodError)

So I changed it to tag.inner_html and it works. I checked the Hpricot
documentation and found that the method #inner_text is there, but I cannot
figure out why Ruby complains about it.
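
The thread does not establish why inner_text is missing here. One defensive workaround, sketched below under assumptions, is to call inner_text when the element responds to it and otherwise strip tags from inner_html. The helper element_text and the two Struct stand-ins are hypothetical and are used only so the example runs without Hpricot. The regular expression is a crude tag stripper and does not decode HTML entities.

```ruby
# Hypothetical helper: prefer inner_text when the element supports it,
# otherwise fall back to inner_html with the tags removed.
def element_text(tag)
  if tag.respond_to?(:inner_text)
    tag.inner_text
  else
    # Crude fallback: delete anything that looks like <...>.
    tag.inner_html.gsub(/<[^>]*>/, '')
  end
end

# Stand-ins for elements with and without inner_text (not Hpricot classes).
ElemWithoutText = Struct.new(:inner_html)
ElemWithText    = Struct.new(:inner_html, :inner_text)

puts element_text(ElemWithoutText.new('<b>ATG</b>C'))
puts element_text(ElemWithText.new('<b>ATG</b>C', 'ATGC'))
```

With real Hpricot elements the loop body would become puts element_text(tag), assuming the elements expose inner_html as they do in the post above.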

> Second question: Why is there a font tag on this page? If you had any
> hand in creating the page, shame on you -- go learn some CSS.

I am a newbie at HTML and website development. If you want to know why
there is a font tag in the page, please check this out:
http://www.ensembl.org/Homo_sapiens/exonview?db=core;transcript=ENST0...

What I am trying to do is extract some info I am interested in from this
page. I have no idea why they put this tag and that tag there. I don't
think it is my priority to know so many whys now. I am more concerned
about getting the job done.

Anyway, thank you very much for the tips.

Li

comp.lang.ruby
