Asp Forum - Mechanize and XPath

Ruby Newbie

10/15/2008 5:08:00 PM

Is there a way to select links in a scraped mechanize page using XPath
selectors ?

For example...all links on the second TABLE on the page.

I know it is possible with hpricot but i need the links to be used by
mechanize.
--
Posted via http://www.ruby-....

2 Answers

Peter Szinek

10/15/2008 5:44:00 PM

[Note: parts of this message were removed to make it a legal post.]

On 2008.10.15., at 19:08, Ruby Newbie wrote:

>
> Is there a way to select links in a scraped mechanize page using XPath
> selectors ?
>
> For example...all links on the second TABLE on the page.
>
>
> I know it is possible with hpricot but i need the links to be used by
> mechanize.

From the Mechanize guide (http://mechanize.rubyforge.org/mechanize/files/GUID...):

Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:
agent.get('http://someurl...').search("//p[@class='posted']")
HTH,
Peter

Patrick L.

2/18/2009 10:33:00 PM

Peter Szinek wrote:
> On 2008.10.15., at 19:08, Ruby Newbie wrote:
>
>>
>> Is there a way to select links in a scraped mechanize page using XPath
>> selectors ?
>>
>> For example...all links on the second TABLE on the page.
>>
>>
>> I know it is possible with hpricot but i need the links to be used by
>> mechanize.
>
> From the Mechanize guide
> (http://mechanize.rubyforge.org/...files/GUID...
> ):
>
> Mechanize uses hpricot to parse html. What does this mean for you? You
> can treat a mechanize page like an hpricot object. After you have used
> Mechanize to navigate to the page that you need to scrape, then scrape
> it using hpricot methods:
> agent.get('http://someurl...').search("//p[@class='posted']")
> HTH,
> Peter

Wait a minute, it says the total opposite on the Mechanize page. But it
definitely explains why it's not being friendly with nokogiri...

http://mechanize.rubyforge.org/...

Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:

agent.get('http://someurl...').search(".//p[@class='posted']")
--
Posted via http://www.ruby-....

comp.lang.ruby