comp.lang.ruby
Mechanize and XPath
Ruby Newbie
10/15/2008 5:08:00 PM
Is there a way to select links in a scraped Mechanize page using XPath
selectors?
For example, all links in the second TABLE on the page.
I know it is possible with Hpricot, but I need the links to be used by
Mechanize.
--
Posted via http://www.ruby-...
Peter Szinek
10/15/2008 5:44:00 PM
[Note: parts of this message were removed to make it a legal post.]
On 2008.10.15., at 19:08, Ruby Newbie wrote:
>
> Is there a way to select links in a scraped Mechanize page using XPath
> selectors?
>
> For example, all links in the second TABLE on the page.
>
> I know it is possible with Hpricot, but I need the links to be used by
> Mechanize.
From the Mechanize guide (http://mechanize.rubyforge.org/mechanize/files/GUID...):
Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:
agent.get('http://someurl...').search("//p[@class='posted']")
HTH,
Peter
Patrick L.
2/18/2009 10:33:00 PM
Peter Szinek wrote:
> From the Mechanize guide (http://mechanize.rubyforge.org/...files/GUID...):
>
> Mechanize uses hpricot to parse html. What does this mean for you? You
> can treat a mechanize page like an hpricot object. After you have used
> Mechanize to navigate to the page that you need to scrape, then scrape
> it using hpricot methods:
> agent.get('http://someurl...').search("//p[@class='posted']")
> HTH,
> Peter
Wait a minute, the Mechanize page says the opposite: Mechanize uses Nokogiri,
not Hpricot. That definitely explains why it's not being friendly with Nokogiri...
http://mechanize.rubyforge.org/...
Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:
agent.get('http://someurl...').search(".//p[@class='posted']")