How can one get the Hpricot DOM document from Mechanize?

Randy R

9/13/2008 7:07:00 PM

I was wondering if there were some way of getting the Hpricot DOM (for
lack of a better term) from a Mechanize page. For example:

agent = WWW::Mechanize.new
page = agent.get('http://www.w...')

# I am currently doing this
doc = Hpricot(page.body)

# I would like to do this
doc = page.get_hpricot_dom

The idea is that since Mechanize apparently uses Hpricot and it's surely
using it to parse the HTML begotten from the agent.get method, it would be
nice if I didn't have to repeat that work.
Is there a way to get this Hpricot document? ...or am I just totally
wrong about how Mechanize uses Hpricot?
Thank you...

3 Answers

Lex Williams

9/13/2008 8:07:00 PM

Perhaps it's only me, but would you please explain in more detail what it
is you want to accomplish? Maybe with an example?
--
Posted via http://www.ruby-....

Matthias Reitinger

9/13/2008 8:16:00 PM

Just Another Victim wrote:
> # I would like to do this
> doc = page.get_hpricot_dom

Try page.parser or page.root (they're equivalent).

Regards,
Matthias
--
Posted via http://www.ruby-....

Aaron Patterson

9/18/2008 4:27:00 AM

On Sun, Sep 14, 2008 at 04:03:04AM +0900, Just Another Victim of the Ambient Morality wrote:
> I was wondering if there were some way of getting the Hpricot DOM (for
> lack of a better term) from a Mechanize page. For example:
>
>
> agent = WWW::Mechanize.new
> page = agent.get('http://www.w...')
>
> # I am currently doing this
> doc = Hpricot(page.body)
>
> # I would like to do this
> doc = page.get_hpricot_dom
>
>
> The idea is that since Mechanize apparently uses Hpricot and it's surely
> using it to parse the HTML begotten from the agent.get method, it would be
> nice if I didn't have to repeat that work.
> Is there a way to get this Hpricot document? ...or am I just totally
> wrong about how Mechanize uses Hpricot?

You can get at the Hpricot document by using the "parser" accessor on
WWW::Mechanize::Page. Page also responds to "search", "/", and "at",
which just delegate to the Hpricot document.

So you can just do:

(agent.get('http://tenderlovemakin...') / 'tr').each do |tr|
...
end
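
To make the delegation described above concrete, here is a minimal sketch of the pattern: a page object that parses its body once, exposes the result through "parser" (and "root"), and forwards "search", "/", and "at" to it. This is not Mechanize's actual source. TinyDoc, Page, and parse_count are hypothetical stand-ins written for illustration, and TinyDoc only understands bare tag names, so it is in no way a replacement for Hpricot.

```ruby
# Hypothetical stand-in for an Hpricot document. It only understands
# bare tag names such as 'tr', which is enough to show the delegation.
class TinyDoc
  def initialize(html)
    @html = html
  end

  # Return the inner text of every <tag>...</tag> element.
  def search(tag)
    name = Regexp.escape(tag)
    @html.scan(%r{<#{name}\b[^>]*>(.*?)</#{name}>}m).flatten
  end
  alias_method :/, :search

  # Return the first match, or nil when there is none.
  def at(tag)
    search(tag).first
  end
end

# Hypothetical stand-in for a Mechanize page (not the real class).
class Page
  attr_reader :body, :parse_count

  def initialize(body)
    @body = body
    @parse_count = 0 # illustration only: counts how often we parse
  end

  # Parse the body on first use and reuse the same document afterwards.
  def parser
    @parser ||= begin
      @parse_count += 1
      TinyDoc.new(@body)
    end
  end
  alias_method :root, :parser

  # The query methods are forwarded by hand to keep the idea visible.
  def search(query)
    parser.search(query)
  end

  def /(query)
    parser / query
  end

  def at(query)
    parser.at(query)
  end
end

page = Page.new('<table><tr>one</tr><tr>two</tr></table>')
(page / 'tr').each { |row| puts row }
puts page.parse_count
```

However many queries are run against the page, the body is parsed only once in this sketch, which is the saving the original question was after.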

--
Aaron Patterson
http://tenderlovem...

comp.lang.ruby
