comp.lang.ruby

Re: XPath and HTML

Chad Fowler

10/13/2003 11:15:00 AM

2 Answers

David Corbin

10/13/2003 11:36:00 PM


On Monday 13 October 2003 07:14, Chad Fowler wrote:
> On Mon, 13 Oct 2003, David Corbin wrote:
>
> # On Sunday 12 October 2003 20:21, Chad Fowler wrote:
> # > On Mon, 13 Oct 2003, David Corbin wrote:
> # >
> # > # On Sunday 12 October 2003 17:36, Chad Fowler wrote:
> # > # > On Mon, 13 Oct 2003, David Corbin wrote:
> # > # >
> # > # > # Is there a library out there that lets me parse HTML and use XPath
> # > # > # expressions against it? What is it?
> # > # > #
> # > # > # Thanks
> # > # > #
> # > # > #
> # > # >
> # > # > REXML (http://www.germane-software.com/softw...) and
> # > # > HTML Parser2 (http://www.bike-nomad...)
> # > # >
> # > #
> # > # Are you saying you parse it with HTML Parser2, and then use the XPath
> # > # support out of the REXML?
> # > #
> # > Sort of. I shouldn't have said "HTML Parser2". The right name seems to
> # > be ruby-htmltools. It integrates with REXML and allows you to do this:
> # >
> # > parser = HTMLTree::Parser.new(true, true)
> # > parser.feed(file.readlines.join)
> # > tree = parser.tree.html_node.as_rexml_document
> # > tree.elements.to_a('*/table').each do |element|
> # > # do something with element
> # > end
> # >
> # > Chad
> #
> # And if you get:
> # /usr/local/lib/site_ruby/1.6/rexml/child.rb:21:in `initialize': undefined
> # method `add' for #<HTMLTree::Element:0x40331a58> (NameError)
> # from /usr/local/lib/site_ruby/1.6/rexml/comment.rb:23:in `initialize'
> # from /usr/local/lib/site_ruby/1.6/html/xpath.rb:50:in `new'
> # from /usr/local/lib/site_ruby/1.6/html/xpath.rb:50:in `as_rexml_document'
> # from /usr/local/lib/site_ruby/1.6/html/xpath.rb:36:in `as_rexml_document'
> #
> # would you attribute that to a) Bad HTML, b) library version mismatch, or
> # c) something else?
> #
>
>
> Looks like a library mismatch. I haven't seen this and I can't reproduce
> it. What was the HTML you were using?
>

Something I'm trying to screen-scrape from an on-line dictionary. It's not
really well formed. If you like, I'll send it to you off-line.

It's hard to be sure, but it looks like REXML is either 2.5.6 or 2.7.1 (I'm
leaning toward the latter). The htmltools are the latest found on the site
you cited.
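
One way to narrow down the "bad HTML" possibility raised above is to hand the markup straight to REXML and see whether it parses at all. The following is only a minimal sketch written against a current REXML, not the 1.6-era install from the traceback; `xpath_or_nil` and both sample strings are hypothetical names made up for illustration, and ruby-htmltools is deliberately left out.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the elements matching +xpath+, or nil when
# REXML rejects the markup as not well formed.
def xpath_or_nil(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath)
rescue REXML::ParseException
  nil
end

# Made-up sample inputs: the second one never closes <td> or <tr>.
well_formed = '<html><body><table><tr><td>a</td></tr></table></body></html>'
broken      = '<html><body><table><tr><td>a</table></body></html>'

tables = xpath_or_nil(well_formed, '//table')
puts tables ? "#{tables.size} table(s) found" : 'not well formed'
puts xpath_or_nil(broken, '//table') ? 'parsed' : 'not well formed'
```

If the scraped page fails this check, that only shows it needs a tolerant HTML parser in front of REXML; it does not by itself explain the `undefined method` error, which could still come from a version mismatch.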

> Chad

--
David Corbin <dcorbin@machturtle.com>


gabriele renzi

10/14/2003 9:49:00 AM


On Tue, 14 Oct 2003 08:36:00 +0900, David Corbin
<dcorbin@machturtle.com> wrote:

>really well formed. If you like, I'll send it to you off-line.
>
>It's hard to be sure, but it looks like Rexml is either 2.5.6 or 2.7.1 (I'm
>leaning toward the latter)
You could try REXML::Version  #-> '2.5.7'
Note that Version != VERSION ;)
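
A small sketch of that check, assuming a REXML that can be loaded with `require 'rexml/document'`. Which of the two constants exists, and what each holds, depends on the installed release, so the snippet probes for both instead of assuming either one:

```ruby
require 'rexml/document'

# Report each version constant only if this REXML release defines it.
%i[VERSION Version].each do |name|
  if REXML.const_defined?(name)
    puts "REXML::#{name} = #{REXML.const_get(name)}"
  else
    puts "REXML::#{name} is not defined"
  end
end
```

Running this with the same interpreter that raised the error (here apparently the one using /usr/local/lib/site_ruby/1.6) shows which REXML it actually loads.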

> The htmltools are the latest found on the site
>you cited.
>
>> Chad