comp.lang.ruby

Re: XPath and HTML

Chad Fowler

10/13/2003 12:21:00 AM

2 Answers

Harry Ohlsen

10/13/2003 12:37:00 AM


Chad Fowler wrote:
> Sort of. I shouldn't have said "HTML Parser2". The right name seems to
> be ruby-htmltools. It integrates with REXML and allows you to do this:
>
> parser = HTMLTree::Parser.new(true, true)
> parser.feed(file.readlines.join)
> tree = parser.tree.html_node.as_rexml_document
> tree.elements.to_a('*/table').each do |element|
> # do something with element
> end

I take it ruby-htmltools is needed in the middle because HTML generally isn't clean XML? So the tools do things like turn "<br>" into "<br/>" and stick "</p>" at the end of paragraphs, that sort of thing?
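
To illustrate why such a cleanup step matters, here is a minimal sketch using only REXML (ruby-htmltools is not used here). The two sample strings and the `table_count` helper are made up for illustration; the "clean" string is simply hand-written to show the kind of result a cleanup step would aim for, not the actual output of any tool.

```ruby
require 'rexml/document'

# Hypothetical sample: typical tag soup, <br> is never closed and </p> is missing.
MESSY = '<html><body><p>one<br>two<table><tr><td>x</td></tr></table></body></html>'

# Hypothetical sample: the same content written as well-formed XML.
CLEAN = '<html><body><p>one<br/>two</p><table><tr><td>x</td></tr></table></body></html>'

# Illustrative helper: counts <table> elements via XPath,
# or returns nil when REXML rejects the markup as not well-formed.
def table_count(markup)
  doc = REXML::Document.new(markup)
  doc.elements.to_a('//table').size
rescue REXML::ParseException
  nil
end

p table_count(MESSY)
p table_count(CLEAN)
```

REXML is a strict XML parser, so the unclosed tags in the first string stop it before any XPath query can run, while the well-formed version can be queried directly.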

Could be very useful for a number of things!

Harry O.



David Corbin

10/13/2003 1:11:00 AM


On Sunday 12 October 2003 20:21, Chad Fowler wrote:
> On Mon, 13 Oct 2003, David Corbin wrote:
>
> # On Sunday 12 October 2003 17:36, Chad Fowler wrote:
> # > On Mon, 13 Oct 2003, David Corbin wrote:
> # >
> # > # Is there a library out there that lets me parse HTML and use XPath
> # > # expressions against it? What is it?
> # > #
> # > # Thanks
> # > #
> # > #
> # >
> # > REXML (http://www.germane-software.com/softw...) and
> # > HTML Parser2 (http://www.bike-nomad...)
> # >
> #
> # Are you saying you parse it with HTML Parser2, and then use the XPath
> # support out of the REXML?
> #
> Sort of. I shouldn't have said "HTML Parser2". The right name seems to
> be ruby-htmltools. It integrates with REXML and allows you to do this:
>
> parser = HTMLTree::Parser.new(true, true)
> parser.feed(file.readlines.join)
> tree = parser.tree.html_node.as_rexml_document
> tree.elements.to_a('*/table').each do |element|
> # do something with element
> end
>
> Chad

And if you get:
/usr/local/lib/site_ruby/1.6/rexml/child.rb:21:in `initialize': undefined method `add' for #<HTMLTree::Element:0x40331a58> (NameError)
        from /usr/local/lib/site_ruby/1.6/rexml/comment.rb:23:in `initialize'
        from /usr/local/lib/site_ruby/1.6/html/xpath.rb:50:in `new'
        from /usr/local/lib/site_ruby/1.6/html/xpath.rb:50:in `as_rexml_document'
        from /usr/local/lib/site_ruby/1.6/html/xpath.rb:36:in `as_rexml_document'

would you attribute that to a) bad HTML, b) a library version mismatch, or c)
something else?
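
For what it's worth, the shape of that trace (a comment being constructed with a parent that has no `add` method) can be reproduced with a current REXML alone. This is only a sketch: `ForeignNode` and `try_comment` are hypothetical stand-ins, not htmltools code, and it does not settle which of a), b) or c) applies. It only shows that REXML raises this kind of error whenever the parent handed to it is not a REXML node, whatever the input HTML looks like.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a node class from another library; it defines no #add.
class ForeignNode; end

# Illustrative helper: builds a REXML comment under the given parent.
# Returns :ok on success, or the name of the missing method on failure.
def try_comment(parent)
  REXML::Comment.new('note', parent)
  :ok
rescue NameError => e   # NoMethodError is a subclass of NameError
  e.name
end

p try_comment(REXML::Element.new('p'))  # a real REXML parent
p try_comment(ForeignNode.new)          # a non-REXML parent
```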
--
David Corbin <dcorbin@machturtle.com>