Chris Gehlker
8/1/2006 2:42:00 AM
On Jul 31, 2006, at 6:17 AM, Chris Gehlker wrote:
> I'm trying to use Hpricot to clean up the text in a big site full
> of old-style HTML. I'm just trying to do things like replacing
> literal quote characters with <q> and </q>. I'm hampered by the
> fact that my understanding of the HTML DOM comes from reading one
> web site yesterday and I don't know any javascript. Nonetheless, it
> seems that Hpricot should be able to easily give me all the text in
> the <body> element of each page because it has a traverse_text()
> method. The problem seems to be that if I apply it to a whole page,
> I get the text in the <head> element and all the methods for
> selecting seem to return an element, not a tree.
>
> There is a get_subnode method but it doesn't seem to work as expected.
Never mind.
get_subnode fails with:
...hpricot/traverse.rb:23:in `get_subnode': undefined method
`get_subnode_internal' for #<Hpricot::Doc:0x5c182c>
because Why simply hasn't written get_subnode_internal yet.
Maybe I'll try to write it when/if I get some time.
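
In the meantime, the quote replacement itself doesn't depend on get_subnode. Here is a minimal sketch in plain Ruby of the text transformation only. The helper name quotes_to_q is made up for illustration, and it assumes the quotes are straight double quotes that pair up within a single string; an unpaired quote is left alone.

```ruby
# Hypothetical helper (not part of Hpricot): wrap each pair of straight
# double quotes in <q>...</q>. Assumes quotes are balanced within the
# string; a lone quote with no partner is left untouched.
def quotes_to_q(text)
  text.gsub(/"([^"]*)"/) { "<q>#{$1}</q>" }
end

puts quotes_to_q('He said "hello" and left.')
# prints: He said <q>hello</q> and left.
```

Something like this could presumably be applied to each text node inside a traverse_text block, once the traversal can be limited to the <body> element. Quotes that span more than one text node would need extra handling.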
--
For blocks are better cleft with wedges,
Than tools of sharp or subtle edges,
And dullest nonsense has been found
By some to be the most profound.
-Samuel Butler,