
comp.lang.ruby

find all possible container nodes of a given text

Sylvain Tenier

12/19/2006 3:25:00 AM

Hi,
I've been using Ruby for a few weeks and have been experimenting with DOM-style
programming using Hpricot.
I'm looking for an efficient (Ruby-ish) way to find the set of element nodes
that have a text-node child containing a given string.

For instance, given the HTML code

<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is a dog</td></tr></table></body></html>

I would like to run a query such as

getNodesContaining("dog")

that would return an Array with the XPaths of the first and third td elements
(since they contain the text "dog").

Thanks in advance,

Sylvain

--
Nothing can ever work if you think about everything it takes for it to
work.
-- Daniel Pennac
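To pin down the semantics being asked for, here is a minimal sketch in plain Ruby rather than Hpricot. It assumes a hypothetical `Element` struct (a tag plus ordered children, each either a String text node or another `Element`) and a hypothetical `get_nodes_containing` helper. The XPath-style path format, with a `[n]` position only where several siblings share a tag, is also an assumption and not necessarily what Hpricot's `xpath` produces.

```ruby
# Hypothetical minimal DOM stand-in (not Hpricot's API): an element has a
# tag name and ordered children, each a String (text node) or an Element.
Element = Struct.new(:tag, :children)

# Return an XPath-like path for every element with at least one *direct*
# text child matching +pattern+ (ancestors further up are not reported).
def get_nodes_containing(root, pattern)
  results = []
  walk = lambda do |el, path|
    results << path if el.children.any? { |c| c.is_a?(String) && c =~ pattern }
    kids = el.children.grep(Element)
    seen = Hash.new(0)
    kids.each do |child|
      seen[child.tag] += 1
      step = child.tag
      # Add a 1-based position only when several siblings share this tag.
      step += "[#{seen[child.tag]}]" if kids.count { |k| k.tag == child.tag } > 1
      walk.call(child, "#{path}/#{step}")
    end
  end
  walk.call(root, "/#{root.tag}")
  results
end

td = ->(text) { Element.new("td", [text]) }
doc = Element.new("html", [Element.new("body", [Element.new("table", [
  Element.new("tr", [td.("harr is a dog"), td.("fuu is a cat"), td.("jii is a dog")])
])])])

p get_nodes_containing(doc, /dog/)
# => ["/html/body/table/tr/td[1]", "/html/body/table/tr/td[3]"]
```

The walk is top-down: each element is tested only against its own text children, so the enclosing tr, table, body, and html never qualify.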

3 Answers

Sylvain Tenier

12/19/2006 7:31:00 AM


Well, I think I solved it:

#!/usr/bin/env ruby
require 'hpricot'

html = <<EOS
<html><body><table><tr><td>harr is a dog</td><td>fuu is a cat</td><td>jii is adog</td></tr></table></body></html>
EOS

doc = Hpricot(html)
result = []
# Visit every text node; when it contains "dog", record the XPath of the
# element that directly contains it (its parent), not of any further ancestor.
doc.traverse_text do |text|
  result << text.parent.xpath if text.to_s =~ /dog/
end

puts result

Thanks anyway,

Sylvain
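One detail of the loop above: it appends one entry per matching text node, so an element with two matching text children (say, text on both sides of a `<b>`) would appear twice in `result`; calling `result.uniq` afterwards removes such duplicates. The same bottom-up idea (visit each text node, keep its parent) can be sketched in plain Ruby without Hpricot. The `Elem` and `Text` classes and the `each_text` method below are hypothetical stand-ins for Hpricot's nodes and `traverse_text`.

```ruby
# Hypothetical stand-ins for parsed nodes (not Hpricot classes).
# A Text node knows its parent, like `text.parent` in the loop above.
Text = Struct.new(:content, :parent)

class Elem
  attr_accessor :tag, :children, :parent

  # Children may be Elem objects or plain Strings (wrapped as Text nodes).
  def initialize(tag, *kids)
    @tag = tag
    @children = kids.map { |k| k.is_a?(String) ? Text.new(k) : k }
    @children.each { |c| c.parent = self }
  end

  # Yield every Text node below this element, depth-first (cf. traverse_text).
  def each_text(&block)
    @children.each do |c|
      c.is_a?(Text) ? block.call(c) : c.each_text(&block)
    end
  end
end

row = Elem.new("tr",
  Elem.new("td", "harr is a dog"),
  Elem.new("td", "fuu is a cat"),
  Elem.new("td", "jii is a dog ", Elem.new("b", "(a big one)"), " and a good dog"))

# Bottom-up: keep the direct parent of each matching text node.
parents = []
row.each_text { |text| parents << text.parent if text.content =~ /dog/ }

parents.size       # 3: the third td is listed once per matching text node
parents.uniq.size  # 2: one entry per element
```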





Peter Szinek

12/19/2006 10:15:00 AM


>> I would like to set this query
>>
>> getNodesContaining("dog")

(Hpricot(html)/"//td").select { |node| node.inner_text =~ /dog/ }

Cheers,
Peter

__
http://www.rubyra...


Sylvain Tenier

12/19/2006 1:31:00 PM


Your code assumes that the text sits in a leaf that is a child of a td node.
What I'm looking for is the direct parent of any leaf containing the string.
For instance, if I replace td by tr in your code, I get

[{elem <tr> {elem <td> {text "harr is a dog"} </td>} {elem <td> {text "fuu is a cat"} </td>} {elem <td> {text "jii is adog"} </td>} </tr>}]

I don't want tr to be returned, since it is an ancestor, not a direct parent.

Sorry if I wasn't clear enough in my question.

Sylvain
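The difference between the two answers can be shown with a small plain-Ruby model (a hypothetical `Element` struct, not Hpricot). Filtering on all descendant text, as an `inner_text` test does, matches every ancestor of a matching cell, whereas requiring a direct text child matches only the cells themselves. The `inner_text`, `direct_text`, and `each_element` methods are defined in the sketch itself for illustration.

```ruby
# Hypothetical minimal tree (not Hpricot): children are Strings or Elements.
Element = Struct.new(:tag, :children) do
  # All descendant text concatenated, analogous to what an inner_text filter sees.
  def inner_text
    children.map { |c| c.is_a?(String) ? c : c.inner_text }.join
  end

  # Only this element's own text children.
  def direct_text
    children.grep(String).join
  end

  # Yield this element, then all descendant elements, depth-first.
  def each_element(&block)
    block.call(self)
    children.grep(Element).each { |c| c.each_element(&block) }
  end
end

td = ->(s) { Element.new("td", [s]) }
doc = Element.new("html", [Element.new("body", [Element.new("table", [
  Element.new("tr", [td.("harr is a dog"), td.("fuu is a cat"), td.("jii is a dog")])
])])])

by_inner  = []
by_direct = []
doc.each_element do |el|
  by_inner  << el.tag if el.inner_text  =~ /dog/
  by_direct << el.tag if el.direct_text =~ /dog/
end

p by_inner   # => ["html", "body", "table", "tr", "td", "td"]
p by_direct  # => ["td", "td"]
```

Only the direct-text test gives the "direct parent" behaviour asked for; the descendant-text test explains why swapping td for tr returned the whole row.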



