comp.lang.ruby

Hpricot and Regular expression

Dhanasekaran Vivekanandhan

12/19/2006 7:19:00 AM

Hi All,
I have an HTML fragment like the following:

<a href="forumdisplay.php?f=131"><strong>Toon Zone News1</strong></a>
<a href="forumdisplay.php?f=132"><strong>Toon Zone News2</strong></a>
<a href="forumdisplay.php?f=133"><strong>Toon Zone News3</strong></a>
<a href="forumdisplay.php?f=abcd"><strong>Toon Zone News3</strong></a>

I want to match only the first three anchor tags. I don't want the last
one, because its href f parameter is abcd, which is not an integer. I want
an anchor only if its request parameter is an integer, i.e. the first three
anchor tags.

I have the following code:

require 'open-uri'
require 'hpricot'

doc = Hpricot(open("http://0.0.0.0:3000/dh/..."))
fun = doc.search("//a[@href='forumdisplay.php?f=131']//strong").inner_html
puts fun

but it only fetches the content of the first anchor tag. So I think I need
a regular expression to match the f parameter values 131, 132 and 133,
but I don't know how to do that.
Any help would be appreciated.
Thanks,
dhanasekaran

--
Posted via http://www.ruby-....

3 Answers

Sylvain Tenier

12/19/2006 7:48:00 AM


--
Nothing can ever work if you think about everything it takes for it to
work.
-- Daniel Pennac



Here is one way (probably not optimal; I'm really new here):

#!/usr/bin/ruby
require 'hpricot'

html = <<EOS
<html>
<body>
<a href="forumdisplay.php?f=131"><strong>Toon Zone News1</strong></a>
<a href="forumdisplay.php?f=132"><strong>Toon Zone News2</strong></a>
<a href="forumdisplay.php?f=133"><strong>Toon Zone News3</strong></a>
<a href="forumdisplay.php?f=abcd"><strong>Toon Zone News3</strong></a>
</body>
</html>
EOS

doc = Hpricot(html)
result = []
# visit each <a> node
doc.search("//a").each do |chaque|
  # keep only the anchors whose href attribute matches the regexp
  # (/f=13?/ matches "f=1" optionally followed by "3", so it only
  # checks that the value starts with "1")
  result << chaque.at("strong").inner_html if chaque.get_attribute("href") =~ /f=13?/
end
puts result

Hope this helps
Sylvain


Quoting Dhanasekaran Vivekanandhan <mail2sek@yahoo.com>:
[snip]
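For readers without Hpricot installed, the same "visit every anchor, keep the ones whose href matches" idea can be sketched with REXML, the XML library bundled with Ruby. This is a swapped-in library, not what was posted above, and it assumes the fragment is well-formed (REXML rejects malformed HTML, hence the wrapping <body> root). It also uses the stricter regexp /\bf=\d+\b/ instead of /f=13?/.

```ruby
require 'rexml/document'

# Assumption: the fragment is well-formed XML; REXML stands in for Hpricot.
html = <<~EOS
  <body>
  <a href="forumdisplay.php?f=131"><strong>Toon Zone News1</strong></a>
  <a href="forumdisplay.php?f=132"><strong>Toon Zone News2</strong></a>
  <a href="forumdisplay.php?f=133"><strong>Toon Zone News3</strong></a>
  <a href="forumdisplay.php?f=abcd"><strong>Toon Zone News3</strong></a>
  </body>
EOS

doc = REXML::Document.new(html)

result = []
# Visit every <a>, skip it unless f=<digits> appears in its href,
# then collect the text of its <strong> child.
REXML::XPath.each(doc, "//a") do |a|
  next unless a.attributes["href"].to_s =~ /\bf=\d+\b/
  strong = a.elements["strong"]
  result << strong.text if strong
end

puts result
```

The filtering happens in Ruby rather than in the XPath expression, which keeps the query simple and lets the regexp be swapped out independently.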



Simon Strandgaard

12/19/2006 8:17:00 AM


On 12/19/06, Sylvain Tenier <sylvain.tenier@loria.fr> wrote:
[snip]
> #keep only those who have the correct value for attribute as set by the regexp
> result << chaque.at("//strong").inner_html if chaque.get_attribute("href") =~
> /f=13?/
> end
[snip]

For matching f=<integer>, use this regexp instead:

/\bf=\d+\b/


--
Simon Strandgaard
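A quick way to compare the two regexps is to run them against the href values from the question. The sketch below uses only core Ruby (Array#grep and String#match?, Ruby 2.4+). The extra input "f=1abc" is a made-up example, chosen to show where /f=13?/ is too permissive.

```ruby
# The four href values from the question.
hrefs = [
  "forumdisplay.php?f=131",
  "forumdisplay.php?f=132",
  "forumdisplay.php?f=133",
  "forumdisplay.php?f=abcd"
]

# \b before "f" rejects names like "xf"; \d+ needs at least one digit;
# the trailing \b rejects values such as "12abc".
integer_f = /\bf=\d+\b/

matching = hrefs.grep(integer_f)
puts matching

# /f=13?/ only requires "f=1", so it accepts this hypothetical href,
# while the stricter pattern rejects it.
p "forumdisplay.php?f=1abc".match?(/f=13?/)   # => true
p "forumdisplay.php?f=1abc".match?(integer_f) # => false
```

Note that the trailing \b still accepts f=12&x=y, since "&" is not a word character; use \z instead if the parameter must end the string.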

Peter Szinek

12/19/2006 9:17:00 AM


Dhanasekaran Vivekanandhan wrote:
[snip]

Try

# html holds the HTML fragment from the question
result = (Hpricot(html)/"a[@href]").reject { |elem|
  elem.attributes['href'] !~ /=\d+$/ }

HTH,
Peter
__
http://www.rubyra...
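The regexps in this thread look at the href as plain text, so /=\d+$/ would also keep a link such as forumdisplay.php?page=2. A stricter sketch, assuming the hrefs are valid URI references, parses the query string with Ruby's standard uri library and checks the value of f itself. The helper name integer_f_param? is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: true only when the href has a query parameter
# named "f" whose value consists solely of decimal digits.
def integer_f_param?(href)
  query = URI.parse(href).query
  return false if query.nil?
  URI.decode_www_form(query).any? { |name, value| name == "f" && value.match?(/\A\d+\z/) }
rescue URI::InvalidURIError, ArgumentError
  # unparsable href or query string: treat as "no integer f"
  false
end

hrefs = %w[
  forumdisplay.php?f=131
  forumdisplay.php?f=132
  forumdisplay.php?f=133
  forumdisplay.php?f=abcd
]

kept = hrefs.select { |h| integer_f_param?(h) }
puts kept
```

With Hpricot, the same predicate could replace the regexp test inside the reject/select block, e.g. keeping elements for which integer_f_param?(elem.attributes['href']) is true.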