comp.lang.ruby

extract relative links from html page

cs5b

10/6/2006 4:23:00 AM

Does anybody have suggestions on how to extract relative as well as
absolute links from an HTML page? It seems that URI.extract only
matches absolute URLs.
Any pointers or suggestions are appreciated.
Thanks,
Christian

1 Answer

Paul Lutus

10/6/2006 5:44:00 AM


cs5b@yahoo.com wrote:

> Does somebody have any suggestions on how to extract relative as well
> as absolute links from an html page. It seems like the URI.extract only
> matches on absolute urls.

Roll your own, a classic piece of programming advice:

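#!/usr/bin/ruby -w

# Placeholder path; point this at the HTML file to scan.
data = File.read("/path/page.html")

# scan yields an array of capture groups per match, so |(item)| unpacks
# the single capture. Only double-quoted attribute values are matched.
data.scan(/src\s*=\s*"(.*?)"/im) { |(item)|
  puts "src = #{item}"
}

data.scan(/href\s*=\s*"(.*?)"/im) { |(item)|
  puts "href = #{item}"
}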
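
As a possible extension of the same idea (not part of the original reply), the sketch below also accepts single-quoted attribute values and resolves each relative link against a base URL with URI.join from the standard library. The helper name extract_links, the LINK_RE constant, the sample HTML, and the example.com base URL are all made up for illustration. A regex is still only an approximation of HTML parsing, so unquoted or unusually formatted attributes will be missed.

```ruby
require "uri"

# Hypothetical pattern: matches href= or src= with either quote style.
# Group 1 is the opening quote, group 2 is the attribute value.
LINK_RE = /\b(?:href|src)\s*=\s*(["'])(.*?)\1/im

# Hypothetical helper: returns absolute URLs for every href/src value
# found in html, resolved against base_url.
def extract_links(html, base_url)
  html.scan(LINK_RE).map do |_quote, link|
    begin
      URI.join(base_url, link.strip).to_s
    rescue URI::Error
      nil # skip values that cannot be parsed as URIs
    end
  end.compact.uniq
end

# Sample input, invented for this example.
html = <<~HTML
  <a href="/about.html">About</a>
  <a href='news/today.html'>News</a>
  <img src="../img/logo.png">
  <a href="http://example.org/x">Absolute</a>
HTML

puts extract_links(html, "http://example.com/docs/index.html")
```

Absolute links pass through URI.join unchanged, so relative and absolute links end up in one list of full URLs. If the raw attribute values are wanted instead, the URI.join step can simply be dropped.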

--
Paul Lutus
http://www.ara...