comp.lang.ruby

Relative/absolute url parsing

Michel ( Dagnan )

6/10/2007 10:49:00 PM

Hi!

I have to collect URLs from HTML files and transform relative URLs into
absolute ones. For example, I have to handle URLs beginning with '../' and
'./', which is kind of annoying :D (my head hurts because of the
debugging process).

Currently, I test every case using regexps. Maybe you can help me find
something faster?

Thanks.

--
Posted via http://www.ruby-....

5 Answers

Alex Young

6/11/2007 7:47:00 AM

Michel ( Dagnan ) wrote:
> Hi!
>
> I have to collect URLs from HTML files and transform relative URLs into
> absolute ones. For example, I have to handle URLs beginning with '../' and
> './', which is kind of annoying :D (my head hurts because of the
> debugging process).
>
> Currently, I test every case using regexps. Maybe you can help me find
> something faster?
If you can shoe-horn your problem into Mechanize, it's got a private
method WWW::Mechanize#to_absolute_uri, which does precisely this. Don't
know if that's of any use, but it might be worthwhile looking at how
it's implemented at least.

--
Alex

Peter Szinek

6/11/2007 8:13:00 AM

Hi,

> If you can shoe-horn your problem into Mechanize, it's got a private
> method WWW::Mechanize#to_absolute_uri, which does precisely this. Don't
> know if that's of any use, but it might be worthwhile looking at how
> it's implemented at least.

+1 for Alex's solution.

I tried to implement this in scRUBYt! (I did not know about Mechanize's
to_absolute_uri back then) and, well, failed (fortunately I have since
discovered it in Mechanize). My solution worked for 99% of the cases,
but the rest were a total PITA to hunt down. I believe Aaron and the
Mechanize community have already done this, so why reinvent the wheel?
Believe me, if you can shoe-horn it into Mechanize as Alex suggested, do
it - it will save you a lot of time, nerves, headaches, money, whatnot.

Cheers,
Peter
_
http://www.rubyra... :: Ruby and Web2.0 blog
http://s... :: Ruby web scraping framework
http://rubykitch... :: The indexed archive of all things Ruby.




Alex Fenton

6/11/2007 8:57:00 AM

Michel ( Dagnan ) wrote:
> Hi!
>
> I have to collect URLs from HTML files and transform relative URLs into
> absolute ones. For example, I have to handle URLs beginning with '../' and
> './', which is kind of annoying :D (my head hurts because of the
> debugging process).

Not sure, but perhaps the standard library will do what you want?

Assuming that this_page is the URL of the page you're scraping links from and '../qux' is the link URL you want to absolutise:

SCIPIUS:~ alex$ irb
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> this_page = URI.parse('http://www.abc.org/foo/ba...)
=> #<URI::HTTP:0x18c82e URL:http://www.abc.org/foo/b...
irb(main):003:0> this_page.merge('../qux')
=> #<URI::HTTP:0x303d42 URL:http://www.abc.org/f...

alex
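
The irb session above can be turned into a small script for a whole batch of links. This is only a sketch: the page URL, the list of hrefs, and the helper name `absolutize` are made up for illustration and do not come from the thread. It relies on `URI#merge` from the standard library, which resolves '../' and './' itself, so no regexp per case should be needed.

```ruby
require 'uri'

# Hypothetical helper (name invented for this sketch): resolves each href
# against the URL of the page it was found on.
def absolutize(page_url, hrefs)
  base = URI.parse(page_url)
  hrefs.map do |href|
    begin
      # URI#merge accepts a String and handles '../', './',
      # root-relative paths and already-absolute URLs.
      base.merge(href).to_s
    rescue URI::InvalidURIError
      # Hrefs that the URI library cannot parse become nil.
      nil
    end
  end
end

# Hypothetical inputs, for illustration only.
page  = 'http://www.example.org/foo/bar/index.html'
links = ['../qux', './baz.html', 'img/logo.png', '/top.html',
         'http://other.example.com/x']

puts absolutize(page, links)
```

Real-world HTML often contains hrefs that are not valid URIs (unescaped spaces, for instance), which is why the sketch rescues the parse error instead of letting one bad link stop the run.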

Michel ( Dagnan )

6/11/2007 9:08:00 PM

Can't find the WWW::Mechanize#to_absolute_uri method (in the rdoc), but you're
right, the URI lib does exactly what I want.

Ruby is so wonderful :D

Thank you for helping.

Alex Fenton wrote:
> Not sure, but perhaps the standard library will do what you want?

--
Posted via http://www.ruby-....

Alex Young

6/11/2007 9:42:00 PM

Michel ( Dagnan ) wrote:
> Can't find the WWW::Mechanize#to_absolute_uri method (in the rdoc), but you're
> right, the URI lib does exactly what I want.
It's a private method - you'll have to do something like
agent.send(:to_absolute_uri, path) to get it to work.
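
For readers unfamiliar with the send trick, here is a sketch of how it gets past method visibility. The class `FakeAgent` and its private method `to_absolute` are hypothetical stand-ins written for this example; they are not Mechanize's code and say nothing about its actual signature.

```ruby
require 'uri'

# Hypothetical stand-in class, NOT Mechanize's implementation.
class FakeAgent
  def initialize(current_url)
    @current = URI.parse(current_url)
  end

  private

  # Private helper that resolves a path against the current page URL.
  def to_absolute(path)
    @current.merge(path).to_s
  end
end

# Made-up URL, for illustration only.
agent = FakeAgent.new('http://www.example.org/foo/bar/index.html')

# Calling a private method with an explicit receiver raises NoMethodError.
begin
  agent.to_absolute('../qux')
rescue NoMethodError => e
  puts "direct call failed: #{e.class}"
end

# Object#send does not check visibility, so the private method is reachable.
puts agent.send(:to_absolute, '../qux')
```

Private methods are not part of a library's public interface and may change between releases, so this is best treated as a stopgap.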

> Ruby is so wonderful :D
I know :-)

--
Alex