comp.lang.ruby

URI class bug?

Morgan Cheng

6/5/2007 6:31:00 AM

I am using Ruby 1.8.6. I found that URI cannot parse a URI with "_" in the host.

uri = "http://dr_gabriele.podomatic.com/...
2006-08-03T15_09_59-07_00.m4v"
URI.parse(uri)

Is there any way to work around that?
thanks

6 Answers

Jano Svitok

6/5/2007 7:01:00 AM

On 6/5/07, Morgan Cheng <morgan.chengmo@gmail.com> wrote:
> I am using Ruby 1.8.6. I found that URI cannot parse a URI with "_" in the host.
>
> uri = "http://dr_gabriele.podomatic.com/...
> 2006-08-03T15_09_59-07_00.m4v"
> URI.parse(uri)
>
> Is there any way to work around that?
> thanks

It seems underscores are not allowed in the host part of a URI, so it's
not a bug. See RFC 2396 (URI) and RFC 1035 (DNS). If you really want it,
you can open the class and redefine some of the methods and/or
manually edit the URI sources.
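
As one possible workaround that leaves the library untouched, the sketch below tries URI.parse first and, if that raises URI::InvalidURIError, falls back to the generic splitting pattern from RFC 2396, Appendix B. The names URI_SPLIT and lenient_host, and the path example.m4v in the usage line, are made up for illustration. The fallback does no validation at all, so treat it as a starting point rather than a fix.

```ruby
require 'uri'

# Generic URI-splitting pattern from RFC 2396, Appendix B.
# Captures: 1 = scheme, 2 = authority, 3 = path, 4 = query, 5 = fragment.
URI_SPLIT = %r{\A(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?\z}

# Hypothetical helper: returns the host of +str+, even when the
# strict parser rejects the URI.
def lenient_host(str)
  begin
    host = URI.parse(str).host
    return host if host
  rescue URI::InvalidURIError
    # Strict parsing failed; fall through to the manual split below.
  end

  authority = str[URI_SPLIT, 2]
  return nil if authority.nil?

  # Drop an optional "userinfo@" prefix and an optional ":port" suffix.
  authority.sub(/\A[^@]*@/, "").sub(/:\d*\z/, "")
end

# The path here is a placeholder, not the elided one from the question.
puts lenient_host("http://dr_gabriele.podomatic.com/example.m4v")
# dr_gabriele.podomatic.com
```

Newer Ruby releases ship a URI parser based on RFC 3986, so there the first branch may simply succeed and the fallback never runs.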

Morgan Cheng

6/5/2007 7:38:00 AM

On Jun 5, 3:00 pm, "Jano Svitok" <jan.svi...@gmail.com> wrote:
> It seems underscores are not allowed in the host part of a URI, so it's
> not a bug. See RFC 2396 (URI) and RFC 1035 (DNS). If you really want it,
> you can open the class and redefine some of the methods and/or
> manually edit the URI sources.


In RFC 2396, "_" is listed among the "Unreserved Characters":

  Unreserved characters can be escaped without changing the semantics
  of the URI, but this should not be done unless the URI is being used
  in a context that does not allow the unescaped character to appear.

However, URI.escape doesn't escape "_".

require 'uri'  # the library file name is lowercase
original_uri = "http://dr_gabriele.podomatic.com/..."
uri = URI.escape(original_uri)
puts uri == original_uri  # prints true: "_" is left unescaped


Jano Svitok

6/5/2007 9:24:00 AM

On 6/5/07, Morgan Cheng <morgan.chengmo@gmail.com> wrote:
> In RFC 2396, "_" is listed among the "Unreserved Characters":
>
>   Unreserved characters can be escaped without changing the semantics
>   of the URI, but this should not be done unless the URI is being used
>   in a context that does not allow the unescaped character to appear.
>
> However, URI.escape doesn't escape "_".
>
> require 'uri'
> original_uri = "http://dr_gabriele.podomatic.com/..."
> uri = URI.escape(original_uri)
> puts uri == original_uri

I'm no expert on DNS, but this is what I found in Appendix A of RFC 2396:

host        = hostname | IPv4address
hostname    = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit

alphanum = alpha | digit
alpha    = lowalpha | upalpha

lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
           "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
           "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
           "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
           "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
           "8" | "9"

There's no "_" there. YMMV ;-)
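
To make the grammar concrete, here is a rough hand translation of the hostname rules into a Ruby regular expression. The module name Rfc2396Host and the method hostname? are invented for this sketch; it only covers the hostname branch, not IPv4address, and it is not taken from the URI library itself.

```ruby
# Hand-written transcription of the RFC 2396 hostname rules.
module Rfc2396Host
  ALPHA    = "a-zA-Z"                 # alpha
  ALNUM    = "#{ALPHA}\\d"            # alphanum
  # domainlabel: starts and ends with alphanum, "-" allowed inside
  DOMLABEL = "(?:[#{ALNUM}](?:[-#{ALNUM}]*[#{ALNUM}])?)"
  # toplabel: like domainlabel, but must start with a letter
  TOPLABEL = "(?:[#{ALPHA}](?:[-#{ALNUM}]*[#{ALNUM}])?)"
  # hostname = *( domainlabel "." ) toplabel [ "." ]
  HOSTNAME = /\A(?:#{DOMLABEL}\.)*#{TOPLABEL}\.?\z/

  def self.hostname?(host)
    !!(HOSTNAME =~ host)
  end
end

puts Rfc2396Host.hostname?("www.podomatic.com")          # true
puts Rfc2396Host.hostname?("dr_gabriele.podomatic.com")  # false
```

The second call fails because "_" appears in neither alphanum nor the "-" alternative, which matches the reading of the grammar above.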

Morgan Cheng

6/5/2007 9:50:00 AM

On Jun 5, 5:24 pm, "Jano Svitok" <jan.svi...@gmail.com> wrote:
> There's no "_" there. YMMV ;-)

Thanks a lot for your help.

I am just wondering: the internet is a wild world. Weird non-standard
stuff is all around, and this non-standard host name is an example.
Popular browsers handle these URLs well.
Perhaps Ruby should be more robust to survive better in such a wild
world :-)





Alex Young

6/5/2007 11:13:00 AM

Morgan Cheng wrote:
> On Jun 5, 5:24 pm, "Jano Svitok" <jan.svi...@gmail.com> wrote:
<snip>
>> There's no "_" there. YMMV ;-)
>
> Thanks a lot for your help.
>
> I am just wondering: the internet is a wild world. Weird non-standard
> stuff is all around, and this non-standard host name is an example.
> Popular browsers handle these URLs well.
> Perhaps Ruby should be more robust to survive better in such a wild
> world :-)
There was mention a few days ago of bringing the URI class up to a more
recent RFC compliance (3986, I think). Would that help in this instance?
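
For what it's worth, RFC 3986 defines the host as IP-literal / IPv4address / reg-name, and reg-name is built from unreserved characters (which include "_"), percent-encoded octets, and sub-delims. A rough transcription of just the reg-name rule, with the module and method names invented here, suggests the host from this thread would pass on syntax alone:

```ruby
# Sketch of the RFC 3986 reg-name rule only; IP-literal and
# IPv4address are deliberately left out.
module Rfc3986Host
  UNRESERVED  = 'A-Za-z0-9._~\-'     # ALPHA / DIGIT / "-" / "." / "_" / "~"
  SUB_DELIMS  = %q{!$&'()*+,;=}      # sub-delims
  PCT_ENCODED = '%[0-9A-Fa-f]{2}'    # pct-encoded
  # reg-name = *( unreserved / pct-encoded / sub-delims )
  REG_NAME = /\A(?:[#{UNRESERVED}#{SUB_DELIMS}]|#{PCT_ENCODED})*\z/

  def self.reg_name?(host)
    !!(REG_NAME =~ host)
  end
end

puts Rfc3986Host.reg_name?("dr_gabriele.podomatic.com")  # true
puts Rfc3986Host.reg_name?("bad host.example")           # false
```

Whether such a name is acceptable to DNS is a separate question that this rule does not answer.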

--
Alex

Jano Svitok

6/5/2007 11:13:00 AM

On 6/5/07, Morgan Cheng <morgan.chengmo@gmail.com> wrote:
> Thanks a lot for your help.
>
> I am just wondering: the internet is a wild world. Weird non-standard
> stuff is all around, and this non-standard host name is an example.
> Popular browsers handle these URLs well.
> Perhaps Ruby should be more robust to survive better in such a wild
> world :-)

If you really want to use underscores, modify lib/1.8/uri/common.rb:
add the following after the line HOSTNAME=... and comment out (prefix
with #) the original HOSTNAME line.

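# Variants of the character classes and labels with "_" allowed.
ALPHA_ = "a-zA-Z_"
ALNUM_ = "#{ALPHA_}\\d"
DOMLABEL_ = "(?:[#{ALNUM_}](?:[-#{ALNUM_}]*[#{ALNUM_}])?)"
TOPLABEL_ = "(?:[#{ALPHA_}](?:[-#{ALNUM_}]*[#{ALNUM_}])?)"
# Replacement for the original HOSTNAME pattern.
HOSTNAME = "(?:#{DOMLABEL_}\\.)*#{TOPLABEL_}\\.?"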

J.
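
Before editing an installed file, the relaxed pattern can be tried on its own. The sketch below copies the five definitions into a throwaway module (RelaxedHost and accepts? are names made up here) and anchors the pattern to test whole host names. It assumes nothing about how uri/common.rb uses HOSTNAME internally.

```ruby
# Stand-alone check of the relaxed pattern; does not touch the URI library.
module RelaxedHost
  ALPHA_    = "a-zA-Z_"
  ALNUM_    = "#{ALPHA_}\\d"
  DOMLABEL_ = "(?:[#{ALNUM_}](?:[-#{ALNUM_}]*[#{ALNUM_}])?)"
  TOPLABEL_ = "(?:[#{ALPHA_}](?:[-#{ALNUM_}]*[#{ALNUM_}])?)"
  HOSTNAME  = "(?:#{DOMLABEL_}\\.)*#{TOPLABEL_}\\.?"

  # Anchor the pattern so the whole string has to be a host name.
  ANCHORED = /\A#{HOSTNAME}\z/

  def self.accepts?(host)
    !!(ANCHORED =~ host)
  end
end

puts RelaxedHost.accepts?("dr_gabriele.podomatic.com")  # true
puts RelaxedHost.accepts?("_dmarc.example.com")         # true
puts RelaxedHost.accepts?("-bad.example.com")           # false
```

Note the side effect shown by the second call: because "_" joins the alphanumeric class, labels may now also begin or end with an underscore.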