comp.lang.ruby

Parsing query parameters from hyperlink

lrlebron@gmail.com

9/1/2007 5:35:00 PM

I am trying to parse strings like this:
<a href='showmono.asp?cpnum=1022&amp;monotype=comb' target='main'>

I need to get the cpnum value (1022 in this example).

I am using the following function

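def get_drugId(link)
  arrParts = link.html.split('?')       # text after '?' holds the query string
  cpnum = arrParts[1].split('&amp')     # first piece is "cpnum=<value>"
  cpnumparts = cpnum[0].split("=")      # ["cpnum", "<value>"]
  drugId = cpnumparts[1]                # last expression is the return value
end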

but I imagine there is a simpler way to do this. Also, I would like
something more flexible that would return all the query parameters (if
there are more than one) in an array or a hash.

Any ideas?

thanks,

Luis

10 Answers

Robert Klemme

9/1/2007 6:59:00 PM

On 01.09.2007 19:34, lrlebron@gmail.com wrote:
> but I imagine there is a simpler way to do this. Also, I would like
> something more flexible that would return all the query parameters (if
> there are more than one) in an array or a hash.
>
> Any ideas?

The std lib:

require 'uri'

irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
=> #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
irb(main):007:0> u.query
=> "dodo=1&dada=2"
irb(main):008:0> u.query.split('&')
=> ["dodo=1", "dada=2"]
....
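
To go from the split pairs above to a hash, a minimal sketch could look like the following. The helper name query_to_hash is hypothetical, and the sketch assumes each key appears only once and that the values need no percent-decoding.

```ruby
require 'uri'

# Hypothetical helper: turns the query string of a URL into a flat hash.
# Assumption: keys are unique and values are not percent-encoded.
def query_to_hash(url)
  query = URI.parse(url).query
  return {} if query.nil?

  query.split('&').each_with_object({}) do |pair, params|
    key, value = pair.split('=', 2) # split only on the first '='
    params[key] = value
  end
end

p query_to_hash("http://foo/bar?dodo=1&dada=2")
```

Repeated keys would overwrite each other here, so this only fits simple query strings.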

robert

Aaron Patterson

9/1/2007 7:15:00 PM

On Sun, Sep 02, 2007 at 04:00:20AM +0900, Robert Klemme wrote:
> The std lib:
>
> require 'uri'
>
> irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
> => #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
> irb(main):007:0> u.query
> => "dodo=1&dada=2"
> irb(main):008:0> u.query.split('&')
> => ["dodo=1", "dada=2"]
> ...

Query strings are allowed to use semicolons as delimiters, and you
also have to handle multiple values per key. I recommend using the
CGI library together with the URI library:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> require 'cgi'
=> true
irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
=> {"a"=>["b"], "b"=>["c"]}
irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
=> {"b"=>["a", "c"]}
irb(main):006:0>
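
A similar key-to-array result can be sketched with URI.decode_www_form from the same uri library, grouping the values per key by hand. The helper name query_params is made up for illustration. As far as I know, decode_www_form splits only on & by default in recent Ruby versions, so this sketch does not cover the semicolon case shown in the session above.

```ruby
require 'uri'

# Hypothetical helper: maps each query key to an array of all its values.
def query_params(url)
  query = URI.parse(url).query.to_s # nil (no query string) becomes ""
  URI.decode_www_form(query).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

p query_params('http://foo/?b=a&b=c')
```

Keeping an array per key means a repeated parameter such as b above loses none of its values.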

--
Aaron Patterson
http://tenderlovem...

lrlebron@gmail.com

9/1/2007 7:29:00 PM

On Sep 1, 2:15 pm, Aaron Patterson <aa...@tenderlovemaking.com> wrote:
> Query strings are allowed to use semicolons as delimiters, not to
> mention you must handle multiple values per key. I recommend using the
> CGI library with the URI library:

This would work if the string were a proper URL. But it is a
hyperlink.

Aaron Patterson

9/1/2007 7:47:00 PM

On Sun, Sep 02, 2007 at 04:30:05AM +0900, lrlebron@gmail.com wrote:
> This would work if the string were a proper URL. But it is a
> hyperlink.

Use Hpricot to extract the href, then feed it through URI and CGI.
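
If Hpricot is not available, the same two steps (pull out the href, then parse its query string) can be sketched with the standard library alone. This is only an illustration: href_params is a hypothetical name, the regular expression assumes a single <a> tag with a quoted href, and a real HTML parser is more robust. URI.decode_www_form stands in for CGI.parse here and, combined with to_h, keeps only the last value per key.

```ruby
require 'uri'

# Hypothetical helper: returns the query parameters of the first <a href=...>.
def href_params(html)
  # Assumption: href is quoted with ' or " and contains no nested quotes.
  match = html.match(/<a\b[^>]*\bhref\s*=\s*(['"])(.*?)\1/i)
  return {} unless match

  # HTML source usually escapes & as &amp; inside attribute values.
  href = match[2].gsub('&amp;', '&')
  query = URI.parse(href).query.to_s
  URI.decode_www_form(query).to_h
end

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
puts href_params(str)['cpnum'] # prints 555
```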

--
Aaron Patterson
http://tenderlovem...

lrlebron@gmail.com

9/1/2007 7:54:00 PM

On Sep 1, 2:29 pm, "lrleb...@gmail.com" <lrleb...@gmail.com> wrote:
> This would work if the string were a proper URL. But it is a
> hyperlink.

Sorry for the second reply. I took your suggestions and came up with
the following

require 'uri'
require 'cgi'

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

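def get_cpnum(link)
  # Splitting on spaces leaves "href='showmono.asp?...'" as the second
  # element, so the href= prefix and the quotes are still attached
  arrParts = link.split(' ')
  CGI.parse(URI.parse(arrParts[1]).query)['cpnum']
end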

puts get_cpnum(str)

Phillip Gawlowski

9/1/2007 8:51:00 PM

lrlebron@gmail.com wrote:
> This would work if the string where a proper url. But it is a
> hyperlink.

Your point? A hyperlink *is* a URL in the WWW context.

--
Phillip Gawlowski


lrlebron@gmail.com

9/1/2007 11:04:00 PM

On Sep 1, 3:50 pm, Phil <cmdjackr...@googlemail.com> wrote:
> lrleb...@gmail.com wrote:
> > This would work if the string where a proper url. But it is a
> > hyperlink.
>
> Your point? A hyperlink *is* a URL in the WWW context.
>
> --
> Phillip Gawlowski

If you try to parse it, URI throws an error.

lrlebron@gmail.com

9/1/2007 11:23:00 PM

On Sep 1, 2:47 pm, Aaron Patterson <aa...@tenderlovemaking.com> wrote:
> Use Hpricot to extract the href, then feed it through URI and CGI.

Here's what I ended up with

require 'uri'
require 'cgi'
require 'hpricot'

def get_query_value(link, key = '')
  doc = Hpricot(link)
  # Parse the href's query string once, then return all or one entry
  params = CGI.parse(URI.parse(doc.at("a")['href']).query)
  key.empty? ? params : params[key]
end

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

p get_query_value(str)
puts get_query_value(str,'cpnum')
puts get_query_value(str,'monotype')

It allows me to ask for the complete hash or for a particular key.

Thanks,

Luis

Robert Klemme

9/2/2007 11:59:00 AM

On 02.09.2007 01:03, lrlebron@gmail.com wrote:
> If you try to parse URI throws an error.

Does it? This works for me:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
=> #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
irb(main):003:0> u.query
=> "x=2"
irb(main):004:0> u=URI.parse('baz?x=2')
=> #<URI::Generic:0x3ff9f15c URL:baz?x=2>
irb(main):005:0> u.query
=> "x=2"

Cheers

robert

lrlebron@gmail.com

9/2/2007 1:31:00 PM

On Sep 2, 6:59 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> Does it? This works for me:
>
> irb(main):001:0> require 'uri'
> => true
> irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
> => #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
> irb(main):003:0> u.query
> => "x=2"
> irb(main):004:0> u=URI.parse('baz?x=2')
> => #<URI::Generic:0x3ff9f15c URL:baz?x=2>
> irb(main):005:0> u.query
> => "x=2"
>
> Cheers
>
> robert

I meant that if you try to parse the whole string
str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
it throws an error:

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not URI?): <a href='showmono.asp?cpnum=555&monotype=full' target='main'> (URI::InvalidURIError)
        from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
        from uritest.rb:8
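
The error comes from handing the whole tag to URI.parse rather than just the href value. A small sketch of the difference, where uri_parseable? is a hypothetical helper name used only for this illustration:

```ruby
require 'uri'

# Hypothetical helper: true if URI.parse accepts the text, false if it
# raises URI::InvalidURIError.
def uri_parseable?(text)
  URI.parse(text)
  true
rescue URI::InvalidURIError
  false
end

tag  = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
href = "showmono.asp?cpnum=555&monotype=full"

puts uri_parseable?(tag)  # prints false
puts uri_parseable?(href) # prints true
```

Only the href value is a URI reference, so it has to be extracted from the tag before parsing.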