Asp Forum - regex problem - comp.lang.ruby

K. R.

11/27/2007 4:28:00 PM

hi @all

I would like to scan a string of HTML tags. I need to extract all
links (a-tags) from the string, but I only get the last one. What is
wrong? See the code below...

response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
response.scan(/<a.*href="(.*?)"/) do |line|
  puts line
end

thanks for helping!
--
Posted via http://www.ruby-....

4 Answers

flazzarino

11/27/2007 4:41:00 PM

The first Kleene star needs to be non-greedy, so that it stops
at the first href it reaches, not the last:
/<a.*?href="(.*?)"/
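
A quick way to see the difference, as a minimal sketch. It uses the string from the original post joined onto one line (an assumption about how it looked before it was wrapped); the constant RESPONSE and the helper captures are names made up for this illustration.

```ruby
RESPONSE = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'

# Hypothetical helper: returns every first-group capture as a flat array.
def captures(text, pattern)
  text.scan(pattern).flatten
end

# Greedy .* runs to the end of the line, then backtracks to the LAST href.
p captures(RESPONSE, /<a.*href="(.*?)"/)   # => ["hello2.html"]

# Non-greedy .*? stops at the FIRST href after each "<a".
p captures(RESPONSE, /<a.*?href="(.*?)"/)  # => ["hello1.html", "hello2.html"]
```

With the greedy version the single match swallows both tags, which is why scan only yields one result.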

Christian von Kleist

11/27/2007 5:01:00 PM

Franco is right. You could fix it by using "a.*?href". However, I
would change "a.*href" to "a\s+href", since what you expect between
the "a" and the "href" is one or more whitespace characters.

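response = '<a href="hello1.html">test1</a> - <a href="hello2.html">test2</a>'
# No regex flag needed: \s already matches newlines. (Ruby's /s flag selects
# an encoding; it is not Perl's "dot matches newline", which Ruby spells /m.)
response.scan(/<a\s+href="(.*?)"/) do |line|
  puts line
end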
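
As a hedged illustration of why \s+ is convenient here: \s also matches a line break, so a tag that is wrapped between "<a" and "href" is still found, whereas "." needs the /m flag to cross a newline. The constant WRAPPED and the helper hrefs_in are made-up names for this sketch.

```ruby
# The second tag is wrapped between "<a" and "href".
WRAPPED = "<a href=\"hello1.html\">test1</a> - <a\nhref=\"hello2.html\">test2</a>"

# Hypothetical helper: flat list of the captured href values.
def hrefs_in(html, pattern)
  html.scan(pattern).flatten
end

# "." does not match a newline by default, so the wrapped tag is missed.
p hrefs_in(WRAPPED, /<a.*?href="(.*?)"/)   # => ["hello1.html"]

# With /m, "." also matches newlines.
p hrefs_in(WRAPPED, /<a.*?href="(.*?)"/m)  # => ["hello1.html", "hello2.html"]

# \s+ matches spaces, tabs and newlines without any flag.
p hrefs_in(WRAPPED, /<a\s+href="(.*?)"/)   # => ["hello1.html", "hello2.html"]
```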

flazzarino

12/1/2007 4:33:00 PM

On Nov 27, 12:00 pm, Christian von Kleist <cvonkle...@gmail.com> wrote:
> On Nov 27, 2007 11:28 AM, K. R. <m...@palstek.ch> wrote:
> > response.scan(/<a.*href="(.*?)"/) do |line|
> > puts line
> > end

But what if href is not the first attribute of the <a> tag?

K. R.

12/2/2007 1:06:00 PM

>> response.scan(/<a.*href="(.*?)"/) do |line|
> but what if href is not the first attribute of <a/>?

The order of the attributes doesn't matter, because .* allows any
sequence of characters between the <a tag and href.
--
Posted via http://www.ruby-....
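
To make the attribute-order question concrete, here is a small sketch with made-up sample HTML. The [^>]* variant is not from the thread; it is one common way to allow other attributes before href while keeping the match inside a single tag. All constant and method names are hypothetical, and for real-world HTML a proper HTML parser is usually more robust than any regex.

```ruby
# Made-up sample: the first <a> is a named anchor without an href, and
# href is not the first attribute of the second link.
SAMPLE = '<a name="top">Top</a> <a class="nav" href="hello1.html">test1</a>'

# Only whitespace allowed between "<a" and "href": misses the second link.
STRICT = /<a\s+href="(.*?)"/

# Any characters allowed before href, as described in the reply above.
LAZY_ANY = /<a.*?href="(.*?)"/

# Any characters except ">" allowed, so the match cannot leave the tag.
WITHIN_TAG = /<a\s[^>]*?href="(.*?)"/

def link_targets(html, pattern)
  html.scan(pattern).flatten
end

p link_targets(SAMPLE, STRICT)      # => []
p link_targets(SAMPLE, LAZY_ANY)    # => ["hello1.html"]
p link_targets(SAMPLE, WITHIN_TAG)  # => ["hello1.html"]

# The whole text matched by LAZY_ANY starts at the first "<a" and spans
# both tags, while WITHIN_TAG matches only inside the second tag.
p SAMPLE[LAZY_ANY]
p SAMPLE[WITHIN_TAG]
```

Both permissive patterns find the link here; the difference is only in how much text each match covers, which matters once the captured value is tied to a specific tag.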
