comp.lang.ruby

Regular expression question

John Doe

12/21/2005 6:03:00 PM

Why does the following code:

line = " rows = 10 cols = 1 occupied cells = 0"
line =~ /.*(\d+).*(\d+).*(\d+)/
print(" scanned rows = ",$1," cols = ",$2," occ = ",$3,"\n")

print this when it runs:

scanned rows = 0 cols = 1 occ = 0

(notice rows is zero!)

What have I done wrong?


4 Answers

Ross Bamford

12/21/2005 6:36:00 PM

On Wed, 21 Dec 2005 18:03:13 -0000, DeZo <nobody@nowhere.com> wrote:

> Why does the following code:
>
> line = " rows = 10 cols = 1 occupied cells = 0"
> line =~ /.*(\d+).*(\d+).*(\d+)/
> print(" scanned rows = ",$1," cols = ",$2," occ = ",$3,"\n")
>
> print this when it runs:
>
> scanned rows = 0 cols = 1 occ = 0
>
> (notice rows is zero!)
>
> What have I done wrong?

Your problem is that '*' is greedy: the first '.*' matches as many
characters as it can, including the '1' of '10', and leaves only the '0'
for the first group. Try

/.*?(\d+).*?(\d+).*?(\d+)/

Usually I'd tend to use something like:

/[^\d]*(\d+)[^\d]*(\d+)[^\d]*(\d+)/

instead, to make it explicit that I want non-digits, followed by digits,
and so on.
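
To see the difference side by side, here is a minimal sketch that runs the original pattern and the two suggested ones against the same line. The variable names (greedy, lazy, negated) are only for illustration and do not come from the original post.

```ruby
line = " rows = 10 cols = 1 occupied cells = 0"

# Greedy: the first .* runs as far right as it can while still leaving
# three digits for the groups, so it swallows the "1" of "10".
greedy = line.match(/.*(\d+).*(\d+).*(\d+)/).captures

# Non-greedy: each .*? stops as soon as the rest of the pattern can match.
lazy = line.match(/.*?(\d+).*?(\d+).*?(\d+)/).captures

# Negated class: [^\d]* cannot cross a digit at all.
negated = line.match(/[^\d]*(\d+)[^\d]*(\d+)[^\d]*(\d+)/).captures

p greedy   # => ["0", "1", "0"]
p lazy     # => ["10", "1", "0"]
p negated  # => ["10", "1", "0"]
```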

Hope that helps,
Ross

--
Ross Bamford - rosco@roscopeco.remove.co.uk

Robert Klemme

12/22/2005 9:42:00 AM

Ross Bamford wrote:
> Your problem is that '*' is greedy so it'll match as many 'any
> characters' as it can. Try
>
> /.*?(\d+).*?(\d+).*?(\d+)/
>
> Usually I'd tend to use something like:
>
> /[^\d]*(\d+)[^\d]*(\d+)[^\d]*(\d+)/
>
> instead, to make it explicit I want not digits, followed by digits,
> etc...

Some other solutions with individual pros and cons:

>> line = " rows = 10 cols = 1 occupied cells = 0"
=> " rows = 10 cols = 1 occupied cells = 0"
>> line.scan(/\d+/)
=> ["10", "1", "0"]
>> line.scan(/\d+/).map {|s| s.to_i}
=> [10, 1, 0]
>> line.scan(/\w+\s*=\s*(\d+)/)
=> [["10"], ["1"], ["0"]]
>> line.scan(/\w+\s*=\s*(\d+)/).map {|m| m[0].to_i}
=> [10, 1, 0]

And explicitly matching the pattern:

>> /rows\s*=\s*(\d+)\s*cols\s*=\s*(\d+)\s*occupied cells\s*=\s*(\d+)/ =~ line and [$1, $2, $3]
=> ["10", "1", "0"]
>> /rows\s*=\s*(\d+)\s*cols\s*=\s*(\d+)\s*occupied cells\s*=\s*(\d+)/ =~ line and [$1.to_i, $2.to_i, $3.to_i]
=> [10, 1, 0]
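
One way to see the trade-offs between these variants is to feed them a line that deviates slightly from the expected format. The sketch below is only an illustration: the `odd` input (a label containing a digit) is made up and is not from the original question.

```ruby
line = " rows = 10 cols = 1 occupied cells = 0"
# Hypothetical malformed input: the "cols" label has a digit in it.
odd = " rows = 10 cols2 = 1 occupied cells = 0"

explicit = /rows\s*=\s*(\d+)\s*cols\s*=\s*(\d+)\s*occupied cells\s*=\s*(\d+)/

# Plain \d+ picks up every run of digits, including the one in the label.
any_digits = odd.scan(/\d+/)

# Requiring "label = number" skips the digit inside the label.
after_equals = odd.scan(/\w+\s*=\s*(\d+)/).flatten

# The explicit pattern refuses the odd line entirely (match returns nil)
# but still accepts the well-formed one.
strict_odd = odd.match(explicit)
strict_ok  = line.match(explicit).captures

p any_digits    # => ["10", "2", "1", "0"]
p after_equals  # => ["10", "1", "0"]
p strict_odd    # => nil
p strict_ok     # => ["10", "1", "0"]
```

Which behaviour is preferable depends on whether unexpected input should be tolerated or rejected.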

Kind regards

robert

Ross Bamford

12/22/2005 1:38:00 PM

On Thu, 22 Dec 2005 09:42:00 -0000, Robert Klemme <bob.news@gmx.net> wrote:

> Some other solutions with individual pros and cons:
>
>>> line = " rows = 10 cols = 1 occupied cells = 0"
> => " rows = 10 cols = 1 occupied cells = 0"
>>> line.scan(/\d+/)
> => ["10", "1", "0"]
>>> line.scan(/\d+/).map {|s| s.to_i}
> => [10, 1, 0]
>>> line.scan(/\w+\s*=\s*(\d+)/)
> => [["10"], ["1"], ["0"]]
>>> line.scan(/\w+\s*=\s*(\d+)/).map {|m| m[0].to_i}
> => [10, 1, 0]
>

Ahh, much better. Another KISS reminder gets its own page (again) in my
notebook.

Thanks :)

--
Ross Bamford - rosco@roscopeco.remove.co.uk

Jim

12/22/2005 2:08:00 PM

I've been using this idiom recently.

>> line = " rows = 10 cols = 1 occupied cells = 0"
=> " rows = 10 cols = 1 occupied cells = 0"
>> if line[/.*?(\d+).*?(\d+).*?(\d+)/]
>> rows, cols, cells = $1.to_i, $2.to_i, $3.to_i
>> end
=> [10, 1, 0]
>> rows
=> 10
>> cols
=> 1
>> cells
=> 0
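
For comparison, a similar result can be had without the $1..$3 globals by keeping the MatchData object around. This is just a sketch of an alternative, not a claim that it is better than the idiom above; the local name `m` is arbitrary.

```ruby
line = " rows = 10 cols = 1 occupied cells = 0"

# String#match returns a MatchData object, or nil when nothing matches,
# so it can be used directly as the condition.
if (m = line.match(/.*?(\d+).*?(\d+).*?(\d+)/))
  # captures holds the three groups as strings; convert each to Integer.
  rows, cols, cells = m.captures.map(&:to_i)
end

p [rows, cols, cells]  # => [10, 1, 0]
```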