comp.lang.ruby

Possible regular expression

James Sanders

5/6/2008 1:42:00 AM

Ruby's regular expression engine appears to act incorrectly when given a
non-greedy match-range of the form {m,n}?

Take this example:

"Age: 21" =~ /Age.{0,60}: ([\w]+)/

This returns 0, as expected and $1 is set to "21"

However:

"Age: 21" =~ /Age.{0,60}?: ([\w]+)/

This returns nil and $1 is set to nil.

I believe the greedy and non-greedy cases should be equivalent in this
case, but are not.

I've included a tarball with two files, one written in perl and the
other in ruby, performing this match. The perl script acts as expected.

Apologies if this is a known bug that I have been unable to find on
RubyForge, or if this is expected behavior. If it is a known bug, I would
appreciate it if someone could point me at the bug listing. If it is
expected behavior, I would appreciate an explanation of the reason for it.
Otherwise, I would be grateful if someone could verify that this behavior
is a bug, and I will file it.

Thanks

Attachments:
http://www.ruby-...attachment/1...

--
Posted via http://www.ruby-....

4 Answers

Phrogz

5/6/2008 2:55:00 AM

On May 5, 7:42 pm, James Sanders <james.sand...@colorado.edu> wrote:
> Ruby's regular expression engine appears to act incorrectly when given a
> non-greedy match-range of the form {m,n}?
>
> Take this example:
>
> "Age: 21" =~ /Age.{0,60}: ([\w]+)/
>
> This returns 0, as expected and $1 is set to "21"
>
> However:
>
> "Age: 21" =~ /Age.{0,60}?: ([\w]+)/
>
> This returns nil and $1 is set to nil.

This seems like a bug, given:

  s = "Age: 21"
  s =~ /Age.*: (\w+)/        #=> 0
  s =~ /Age.*?: (\w+)/       #=> 0
  s =~ /Age.{0,60}: (\w+)/   #=> 0
  s =~ /Age.{0,60}?: (\w+)/  #=> nil

(Perhaps you were paring down a real-world testcase; did you know
that you can simply use \w+ instead of [\w]+ to match one or more word
characters? And that \d may be more appropriate, matching only digit
characters?)

My simple experiments make me believe this is an edge case that occurs
specifically when all of the following hold:
a) the quantifier is a non-greedy range
b) it is applied to any-char (.)
c) it has a lower limit of 0
d) it must match 0 times for the overall match to succeed.

Here's my test data, with analysis following.

s = "abbc"
%w|
  ab{1,9}c    ab{1,9}?c
  abb{1,9}c   abb{1,9}?c
  abbb{1,9}c  abbb{1,9}?c
  ab{0,9}c    ab{0,9}?c
  abb{0,9}c   abb{0,9}?c
  abbb{0,9}c  abbb{0,9}?c
  a.{1,9}c    a.{1,9}?c
  ab.{1,9}c   ab.{1,9}?c
  abb.{1,9}c  abb.{1,9}?c
  a.{0,9}c    a.{0,9}?c
  ab.{0,9}c   ab.{0,9}?c
  abb.{0,9}c  abb.{0,9}?c
|.each_with_index{ |pattern,i|
  regex = Regexp.new( pattern )
  puts "%2i %-15s %s" % [
    i, regex.inspect, (s =~ regex).inspect
  ]
}

#=>  0 /ab{1,9}c/      0
#=>  1 /ab{1,9}?c/     0
#=>  2 /abb{1,9}c/     0
#=>  3 /abb{1,9}?c/    0
#=>  4 /abbb{1,9}c/    nil
#=>  5 /abbb{1,9}?c/   nil
#=>  6 /ab{0,9}c/      0
#=>  7 /ab{0,9}?c/     0
#=>  8 /abb{0,9}c/     0
#=>  9 /abb{0,9}?c/    0
#=> 10 /abbb{0,9}c/    0
#=> 11 /abbb{0,9}?c/   0
#=> 12 /a.{1,9}c/      0
#=> 13 /a.{1,9}?c/     0
#=> 14 /ab.{1,9}c/     0
#=> 15 /ab.{1,9}?c/    0
#=> 16 /abb.{1,9}c/    nil
#=> 17 /abb.{1,9}?c/   nil
#=> 18 /a.{0,9}c/      0
#=> 19 /a.{0,9}?c/     0
#=> 20 /ab.{0,9}c/     0
#=> 21 /ab.{0,9}?c/    0
#=> 22 /abb.{0,9}c/    0
#=> 23 /abb.{0,9}?c/   nil

In the above, we would expect patterns 4, 5, 16 and 17 to fail, but
not 23.

Notably, pattern #15 succeeds (showing that a non-greedy range
matching any-char can match exactly its lower-limit number of times),
and pattern #11 succeeds (showing that a non-greedy range matching a
specific char can match zero times).
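
A minimal sketch of how the comparison could be automated, assuming the only thing of interest is whether the greedy and non-greedy form of each pattern agree on the match start. The helper name `disagreements` and the choice of pairs are illustrative, not part of the original test. On an unaffected engine the list of disagreeing pairs should be empty; on 1.8.6, going by the results above, the first pair would be expected to show up.

```ruby
# Pattern pairs taken from the test data above: [greedy, non-greedy].
pairs = [
  ["abb.{0,9}c", "abb.{0,9}?c"],   # patterns 22 and 23
  ["ab.{1,9}c",  "ab.{1,9}?c"],    # patterns 14 and 15
  ["abbb{0,9}c", "abbb{0,9}?c"],   # patterns 10 and 11
]

# Hypothetical helper: returns the pairs whose two forms disagree
# on the match start (or on whether they match at all).
def disagreements(string, pairs)
  pairs.reject do |greedy, lazy|
    (string =~ Regexp.new(greedy)) == (string =~ Regexp.new(lazy))
  end
end

bad = disagreements("abbc", pairs)
puts bad.empty? ? "all pairs agree" : "disagreeing pairs: #{bad.inspect}"
```

Comparing only the match start is deliberate: for these patterns the greedy and non-greedy forms may consume different amounts of text, but they should still succeed or fail together.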

Phrogz

5/6/2008 3:00:00 AM

On May 5, 7:42 pm, James Sanders <james.sand...@colorado.edu> wrote:
> Ruby's regular expression engine appears to act incorrectly when given a
> non-greedy match-range of the form {m,n}?

I forgot to note, in my previous reply, that my test results are
against 1.8.6:
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-darwin9.1.0]

Ruby v1.9 (using a different regexp engine, "Oniguruma") does not
suffer from the same problem.
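
For anyone who wants to check their own interpreter, here is a small sketch that prints the engine and version next to the result of the original failing pattern. The variable names are illustrative, and the `defined?` guard is only there on the assumption that very old interpreters may not define RUBY_ENGINE.

```ruby
# Report which interpreter is running, then try the pattern from the
# original post. An unaffected engine should report a match at 0.
engine  = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
pattern = /Age.{0,60}?: ([\w]+)/
m       = pattern.match("Age: 21")

start   = m ? m.begin(0) : nil   # nil here would indicate the bug
capture = m ? m[1] : nil

puts "#{engine} #{RUBY_VERSION}: match start = #{start.inspect}, capture = #{capture.inspect}"
```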

Chris Shea

5/6/2008 3:17:00 AM

On May 5, 8:59 pm, Phrogz <phr...@mac.com> wrote:
> On May 5, 7:42 pm, James Sanders <james.sand...@colorado.edu> wrote:
>
> > Ruby's regular expression engine appears to act incorrectly when given a
> > non-greedy match-range of the form {m,n}?
>
> I forgot to note, in my previous reply, that my test results are
> against 1.8.6:
> ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-darwin9.1.0]
>
> Ruby v1.9 (using a different regexp engine, "Oniguruma") does not
> suffer from the same problem.

Rubinius and JRuby don't seem to suffer from it either.

Chris

James Sanders

5/6/2008 4:34:00 AM

Thank you Gavin and Chris for your verification. Gavin, you are right
that it is pared down from a real problem where a character class and
alphanumerics were necessary. Thank you for your much better examples.
I'll file a bug report against 1.8.6.

-James
