comp.lang.ruby

Why is this regex invalid?

Daniel Finnie

12/6/2006 10:38:00 PM

irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
SyntaxError: compile error
(irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
from (irb):2

irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
=> /([0-9]*)([^\+ -]+)(.*)/

The only thing that is different in the 2 regexps is the placement of
the space in the square brackets, yet the first regexp is invalid and
the 2nd valid.

Why is this?

Thanks,
Dan

7 Answers

MenTaLguY

12/6/2006 10:44:00 PM

On Thu, 7 Dec 2006 07:38:10 +0900, Daniel Finnie <danfinnie@optonline.net> wrote:
> irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
> SyntaxError: compile error
> (irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
> from (irb):2
>
> irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
> => /([0-9]*)([^\+ -]+)(.*)/
>
> The only thing that is different in the 2 regexps is the placement of
> the space in the square brackets, yet the first regexp is invalid and
> the 2nd valid.
>
> Why is this?

When an unescaped - appears in any position but the first or final one inside brackets, it is interpreted as a range separator rather than a literal '-'. Apparently '\+- ' isn't a valid range.

-mental


Alex LeDonne

12/6/2006 10:44:00 PM

On 12/6/06, Daniel Finnie <danfinnie@optonline.net> wrote:
> irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
> SyntaxError: compile error
> (irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
> from (irb):2
>
> irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
> => /([0-9]*)([^\+ -]+)(.*)/
>
> The only thing that is different in the 2 regexps is the placement of
> the space in the square brackets, yet the first regexp is invalid and
> the 2nd valid.
>
> Why is this?
>
> Thanks,
> Dan
>
It's the placement of the - that makes a difference. In a character
class, - between two characters denotes a range. So the first
character class includes a range, from + to <space>, which is invalid
because + comes after space in the relevant character encoding.

If you want a literal hyphen in a character class, it's safest to make
it the last character.
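
The ordering argument can be checked directly. This is a minimal sketch, assuming an ASCII-compatible encoding; `compiles?` is a hypothetical helper, and the patterns are built from strings with `Regexp.new` so that a bad pattern raises `RegexpError` at run time instead of failing when the source is parsed:

```ruby
# Code points of the two range endpoints (ASCII-compatible encoding assumed).
plus  = '+'.ord
space = ' '.ord

# Hypothetical helper: true if the pattern source compiles, false otherwise.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Single-quoted strings keep the backslash, so the regex engine sees \+ .
bad  = '([0-9]*)([^\+- ]+)(.*)'  # '-' between '+' and ' ' is read as a range
good = '([0-9]*)([^\+ -]+)(.*)'  # '-' is last, so it is a literal hyphen

bad_ok  = compiles?(bad)
good_ok = compiles?(good)

puts "'+' is #{plus}, ' ' is #{space}"
puts "first pattern compiles:  #{bad_ok}"
puts "second pattern compiles: #{good_ok}"
```

Because 43 is greater than 32, the range runs backwards, and the first pattern is rejected while the second compiles.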

-A

Christopher Schneider

12/6/2006 10:44:00 PM

I'm pretty sure the dash needs to be escaped in regular expressions.
The second one works since it is the last character in the character
class, and hence isn't defining a range.

/([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.


-Chris Schneider

On Dec 6, 2006, at 3:38 PM, Daniel Finnie wrote:

> irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
> SyntaxError: compile error
> (irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
> from (irb):2
>
> irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
> => /([0-9]*)([^\+ -]+)(.*)/
>
> The only thing that is different in the 2 regexps is the placement
> of the space in the square brackets, yet the first regexp is
> invalid and the 2nd valid.
>
> Why is this?
>
> Thanks,
> Dan


Wilson Bilkovich

12/6/2006 10:44:00 PM

On 12/6/06, Daniel Finnie <danfinnie@optonline.net> wrote:
> irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
> SyntaxError: compile error
> (irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
> from (irb):2
>
> irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
> => /([0-9]*)([^\+ -]+)(.*)/
>
> The only thing that is different in the 2 regexps is the placement of
> the space in the square brackets, yet the first regexp is invalid and
> the 2nd valid.
>

The hyphen in the middle of the character class is ambiguous, because it
could be read as either a range operator or a literal.
One fix is to reorder the class so that the hyphen comes first:
/([0-9]*)([^-+ ]+)(.*)/
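
The fixes suggested in this thread (hyphen first, hyphen last, hyphen escaped) should all describe the same character class. A small sketch to compare them; the sample string is a hypothetical input, not something from the original post:

```ruby
# Three spellings of "anything except '+', '-', or space".
hyphen_first   = /([0-9]*)([^-+ ]+)(.*)/
hyphen_last    = /([0-9]*)([^+ -]+)(.*)/
hyphen_escaped = /([0-9]*)([^+\- ]+)(.*)/

sample = '12abc+3'  # hypothetical input, chosen only for illustration

# Collect the three capture groups produced by each variant.
results = [hyphen_first, hyphen_last, hyphen_escaped].map do |re|
  re.match(sample).captures
end

results.each { |caps| p caps }
```

Each variant splits the sample into "12", "abc", and "+3", so the choice between them is mostly a matter of readability.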

Junnichi Ohno

12/6/2006 10:46:00 PM

Daniel Finnie wrote:
> irb(main):002:0> regex = /([0-9]*)([^\+- ]+)(.*)/
> SyntaxError: compile error
> (irb):2: invalid regular expression: /([0-9]*)([^\+- ]+)(.*)/
> from (irb):2
>
> irb(main):003:0> regex = /([0-9]*)([^\+ -]+)(.*)/
> => /([0-9]*)([^\+ -]+)(.*)/
>
> The only thing that is different in the 2 regexps is the placement of
> the space in the square brackets, yet the first regexp is invalid and
> the 2nd valid.
>
> Why is this?
>
> Thanks,
> Dan
>
>

Hi,

'-' should be escaped like this:
regex = /([0-9]*)([^\+\- ]+)(.*)/

Jun

David Kastrup

12/7/2006 12:25:00 AM

Christopher Schneider <cschneid@colostate.edu> writes:

> I'm pretty sure the dash needs to be escaped in regular expressions.
> The second one works since it is the last character in the character
> class, and hence isn't defining a range.
>
> /([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.

Rubbish. In character ranges, \ is not special.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

Ken Bloom

12/7/2006 4:58:00 AM

On Thu, 07 Dec 2006 01:24:30 +0100, David Kastrup wrote:

> Christopher Schneider <cschneid@colostate.edu> writes:
>
>> I'm pretty sure the dash needs to be escaped in regular expressions.
>> The second one works since it is the last character in the character
>> class, and hence isn't defining a range.
>>
>> /([0-9]*)([^\+\- ]+)(.*)/ should work - note the escaped dash.
>
> Rubbish. In character ranges, \ is not special.

It most certainly is special:
irb(main):014:0> /[a\+\- ]/=~".\\"
=> nil
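
To spell out what that result shows: if the backslash were an ordinary member of the class, the pattern would match the backslash in the test string. A short sketch along the same lines, with variable names and the 'x-y' string added here only for illustration:

```ruby
subject = ".\\"  # two characters: a dot followed by one backslash

escaped_class   = /[a\+\- ]/  # members: 'a', '+', '-', and space
backslash_class = /[\\]/      # member: a single backslash

no_match  = escaped_class =~ subject    # neither '.' nor '\' is a member
hyphen_at = escaped_class =~ 'x-y'      # the escaped '-' matches literally
slash_at  = backslash_class =~ subject  # a doubled backslash matches one

p no_match, hyphen_at, slash_at
```

The first match fails, while the other two succeed at index 1, which is consistent with the backslash acting as an escape character inside brackets in Ruby.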

--
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu...