Asp Forum - Re: Regular expression mismatch ?

Warren Brown

4/6/2005 2:58:00 PM

Han,

> Why does the following:
>
> s = "aaa aaa\n\n\nbbb bbb"
> puts(s =~ /^\s+$/)
>
> produce: 8 (instead of nil) ?

Because /^/ matches the beginning of a line (not the beginning of
the string), and /\s/ matches whitespace, which includes newlines (\n).
So the first place in the string where the beginning of a line is
followed by one or more whitespaces is at position 8.

> (If I put in only 2 newlines, it's fine).

With only two newlines, the /$/ prevents the match, since there are
"b"s following the newline.
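
A minimal sketch of the two cases discussed above, for anyone who wants to try them in irb. The variable names are illustrative only; the `\A`/`\z` line is an addition for comparison and was not part of the original question.

```ruby
three = "aaa aaa\n\n\nbbb bbb"
two   = "aaa aaa\n\nbbb bbb"

# ^ and $ are line anchors in Ruby. Index 8 is a line start; \s+ first grabs
# both remaining newlines, then backtracks to one so that $ can match just
# before the newline at index 9.
three_pos = three =~ /^\s+$/

# With two newlines, the only whitespace at a line start (index 8) is
# followed directly by "b", so $ cannot match anywhere.
two_pos = two =~ /^\s+$/

# \A and \z anchor to the whole string, so this only matches a string that
# consists entirely of whitespace.
whole_pos = three =~ /\A\s+\z/

p three_pos   # 8
p two_pos     # nil
p whole_pos   # nil
```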

I hope this helps.

- Warren Brown

12 Answers

Han Holl

4/7/2005 6:47:00 AM

On Apr 6, 2005 4:58 PM, Warren Brown <warrenb@timevision.com> wrote:
> Han,
[ cut ]
>
> Because /^/ matches the beginning of a line (not the beginning of
> the string), and /\s/ matches whitespace, which includes newlines (\n).
> So the first place in the string where the beginning of a line is
> followed by one or more whitespaces is at position 8.
>
Thanks to all for the reactions.

It's not that simple: ^ _also_ matches the beginning of the string.
Perl does _not_ produce a match unless you suffix the regular
expression with m.

Cheers,

Han

Brian Candler

4/7/2005 8:05:00 AM

On Thu, Apr 07, 2005 at 03:47:15PM +0900, Han Holl wrote:
> On Apr 6, 2005 4:58 PM, Warren Brown <warrenb@timevision.com> wrote:
> > Han,
> [ cut ]
> >
> > Because /^/ matches the beginning of a line (not the beginning of
> > the string), and /\s/ matches whitespace, which includes newlines (\n).
> > So the first place in the string where the beginning of a line is
> > followed by one or more whitespaces is at position 8.
> >
> Thanks for the reactions to all.
>
> It's not that simple: ^ _also_ matches the beginning of the string.
> Perl does _not_
> produce a match, unless you suffix the regular expression with m.

And so it's worth pointing out that in Ruby you should write:

str.untaint if str =~ /\A[a-z0-9]*\z/ # good

and not:

str.untaint if str =~ /^[a-z0-9]*$/ # HIGHLY DANGEROUS

It means that these sorts of regexp are a bit less readable than Perl's.
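
To make the danger concrete, here is a small sketch, assuming a hypothetical hostile input with an embedded newline. It does not call `untaint`; it only reports whether each pattern accepts the string, and the names are made up for illustration.

```ruby
# Hypothetical input: a harmless first line followed by something nasty.
input = "abc123\n<script>"

# ^...$ only requires that ONE line of the string fits the pattern,
# so the first line "abc123" is enough for a match.
line_anchored = !!(input =~ /^[a-z0-9]*$/)

# \A...\z requires the ENTIRE string to fit, so the embedded newline
# and everything after it cause the match to fail.
string_anchored = !!(input =~ /\A[a-z0-9]*\z/)

p line_anchored    # true  (input wrongly accepted)
p string_anchored  # false (input rejected)
```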

Regards,

Brian.

Han Holl

4/7/2005 9:30:00 AM

On Apr 7, 2005 10:05 AM, Brian Candler <B.Candler@pobox.com> wrote:
> And so it's worth pointing out that in Ruby you should write:
>
> str.untaint if str =~ /\A[a-z0-9]*\z/ # good
>
> and not:
>
> str.untaint if str =~ /^[a-z0-9]*$/ # HIGHLY DANGEROUS
>
> It means that these sorts of regexp are a bit less readable than Perl's.
>
> Regards,
>
> Brian.
>
>
Which leaves the question: what is the meaning of the m suffix in Ruby?
It would seem that multi-line is on by default, with no means to switch it off.
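
There is no flag for it, but the string anchors give the single-line behaviour on a per-pattern basis. A rough sketch, with an example string chosen only for illustration:

```ruby
s = "first\nsecond\n"

# ^ matches at the start of every line.
caret_hits = s.scan(/^\w+/)

# \A matches only at the very start of the string.
start_hits = s.scan(/\A\w+/)

# \Z matches at the end of the string, or just before a final newline.
big_z = !!(s =~ /second\Z/)

# \z matches only at the absolute end of the string.
small_z = !!(s =~ /second\z/)

p caret_hits  # ["first", "second"]
p start_hits  # ["first"]
p big_z       # true
p small_z     # false
```

So `\A` and `\Z` behave roughly like Perl's `^` and `$` without the m suffix, while `\z` is the strict form.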

Ruby should not be different from the other RE engines with no good reason.

Cheers,

Han Holl

Neil Stevens

4/7/2005 9:39:00 AM

On Thu, 07 Apr 2005 19:29:42 +0900, Han Holl wrote:
> Ruby should not be different from the other RE engines with no good reason.

Well, too late now, since not breaking existing scripts is a good reason to
keep the present behavior.

--
Neil Stevens - neil@hakubi.us

'A republic, if you can keep it.' -- Benjamin Franklin

dblack

4/7/2005 10:35:00 AM

Han Holl

4/7/2005 12:59:00 PM

On Apr 7, 2005 12:34 PM, David A. Black wrote:
> The /m suffix means that \n is included in . (dot).
>
Yes, looked it up in the Pickaxe, and indeed that's what it says.

This is from man perlre:
m Treat string as multiple lines. That is, change "^" and "$"
from matching the start or end of the string to matching the
start or end of any line anywhere within the string.

This should go on the page I've seen somewhere with gotchas.
Perl RE is quite widespread, and when Ruby deviates from it, it's
easy to trip up.
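
A short sketch of the difference as it shows up in Ruby, using a made-up two-line string: the m suffix changes what the dot matches, and leaves ^ and $ alone.

```ruby
s = "one\ntwo"

# Without /m the dot does not match a newline, so this finds nothing.
without_m = s[/one.two/]

# With /m the dot also matches "\n".
with_m = s[/one.two/m]

# ^ matches at the start of the second line either way.
caret_plain = s =~ /^two/
caret_m     = s =~ /^two/m

p without_m    # nil
p with_m       # "one\ntwo"
p caret_plain  # 4
p caret_m      # 4
```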

Cheers,
Han Holl

dblack

4/7/2005 1:09:00 PM

Han Holl

4/7/2005 1:38:00 PM

On Apr 7, 2005 3:09 PM, David A. Black <dblack@wobblini.net> wrote:

>
> Not if you use Ruby more and more :-)
>
This problem occurred while porting a nasty old Perl program
to shiny new Ruby. I had been relying on Ruby's regexps being
Perl-compatible.

Han

dblack

4/7/2005 1:46:00 PM

Neil Stevens

4/7/2005 3:33:00 PM

On Thu, 07 Apr 2005 23:38:17 +0900, Han Holl wrote:

> On Apr 7, 2005 3:09 PM, David A. Black <dblack@wobblini.net> wrote:
>
>>
>> Not if you use Ruby more and more :-)
>>
> This problem occurred while porting nasty old Perl program
> to shiny new Ruby. I used to rely on ruby's re to be Perl
> compatible.

And I'm sure people who have relied on Perl REs being compatible with its
predecessors have been bitten by problems, too.

Regular expressions never really have been regular enough to make that
assumption, though.

--
Neil Stevens - neil@hakubi.us

'A republic, if you can keep it.' -- Benjamin Franklin

comp.lang.ruby
