comp.lang.ruby

Regular expression matches last occurrence instead of first

andyo

2/27/2007 12:22:00 PM

I've found an anomaly in the way Ruby handles non-greedy regular
expressions and wonder whether it's been discussed before. A search of
the documentation and a general Internet search didn't turn up
information on this issue.

When I want to match the first quoted string in a string such as:

"aaaaa""bbb""ccc"

I match the last quoted string instead. The exact characters don't
matter.

Here's the sample code; note that (.*?) and ([^"]+) behave the same
way--and not the way I'd expect:

str = '"aaaaa""bbb""ccc"'

str.scan(/"(.*?)"/)
puts $1
# ccc

str.scan(/"([^"]+)"/)
puts $1
# ccc

str.scan(/"(.*?)"(.*)/)
puts $1
# aaaaa

Adding an extra (.*) to the end produces the result I want, but I
don't believe it should make any difference.

Here is the equivalent Perl, which works as expected:

$str = q{"aaaaa""bbb""ccc"};
$str =~ /"(.*?)"/;
print $1, "\n";
# aaaaa

$str =~ /"([^"]+)"/;
print $1, "\n";
# aaaaa

$str =~ /"(.*?)"(.*)/;
print $1, "\n";
# aaaaa

And the equivalent PHP:

<?php

$str = '"aaaaa""bbb""ccc"';
preg_match('/"(.*?)"/', $str, $matches);
echo $matches[1], "\n";
// aaaaa

preg_match('/"([^"]+)"/', $str, $matches);
echo $matches[1], "\n";
// aaaaa

preg_match('/"(.*?)"(.*)/', $str, $matches);
echo $matches[1], "\n";
// aaaaa

?>

Andy Oram


Vincent Fourmond

2/27/2007 12:32:00 PM


andyo wrote:
> Here's the sample code; note that (.*?) and ([^"]+) behave the same
> way--and not the way I'd expect:
>
> str = '"aaaaa""bbb""ccc"'
>
> str.scan(/"(.*?)"/)
> puts $1
> # ccc

That's normal. #scan is not what you're looking for:

------------------------------------------------------------ String#scan
str.scan(pattern) => array
str.scan(pattern) {|match, ...| block } => str
------------------------------------------------------------------------
Both forms iterate through str, matching the pattern (which may be
a Regexp or a String). For each match, a result is generated and
either added to the result array or passed to the block. [...]
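To make the ri description concrete, here is a minimal sketch of what the array form returns for the poster's string. It assumes a modern Ruby (2.x or 3.x) rather than the 1.8 series current when this thread was written, and the variable names are illustrative only. The no-group pattern is included purely for contrast.

```ruby
str = '"aaaaa""bbb""ccc"'

# Array form with a capture group: one element per match, and each
# element is itself an array holding that match's captures.
matches = str.scan(/"(.*?)"/)
p matches            # prints [["aaaaa"], ["bbb"], ["ccc"]]

# The first quoted string is the first capture of the first match.
first_quoted = matches[0][0]
puts first_quoted    # prints aaaaa

# Without a capture group, each element is the whole matched text,
# surrounding quotes included.
whole = str.scan(/"[^"]+"/)
p whole              # prints ["\"aaaaa\"", "\"bbb\"", "\"ccc\""]
```

So if #scan is what you want, the first quoted string is already sitting in the returned array; there is no need to consult $1 at all.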

#scan finds all successive matches for the pattern and sets the
capture-group variables every time it finds one. So here you simply
get the $1 from the last match, i.e. "ccc".
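The behaviour of the capture variables can be checked directly. This sketch (again assuming a modern Ruby, with illustrative variable names) shows $1 after the array form, $1 inside the block form, and why the extra (.*) in the original post appeared to help: it lets the first match consume the rest of the string, so #scan finds only one match.

```ruby
str = '"aaaaa""bbb""ccc"'

# Array form: once #scan returns, $~ (and therefore $1) describes the
# LAST successful match, which is why the original post printed "ccc".
str.scan(/"(.*?)"/)
after_plain_scan = $1
puts after_plain_scan    # prints ccc

# Block form: $~ is updated before each call to the block, so $1 inside
# the block refers to the current match.
per_match = []
str.scan(/"(.*?)"/) { per_match << $1 }
p per_match              # prints ["aaaaa", "bbb", "ccc"]

# Appending (.*) lets the first match consume the rest of the string,
# so #scan finds exactly one match and the last match is also the first.
result = str.scan(/"(.*?)"(.*)/)
after_greedy_tail = $1
p result                 # prints [["aaaaa", "\"bbb\"\"ccc\""]]
puts after_greedy_tail   # prints aaaaa
```

The last case is a side effect rather than a fix: the non-greedy group behaves exactly as expected throughout, and the (.*) merely leaves #scan nothing further to match.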

What you're looking for is simply =~, as in Perl:

irb(main):001:0> str = '"aaaaa""bbb""ccc"'
=> "\"aaaaa\"\"bbb\"\"ccc\""
irb(main):002:0> str =~ /"(.*?)"/
=> 0
irb(main):003:0> $1
=> "aaaaa"
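For completeness, here is a minimal sketch of a few ways to get only the first match in Ruby, assuming a modern interpreter (`re` and the other variable names are illustrative). =~ is what Vincent shows above; String#match and String#[] with a group number are standard core methods that likewise return the result of the first match only.

```ruby
str = '"aaaaa""bbb""ccc"'
re  = /"(.*?)"/

# =~ returns the starting index of the first match (or nil);
# $1 then holds that match's first capture group.
pos = (str =~ re)
first_via_op = $1
puts pos                 # prints 0
puts first_via_op        # prints aaaaa

# String#match returns a MatchData for the first match, or nil.
md = str.match(re)
puts md[1]               # prints aaaaa

# String#[] with a regexp and a group number returns that group of the
# first match, or nil if nothing matches.
first_via_index = str[re, 1]
puts first_via_index     # prints aaaaa

# The negated character class from the original post behaves the same.
first_via_class = str[/"([^"]+)"/, 1]
puts first_via_class     # prints aaaaa
```

String#[] is convenient when you want the captured text as a value without touching $1 afterwards.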

Cheers,

Vincent
--
Vincent Fourmond, PhD student (not for long anymore)
http://vincent.fourmon...