Berger, Daniel
7/27/2006 6:17:00 PM
Caleb Clausen wrote:
> On 7/27/06, Roman Hausner <roman.hausner@gmail.com> wrote:
>> I am disappointed to learn that Ruby obviously implements yet another
>> regular expression library that does not avoid very slow matching
>> failures.
>>
>> Just try this program:
>>
>> str1 = "foo " * 70 + "foo ";
>> str2 = "foo " * 70 + "fo ";
>>
>> strings = str1, str2;
>>
>> strings.each { |s|
>> print "Matching #{s}\n";
>> if(s =~ /^(\s*foo\s*)*$/)
>> print "YES!\n";
>> else
>> print "NO!\n";
>> end
>> }
>
>
> If you take out the first \s* (or move it outside the parens), the
> slowness goes away. That first \s* isn't needed. Another trick that
> may help here is changing the *'s to +'s.
>
> This is a case of pathological backtracking. Usually, these things are
> solved by carefully tuning the Regexp, rather than expecting the
> Regexp engine to fix it...
>
> Yes, perl optimizes this case, but the fact is that you're asking for
> a lot of silly extra backtracking, and that is what ruby gives you. I
> imagine that you didn't know that you'd created a backtracking
> monster...
>
> Basically, as long as everything is matching, you're fine and matches
> will be nice and zippy. But once that 'fo' causes a mismatch, the
> engine has to go back and check every possible way that the spaces in
> the string could be matched by either of the 2 whitespace matchers.
> Since there are ~70 spaces in your string and each could be matched by
> either \s*, there are about 2**70 possibilites to explore, total. You
> can see how that might take a little time.
>
> I'm afraid I'm not explaining very well why this Regexp is a
> monster... maybe someone else will take a stab.
>
It would take too long and I'm tired. Jeffrey Friedl's "Mastering Regular
Expressions" is the definitive source on the subject. That being said, I think
the OP needs a lesson on greedy vs non-greedy regular expressions.
Also note that Perl must have optimized for this sort of backtracking after
5.005_03, because it hangs as well. I'd be curious as to their reasons for
doing this, but I have my suspicions.
Regards,
Dan
This communication is the property of Qwest and may contain confidential or
privileged information. Unauthorized use of this communication is strictly
prohibited and may be unlawful. If you have received this communication
in error, please immediately notify the sender by reply e-mail and destroy
all copies of the communication and any attachments.