Xavier Noria
3/10/2006 11:35:00 AM
On Mar 10, 2006, at 11:58, francisrammeloo@hotmail.com wrote:
> The pattern I used to find a class definition line is:
>
> line =~ /^\s*class\s+(\w+)/
>
> But I want to exclude forward class declarations ( class MyClass; )
>
> So I changed my pattern to:
>
> line =~ /^\s*class\s+(\w+)\s*[^;]/ --> don't match if line ends
> with ";"
>
> But it doesn't work... Why?
I don't know exactly in what sense it does not work, but negations in
regexps are tricky.
A regexp engine *always* tries to match. If on a first attempt \w+
matches the whole class name and the rest of the pattern then fails,
the engine backtracks and finds a "shorter class name" whose next
character is not a semicolon, so the pattern still matches.
    class Foo;   (\w+ -> "Foo", fails, backtrack)
             ^
    class Foo;   (\w+ -> "Fo", no whitespace, "o" is not a semicolon, matched)
            ^
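The trace above can be reproduced with a small Ruby check. This is only a sketch: the two sample lines are made up for illustration, and the pattern is the one from the question, unchanged.

```ruby
# Pattern from the question, unchanged.
naive = /^\s*class\s+(\w+)\s*[^;]/

# Forward declaration: it still matches, but the captured name is cut short.
m = naive.match("class Foo;")
truncated_name = m && m[1]    # "Fo"
matched_text   = m && m[0]    # "class Foo"

# A real definition matches with the full name.
full_name = naive.match("class Foo {")[1]    # "Foo"
```

The truncated capture is the visible sign of the backtracking: \w+ gave back one character so that [^;] had something to consume.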
A solution is to add an anchor for end of string. Another one is to
prevent \w+ from backtracking, which is known as "atomic grouping":
(?>\w+) # grab word characters and do not backtrack
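To see the atomic group in isolation, here is a hedged sketch that keeps the original [^;] tail and only swaps \w+ for (?>\w+). The sample strings are hypothetical and contain no trailing newline.

```ruby
# Original tail, but with \w+ wrapped in an atomic group.
atomic = /^\s*class\s+(?>\w+)\s*[^;]/

forward_decl_matches = !!(atomic =~ "class Foo;")     # false
definition_matches   = !!(atomic =~ "class Foo {")    # true

# [^;] still has to consume one character, so a line that ends right
# after the name is rejected as well.
bare_name_matches    = !!(atomic =~ "class Foo")      # false
```

The last case shows why a negated character class is an awkward way to say "no semicolon here": it demands that some character be present.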
In addition, the idiomatic way to say "and at this point I don't want
this to happen" is to use a negative look-ahead assertion. All in all
we get this:
/^\s*class\s+(?>\w+)(?!\s*;)/
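A rough sketch of how that pattern could be used to pull out class names. Two things here are assumptions rather than part of the pattern above: the extra pair of parentheses around the atomic group (added because (?>...) does not capture on its own), and the helper name class_name, which is hypothetical.

```ruby
# Same pattern as above, plus an outer capturing group (an addition)
# so that the class name is available as group 1.
CLASS_DEF = /^\s*class\s+((?>\w+))(?!\s*;)/

# Hypothetical helper: returns the class name, or nil when the line is
# not a class definition (for example a forward declaration).
def class_name(line)
  m = CLASS_DEF.match(line)
  m && m[1]
end

class_name("class Foo;")                 # nil
class_name("class Foo ;")                # nil
class_name("class Foo {")                # "Foo"
class_name("  class Bar : public Foo")   # "Bar"
class_name("class Baz")                  # "Baz"
```

Unlike [^;], the look-ahead consumes nothing, so a line that ends right after the class name is still accepted.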
-- fxn