Mark Tolonen
1/26/2008 4:40:00 AM
"Summercool" <Summercoolness@gmail.com> wrote in message
news:27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>
What you want is a negative lookbehind assertion:
>>> re.search(r'(?<!snow)tire','snowtire') # no match
>>> re.search(r'(?<!snow)tire','baldtire')
<_sre.SRE_Match object at 0x00FCD608>
Unfortunately you want variable whitespace:
>>> re.search(r'(?<!snow\s*)tire','snow tire')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>
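Python doesn't support lookbehind assertions that can vary in width. Moving
the whitespace outside the assertion doesn't work either, because \s* can
match zero characters: the search simply succeeds starting at the 't', where
the four preceding characters are 'now ' rather than 'snow':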
>>> re.search(r'(?<!snow)\s*tire','snow tire')
<_sre.SRE_Match object at 0x00F93480>
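To check where that unwanted match actually starts, the match object can be inspected. This is only a small sketch, written for Python 3 (so print is a function), and the variable name m is mine:

```python
import re

m = re.search(r'(?<!snow)\s*tire', 'snow tire')

# The attempt that starts at the space is rejected, because the four
# characters before it are 'snow'. The engine then retries one position
# later, at the 't', where \s* matches nothing and the lookbehind sees
# 'now ' instead of 'snow'.
print(m.start(), repr(m.group()))   # prints: 5 'tire'
```

So the lookbehind is only ever tested at the position where the overall match begins, not before the whitespace you meant it to guard.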
Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:
### BEGIN CODE ###
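import re

def finditerexcept(pattern, notpattern, string):
    """Yield matches of pattern that are not part of a notpattern match."""
    # Try notpattern first so that it consumes text such as 'snow tire'
    # before the plain pattern can match the 'tire' inside it.
    combined = '(?:%s)|(?:%s)' % (notpattern, pattern)
    for matchobj in re.finditer(combined, string):
        if not re.match(notpattern, matchobj.group()):
            yield matchobj

def markexcept(pattern, notpattern, string):
    """Return string with each accepted match wrapped in [brackets]."""
    substrings = []
    current = 0
    for matchobj in finditerexcept(pattern, notpattern, string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()
    substrings.append(string[current:])
    return ''.join(substrings)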
### END CODE ###
>>> sample='''winter tire
.... tire
.... retire
.... tired
.... snow tire
.... snow tire
.... some snowtires
.... '''
>>> print markexcept('tire',r'snow\s*tire',sample)
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires
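If a grep-style, line-by-line filter is enough, a single pattern with a negative lookahead anchored at the start of the line may also do. This is a sketch under that assumption, not a replacement for the functions above: the names LINE_FILTER and grep_tire are made up for illustration, it is written for Python 3, and it rejects a whole line as soon as "snow tire" appears anywhere in it, even if the line also contains another acceptable "tire".

```python
import re

# Hypothetical single-pattern filter: accept a line that contains 'tire'
# unless 'snow', optional whitespace, then 'tire' occurs anywhere in it.
LINE_FILTER = re.compile(r'^(?!.*snow\s*tire).*tire')

def grep_tire(text):
    """Return the lines of text that pass LINE_FILTER."""
    return [line for line in text.splitlines() if LINE_FILTER.search(line)]

sample = ("winter tire\ntire\nretire\ntired\n"
          "snow tire\nsnow    tire\nsome snowtires\n")

print(grep_tire(sample))
# prints: ['winter tire', 'tire', 'retire', 'tired']
```

The lookahead is free to use \s* because only lookbehind assertions have the fixed-width restriction.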
--Mark