comp.lang.python

regular expression negate a word (not character)

SpringFlowers AutumnMoon

1/26/2008 1:17:00 AM


somebody who is a regular expression guru... how do you negate a word
and grep for all words that is

tire

but not

snow tire

or

snowtire

so for example, it will grep for

winter tire
tire
retire
tired

but will not grep for

snow tire
snow tire
some snowtires

need to do it in one regular expression

SpringFlowers AutumnMoon

1/26/2008 2:16:00 AM

On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire

i could think of something like

/[^s][^n][^o][^w]\s*tire/i

but what if it is not snow but some 20 character-word, then do we need
to do it 20 times to negate it? any shorter way?

Joseph Pecoraro

1/26/2008 2:36:00 AM

SpringFlowers AutumnMoon wrote:
> On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
>>
>> snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

I took a long look at this and I came up with a number of different
methods, including an idea like the one you have above. If you have a
set number of bad/undesirable words then everything falls apart. I
tried negative look behinds but those don't work well with 0 or more
spaces because look-behinds have to have a fixed length. I really don't
think that this could be done elegantly with a single regular expression
if you have multiple bad/undesirable words. However, if you split this
into two regular expressions then it becomes rather straightforward.

I really have spent the last 20 minutes trying out different
possibilities with a single regular expression, but it just doesn't seem
worth the difficulty =(

May I ask why there is the requirement for a single regular expression?

- Joe P
--
Posted via http://www.ruby-....

Judson Lester

1/26/2008 2:43:00 AM

On Jan 25, 2008 6:19 PM, Summercool <Summercoolness@gmail.com> wrote:
> On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

(?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)

I'm not thrilled with that, but without look-behind, it's rough to do
what you're asking.
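
A quick way to see how far this pattern gets is to run it over the test cases from the original post. The sketch below does that with Python's re module (the pattern uses nothing Ruby-specific, as far as I can tell). The helper name `hits` and the extra line "my big tire" are additions for illustration only.

```python
import re

# The pattern from this post, unchanged; IGNORECASE mirrors the /i used
# elsewhere in the thread.
pattern = re.compile(r"(?!snow)(\S{4})\s*(tire)|^\S{0,3}\s*(tire)", re.IGNORECASE)

def hits(line):
    """Return True when the pattern finds a match somewhere in the line."""
    return pattern.search(line) is not None

wanted = ["winter tire", "tire", "retire", "tired"]
unwanted = ["snow tire", "snowtire", "some snowtires"]

for line in wanted + unwanted + ["my big tire"]:
    print("%-16s %s" % (line, "match" if hits(line) else "no match"))
```

The wanted and unwanted lines come out as hoped, but "my big tire" is not matched: the first alternative needs four non-space characters directly before the whitespace, and the second is anchored to the start of the line.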

Shameless pluggery: I used RegexpBench to do the experimentation to
find your answer.

Judson
--
Your subnet is currently 169.254.0.0/16. You are likely to be eaten by a grue.

SpringFlowers AutumnMoon

1/26/2008 3:28:00 AM

On Jan 25, 6:35 pm, Joseph Pecoraro <joepec...@gmail.com> wrote:
>
> I really have spent the last 20 minutes trying out different
> possibilities with a single regular expressions but it just doesn't seem
> worth the difficulty =(
>
> May I ask why there is the requirement for a single regular expression?
>
> - Joe P

thanks for your post. a reason is that some text editor lets users
search all files using a regular expression... another reason is
that... if 2 lines are used to test... then what if that line actually
has tire and snowtire... then it may negate the whole line as a
result, even though we want to grep it due to the first word "tire".


Ben Morrow

1/26/2008 3:38:00 AM

[newsgroups line fixed, f'ups set to clpm]

Quoth Summercool <Summercoolness@gmail.com>:
> On Jan 25, 5:16 pm, Summercool <Summercooln...@gmail.com> wrote:
> > somebody who is a regular expression guru... how do you negate a word
> > and grep for all words that is
> >
> > tire
> >
> > but not
> >
> > snow tire
> >
> > or
> >
> > snowtire
>
> i could think of something like
>
> /[^s][^n][^o][^w]\s*tire/i
>
> but what if it is not snow but some 20 character-word, then do we need
> to do it 20 times to negate it? any shorter way?

This is no good, since 'snoo tire' fails to match even though you want
it to. You need something more like

/ (?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire /ix

but that gets *really* tedious for long strings, unless you generate it.
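
Both patterns are easy to try out. Here is a small Python sketch that runs the per-character pattern and the alternation above over a few lines. re.VERBOSE stands in for Perl's /x, and the variable names are only illustrative.

```python
import re

# The per-character pattern from the quoted post.
per_char = re.compile(r"[^s][^n][^o][^w]\s*tire", re.IGNORECASE)

# The alternation suggested above; re.VERBOSE ignores the spaces in the pattern.
alternation = re.compile(
    r"(?: [^s]... | [^n].. | [^o]. | [^w] | ^ ) \s* tire",
    re.IGNORECASE | re.VERBOSE,
)

for line in ["snoo tire", "tire", "retire", "snowtire", "snow tire"]:
    print("%-10s per_char=%-5s alternation=%s" % (
        line,
        per_char.search(line) is not None,
        alternation.search(line) is not None,
    ))
```

One caveat the run brings out: the negated classes also accept a space, so "snow tire" is matched by both patterns ([^w] can consume the space, and \s* then matches nothing). Excluding whitespace from the classes might be one way to tighten that.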

Ben

Mark Tolonen

1/26/2008 4:40:00 AM

"Summercool" <Summercoolness@gmail.com> wrote in message
news:27249159-9ff3-4887-acb7-99cf0d2582a8@n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
> tire
>
> but not
>
> snow tire
>
> or
>
> snowtire
>
> so for example, it will grep for
>
> winter tire
> tire
> retire
> tired
>
> but will not grep for
>
> snow tire
> snow tire
> some snowtires
>
> need to do it in one regular expression
>

What you want is a negative lookbehind assertion:

>>> re.search(r'(?<!snow)tire','snowtire') # no match
>>> re.search(r'(?<!snow)tire','baldtire')
<_sre.SRE_Match object at 0x00FCD608>

Unfortunately you want variable whitespace:

>>> re.search(r'(?<!snow\s*)tire','snow tire')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>

Python doesn't support lookbehind assertions that can vary in size. This
doesn't work either:

>>> re.search(r'(?<!snow)\s*tire','snow tire')
<_sre.SRE_Match object at 0x00F93480>
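
One workaround that stays within the standard re module, if an upper bound on the whitespace is acceptable, is to chain several fixed-width lookbehinds, one per allowed number of whitespace characters. A minimal sketch follows; the function name `not_after` and the `max_spaces` limit are assumptions made for this example, not part of re.

```python
import re

def not_after(word, target, max_spaces=3):
    """Compile a regex matching `target` unless `word` plus 0..max_spaces
    whitespace characters comes directly before it."""
    # Each lookbehind has a fixed width: len(word) + n characters.
    lookbehinds = "".join(
        r"(?<!%s%s)" % (re.escape(word), r"\s" * n)
        for n in range(max_spaces + 1)
    )
    return re.compile(lookbehinds + re.escape(target), re.IGNORECASE)

tire = not_after("snow", "tire")
for line in ["winter tire", "retire", "snowtire", "snow tire", "snow   tire"]:
    print("%-12s %s" % (line, "match" if tire.search(line) else "no match"))
```

Whitespace runs longer than max_spaces slip through, so this only helps when such a bound is reasonable for the text being searched.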

Here's some code (not heavily tested) that implements a variable lookbehind
assertion, and a function to mark matches in a string to demonstrate it:

### BEGIN CODE ###

import re

def finditerexcept(pattern,notpattern,string):
    # Try notpattern first so it consumes any text that pattern alone would match.
    for matchobj in re.finditer('(?:%s)|(?:%s)'%(notpattern,pattern),string):
        if not re.match(notpattern,matchobj.group()):
            yield matchobj

def markexcept(pattern,notpattern,string):
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern,notpattern,string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

>>> sample='''winter tire
... tire
... retire
... tired
... snow tire
... snow tire
... some snowtires
... '''
>>> print markexcept('tire',r'snow\s*tire',sample)
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow tire
some snowtires

--Mark

Joseph Pecoraro

1/26/2008 4:45:00 AM

SpringFlowers AutumnMoon wrote:
>> May I ask why there is the requirement for a single regular expression?
>
> thanks for your post. a reason is that some text editor lets users
> search all files using a regular expression... another reason is
> that... if 2 lines are used to test... then what if that line actually
> has tire and snowtire... then it may negate the whole line as a
> result, even though we want to grep it due to the first word "tire".

This is rather interesting to me. I recently (Dec-Jan) wrote a little
find/replace Ruby script that can deal with multiple files. I call the
utility rr.

What you're suggesting is a pretty cool idea and opens up a number of
possible improvements that I had not thought about. I can extend rr to
take multiple regular expressions, and let the user say "yes, match
this regex" and "no, do not match that one". I could also simply add an
option to print only the files where the regular expressions match,
without performing the find/replace.

I will have to think this through, especially this Sunday when I have
more time.

I am sorry that this doesn't help you with your search for a
single-regular-expression solution, but I want to repeat that this seems so
much easier using two regular expressions that I think developing such a
utility would be worthwhile. I am really looking forward to
implementing these new ideas. For that I thank you!

I'm a rather intermediate Ruby programmer but if anyone would like to
check out rr they can at my blog. Here is a link to the most recent
article:
http://bogojoker.com/weblog/2008/01/01/rr-11-in-place-edits-and-multi...

Joseph Pecoraro

1/26/2008 4:59:00 AM

Mark Tolonen, those were the exact Ruby negative look-behinds that I
used. It's good to see that we had the same idea!

Joseph Pecoraro

1/26/2008 5:59:00 AM

I just wrote up a quick script to do what I was thinking. I decided to
make a different utility only because of the complications that would
arise with tons of switches on the command line if I were to add it to a
find/replace utility. (The user would have to say which regex they
wanted for the actual replacement, and other inherent problems... moving
on)

So without further ado, here is my example
------------------------------
joe[~/code/script]$ cat > input
winter tire
tire
retire
tired
snow tire
snow tire
some snowtires

joe[~/code/script]$ grepall -2 tire --neg snow input
input [1]: winter tire
input [2]: tire
input [3]: retire
input [4]: tired

joe[~/code/script]$ grepall
usage: grepall [-#] ( [-n] regex ) [filenames]
# - the number of regular expressions, defaults to 1
regex - regular expessions to be checked on the line
filenames - names of the input files to be parsed, if blank uses STDIN

options:
--neg or -n do not match this regular expression

special note:
When using bash, if you want backslashes in the replace portion make
sure
to use the multiple argument usage with single quotes for the
replacement.
------------------------------------------------------

The utility is hopefully easy to understand, although the usage is
tough to present:
- line by line processing
- in the above example the -2 says there will be two regular
expressions
- the first is /tire/ and that needs to match
- the second is /snow/ and that is Negated because of the --neg (or
just -n) option
- the last argument is the filename

The output needs to be tweaked, maybe so it's more like grep. Right now
it allows for multiple files, so it prints the filename, [line number],
and each line on which all of the regular expressions were satisfied
(negated where necessary). Obviously this is very simple at the moment,
and it doesn't cover the specific situation you mentioned where the
words tire and snowtire appear on the same line.
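
For anyone who wants to play with the same idea without the Ruby script, here is a rough Python sketch of the line-by-line logic described above. The function name `grep_all` and its arguments are hypothetical; this is not a port of grepall, just the must-match / must-not-match filter.

```python
import re

def grep_all(lines, must_match, must_not_match=()):
    """Yield (line_number, line) for lines where every regex in must_match
    is found and no regex in must_not_match is found."""
    positives = [re.compile(p) for p in must_match]
    negatives = [re.compile(p) for p in must_not_match]
    for number, line in enumerate(lines, start=1):
        wanted = all(p.search(line) for p in positives)
        unwanted = any(n.search(line) for n in negatives)
        if wanted and not unwanted:
            yield number, line

sample = ["winter tire", "tire", "retire", "tired",
          "snow tire", "snowtire", "snow tire and regular tire"]

for number, line in grep_all(sample, ["tire"], ["snow"]):
    print("[%d]: %s" % (number, line))
```

The last sample line is dropped even though it contains a plain "tire", which is exactly the same-line limitation of filtering whole lines.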

However if that is an issue you can:
- find and replace all words SNOW with SPECIAL_STRING in all files
- do what you have to do...
- turn all SPECIAL_STRINGs back into SNOW in all files

That can be done rather easily. You will have lost the case sensitivity
in the word SNOW, but you can get around that by making your
SPECIAL_STRING something like XsXnXoXwX based on the original case
values of snow. I hope that made sense.

Well I better get to bed, you made my night interesting!

SpringFlowers AutumnMoon

1/26/2008 9:53:00 AM

to add to the test cases, the regular expression must be able to grep


snowbird tire
tired on a snow day
snow tire and regular tire
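
These three lines can be checked against the match-then-filter approach posted earlier in the thread. The sketch below is a compact Python 3 restatement of that idea, not the original code; the name `find_except` is made up for the example.

```python
import re

def find_except(pattern, not_pattern, text):
    """Yield matches of `pattern` that are not part of a `not_pattern` match."""
    combined = re.compile("(?:%s)|(?:%s)" % (not_pattern, pattern))
    for match in combined.finditer(text):
        # The unwanted alternative is tried first, so it swallows any
        # "tire" that belongs to it; keep only the remaining matches.
        if not re.fullmatch(not_pattern, match.group()):
            yield match

for line in ["snowbird tire", "tired on a snow day", "snow tire and regular tire"]:
    spans = [m.span() for m in find_except("tire", r"snow\s*tire", line)]
    print(line, spans)
```

Each line yields exactly one kept match; on the third line the leading "snow tire" is consumed by the unwanted alternative and only the second "tire" is reported.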