Asp Forum - packing things back to regular expression

wildwest

2/20/2008 6:37:00 PM

Hi

I wonder if python has a function to pack things back into regexp,
that has group names.

e.g:
exp = "(?P<name1>[a-z]+)"
compiledexp = re.compile(exp)

Now, I have a dictionary mytable = {"a" : "myname"}

Is there a way in re module, or elsewhere, where I can have it match
the contents from dictionary to the re-expression (and check that it
matches the rules) and then return the substituted string?

e.g
>> re.SomeNewFunc(compiledexp, mytable)
"myname"
>> mytable = {"a" : "1"}
>> re.SomeNewFunc(compiledexp, mytable)
ERROR

Thanks
A

7 Answers

Gary Herron

2/20/2008 6:50:00 PM

Amit Gupta wrote:
> Hi
>
> I wonder if python has a function to pack things back into regexp,
> that has group names.
>
> e.g:
> exp = (<?P<name1>[a-z]+)
> compiledexp = re.compile(exp)
>
> Now, I have a dictionary "mytable = {"a" : "myname"}
>
> Is there a way in re module, or elsewhere, where I can have it match
> the contents from dictionary to the re-expression (and check that it
> matches the rules) and than return the substituted string?
>
I'm not following what you're asking for until I get to the last two
words. The re module does have functions to do string substitution.
One or more occurrences of a pattern matched by an re can be replaced
with a given string. See sub and subn. Perhaps you can make one of
those do whatever it is you are trying to do.
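
For reference, a minimal sketch of the two substitution functions mentioned here. The pattern and the sample text are made up for illustration and do not come from the original question; only sub and subn themselves are standard re module calls.

```python
import re

# Illustrative pattern and text (assumptions, not from the question).
pattern = re.compile(r"(?P<name1>[a-z]+)")
text = "abc 123 def"

# sub() returns the new string; \g<name1> refers back to the named group.
replaced = pattern.sub(r"<\g<name1>>", text)

# subn() returns a (new_string, number_of_substitutions) tuple.
replaced_n, count = pattern.subn("X", text)

print(replaced)    # <abc> 123 <def>
print(replaced_n)  # X 123 X
print(count)       # 2
```

Both functions replace text that the pattern matches; neither checks a dictionary value against a group's sub-pattern, so they cover only part of what the question asks for.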

Gary Herron

> e.g
>
>>> re.SomeNewFunc(compilexp, mytable)
>>>
> "myname"
>
>>> mytable = {"a" : "1"}
>>> re.SomeNewFunc(compileexp, mytable)
>>>
> ERROR
>
>
>
> Thanks
> A
>

Tim Chase

2/20/2008 7:27:00 PM

> mytable = {"a" : "myname"}
>>> re.SomeNewFunc(compilexp, mytable)
> "myname"

how does SomeNewFunc know to pull "a" as opposed to any other key?

>>> mytable = {"a" : "1"}
>>> re.SomeNewFunc(compileexp, mytable)
> ERROR

You could do something like one of the following 3 functions:

import re
ERROR = 'ERROR'

def some_new_func(table, regex):
    "Return processed results for values matching regex"
    result = {}
    for k, v in table.items():
        m = regex.match(v)
        if m:
            result[k] = m.group(1)
        else:
            result[k] = ERROR
    return result

def some_new_func2(table, regex, key):
    "Get value (if matches regex) or ERROR based on key"
    m = regex.match(table[key])
    if m: return m.group(0)
    return ERROR

def some_new_func3(table, regex):
    "Sniff the desired key from the regexp (inefficient)"
    for k, v in table.items():
        m = regex.match(v)
        if m:
            # take the first (group name, matched text) pair
            groupname, match = next(iter(m.groupdict().items()))
            if groupname == k:
                return match
    return ERROR

if __name__ == "__main__":
    NAME = 'name1'
    mytable = {
        'a': 'myname',
        'b': '1',
        NAME: 'foo',
    }
    regexp = '(?P<%s>[a-z]+)' % NAME
    print('Using regex:')
    print(regexp)
    print('='*10)

    r = re.compile(regexp)
    results = some_new_func(mytable, r)
    print('a: %s' % results['a'])
    print('b: %s' % results['b'])
    print('='*10)
    print('a: %s' % some_new_func2(mytable, r, 'a'))
    print('b: %s' % some_new_func2(mytable, r, 'b'))
    print('='*10)
    print('%s: %s' % (NAME, some_new_func3(mytable, r)))

Function#2 is the optimal solution, for single hits, whereas
Function#1 is best if you plan to repeatedly extract keys from
one set of processed results (the function only gets called
once). Function#3 is just ugly, and generally indicates that you
need to change your tactic ;)

-tkc

wildwest

2/20/2008 7:36:00 PM

Before I read the message: I screwed up.

Let me write again

>> x = re.compile("CL(?P<name1>[a-z]+)")
# group name "name1" is attached to the match of a lowercase string of
# letters
# Now I have a dictionary saying {"name1": "iamgood"}
# I would like a function that takes x and my dictionary and returns
# "CLiamgood"
# If my dictionary instead has {"name1": "123"}, it gives an error on
# processing it
#
# In general, I have a regular expression where every non-trivial match
# has a group name. I want to do the reverse of a regexp match. The
# function can take the regexp and replace the group matches from the
# dictionary
# I hope this makes it clear.

Steven D'Aprano

2/21/2008 12:30:00 AM

On Wed, 20 Feb 2008 11:36:20 -0800, Amit Gupta wrote:

> Before I read the message: I screwed up.
>
> Let me write again
>
>>> x = re.compile("CL(?P<name1>[a-z]+)")
> # group name "name1" is attached to the match of lowercase string of
> alphabet
> # Now I have a dictionary saying {"name1", "iamgood"}
> # I would like a function, that takes x and my dictionary and
> return "CLiamgood"
> # If my dictionary instead have {"name1", "123"}, it gives error on
> processingit
> #
> # In general, I have reg-expression where every non-trivial match has a
> group-name. I want to do the reverse of reg-exp match. The function can
> take reg-exp and replace the group-matches from dictionary
> # I hope, this make it clear.

Clear as mud. But I'm going to take a guess.

Are you trying to validate the data against the regular expression as
well as substitute values? That means your function needs to do something
like this:

(1) Take the regular expression object, and extract the string it was
made from. That way at least you know the regular expression was valid.

x = re.compile("CL(?P<name1>[a-z]+)") # validate the regex
x.pattern

=> "CL(?P<name1>[a-z]+)"

(2) Split the string into sets of three pieces:

split("CL(?P<name1>[a-z]+)") # you need to write this function

=> ("CL", "(?P<name1>", "[a-z]+)")

(3) Mangle the first two pieces:

mangle("CL", "(?P<name1>") # you need to write this function

=> "CL%(name1)s"

(4) Validate the value in the dictionary:

d = {"name1": "123"}
validate("[a-z]+)", d)

=> raise exception

d = {"name1": "iamgood"}
validate("[a-z]+)", d)

=> return True

(5) If the validation step succeeded, then do the replacement:

"CL%(name1)s" % d

=> "CLiamgood"

Step (2), the splitter, will be the hardest because you essentially need
to parse the regular expression. You will need to decide how to handle
regexes with multiple "bits", including *nested* expressions, e.g.:

"CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"

Good luck.
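
A minimal sketch of steps (1) to (5), under a strong simplifying assumption: every named group is flat, meaning its body contains no parentheses at all. That sidesteps the hard parsing problem described in step (2) and does not handle the nested example. The names NAMED_GROUP and fill_pattern are hypothetical, and using re.fullmatch for the validation step is a choice made here, not something the outline specifies.

```python
import re

# Matches a named group whose body contains no parentheses.
# Nested groups and escaped parentheses are deliberately out of scope.
NAMED_GROUP = re.compile(r"\(\?P<(\w+)>([^()]*)\)")


def fill_pattern(pattern, values):
    """Fill flat named groups in a regex string from a dictionary.

    Literal text between groups is copied through unchanged, so it should
    not contain regex metacharacters or a literal '%'.
    """
    # Step 2: collect (group name, group body) pairs.
    groups = NAMED_GROUP.findall(pattern)
    # Step 3: turn each named group into a %(name)s placeholder.
    template = NAMED_GROUP.sub(r"%(\1)s", pattern)
    # Step 4: validate each dictionary value against its group body.
    for name, body in groups:
        value = values[name]  # KeyError if the name is missing
        if re.fullmatch(body, value) is None:
            raise ValueError(
                "value %r does not match %r for group %r" % (value, body, name)
            )
    # Step 5: do the replacement.
    return template % values


# Step 1: recover the pattern string from the compiled object.
x = re.compile("CL(?P<name1>[a-z]+)")
print(fill_pattern(x.pattern, {"name1": "iamgood"}))  # CLiamgood

try:
    fill_pattern(x.pattern, {"name1": "123"})
except ValueError as exc:
    print("rejected:", exc)
```

Anything beyond flat groups (nesting, alternation, escaped parentheses) needs a real parser for the regex syntax rather than a single helper pattern.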

--
Steven

MRAB

2/21/2008 8:13:00 PM

On Feb 20, 7:36 pm, Amit Gupta <emaila...@gmail.com> wrote:
> Before I read the message: I screwed up.
>
> Let me write again
>
> >> x = re.compile("CL(?P<name1>[a-z]+)")
>
> # group name "name1" is attached to the match of lowercase string of
> alphabet
> # Now I have a dictionary saying {"name1", "iamgood"}
> # I would like a function, that takes x and my dictionary and return
> "CLiamgood"
> # If my dictionary instead have {"name1", "123"}, it gives error on
> processingit
> #
> # In general, I have reg-expression where every non-trivial match has
> a group-name. I want to do the reverse of reg-exp match. The function
> can take reg-exp and replace the group-matches from dictionary
> # I hope, this make it clear.

If you want the string that matched the regex then you can use
group(0) (or just group()):

>>> x = re.compile("CL(?P<name1>[a-z]+)")
>>> m = x.search("something CLiamgood!something else")
>>> m.group()
'CLiamgood'
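
The same match object also exposes the named groups directly, which is the forward direction of the mapping being discussed. A short sketch using the standard groupdict() and expand() methods of match objects; the template string passed to expand() is an illustration and was not part of the original reply.

```python
import re

x = re.compile("CL(?P<name1>[a-z]+)")
m = x.search("something CLiamgood!something else")

# groupdict() maps each group name to the text it captured.
print(m.groupdict())             # {'name1': 'iamgood'}

# expand() fills a template with the captured groups.
print(m.expand(r"CL\g<name1>"))  # CLiamgood
```

This goes from a matched string to a dictionary; it does not by itself validate an arbitrary dictionary against the pattern.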

Paul McGuire

2/21/2008 9:30:00 PM

On Feb 20, 6:29 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
> On Wed, 20 Feb 2008 11:36:20 -0800, Amit Gupta wrote:
> > Before I read the message: I screwed up.
>
> > Let me write again
>
> >>> x = re.compile("CL(?P<name1>[a-z]+)")
> > # group name "name1" is attached to the match of lowercase string of
> > alphabet
> > # Now I have a dictionary saying {"name1", "iamgood"}
> > # I would like a function, that takes x and my dictionary and
> > return "CLiamgood"
> > # If my dictionary instead have {"name1", "123"}, it gives error on
> > processingit
> > #
> > # In general, I have reg-expression where every non-trivial match has a
> > group-name. I want to do the reverse of reg-exp match. The function can
> > take reg-exp and replace the group-matches from dictionary
> > # I hope, this make it clear.
>
<snip>
>
> Good luck.
>
> --
> Steven

Oh, pshaw! Try this pyparsing ditty.

-- Paul
http://pyparsing.wiki...

from pyparsing import *
import re

# replace patterns of (?P<name>xxx) with dict
# values iff value matches 'xxx' as re

LPAR,RPAR,LT,GT = map(Suppress,"()<>")
nameFlag = Suppress("?P")
rechars = printables.replace(")","").replace("(","")+" "
regex = Forward()("fld_re")
namedField = (nameFlag + LT + Word(alphas,alphanums+"_")("fld_name") + GT + regex )
regex << Combine(OneOrMore(Word(rechars) |
                           r"\(" | r"\)" |
                           nestedExpr(LPAR, RPAR, namedField |
                                      regex,
                                      ignoreExpr=None ) ))

def fillRE(reString, nameDict):
    def fieldPA(tokens):
        fieldRE = tokens.fld_re
        fieldName = tokens.fld_name
        if fieldName not in nameDict:
            raise ParseFatalException(
                "name '%s' not defined in name dict" %
                (fieldName,) )
        fieldTranslation = nameDict[fieldName]
        if (re.match(fieldRE, fieldTranslation)):
            return fieldTranslation
        else:
            raise ParseFatalException(
                "value '%s' does not match re '%s'" %
                (fieldTranslation, fieldRE) )
    namedField.setParseAction(fieldPA)
    try:
        return (LPAR + namedField + RPAR).transformString(reString)
    except ParseBaseException as pe:
        return pe.msg

# tests start here
testRE = r"CL(?P<name1>[a-z]+)"

# a simple test
test1 = { "name1" : "iamgood" }
print(fillRE(testRE, test1))

# harder test, including nested names (have to be careful in
# constructing the names dict)
testRE = (r"CL(?P<name1>[a-z]+)XY(?P<name4>(:?AB)[aeiou]+)"
          r"(?P<name2>CD(?P<name3>..)\?EF)")
test3 = { "name1" : "iamgoodZ",
          "name2" : "CD@@?EF",
          "name3" : "@@",
          "name4" : "ABeieio",
        }
print(fillRE(testRE, test3))

# test a non-conforming field
test2 = { "name1" : "123" }
print(fillRE(testRE, test2))

Prints:

CLiamgood
CLiamgoodZXYABeieioCD@@?EF
value '123' does not match re '[a-z]+'
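
Note that the second line of output accepts "iamgoodZ" for [a-z]+. That happens because re.match only anchors at the start of the value, so a matching prefix is enough. If whole-value validation is wanted, re.fullmatch (Python 3.4 and later) is one option. A small standalone sketch of the difference, not a change to the pyparsing code:

```python
import re

field_re = "[a-z]+"
value = "iamgoodZ"

# match() succeeds because a prefix of the value matches the pattern.
print(bool(re.match(field_re, value)))      # True

# fullmatch() requires the entire value to match, so the trailing Z fails.
print(bool(re.fullmatch(field_re, value)))  # False
```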

wildwest

2/24/2008 11:44:00 PM

> "CL(?P<name1>[a-z]+)XY(?:AB)[aeiou]+(?P<name2>CD(?P<name3>..)\?EF)"
>
> Good luck.
>
> --
> Steven

This is what I did in the end (in principle). Thanks.

A

comp.lang.python
