microsoft.public.dotnet.framework

Regex groups and captures

Erik Funkenbusch

4/11/2008 8:40:00 PM

Can anyone explain the difference between Match.Captures and Match.Groups
in regex's?

Note: I understand how Regex works, I just don't understand what the
difference is between these two collections.

.Captures always seems to contain only one entry when I match something, no
matter if there are multiple groups.
8 Answers

Tim Smith

4/12/2008 10:01:00 PM

In article <1n7jfxpt12gmu$.dlg@funkenbusch.com>,
Erik Funkenbusch <erik@despam-funkenbusch.com> wrote:
> Can anyone explain the difference between Match.Captures and Match.Groups
> in regex's?
>
> Note: I understand how Regex works, I just don't understand what the
> difference is between these two collections.

[Note the cross-posting. Some idiot troll has quoted your post in COLA,
so I'm responding from there]

Here's an example from "Mastering Regular Expressions, 3rd Edition" by
Friedl. Consider applying the regular expression ^(..)+ to the
string 'abcdefghijk':

Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")

Then M.Groups(1) is 'ij'. But on the way to capturing 'ij', the (..)
was applied to four other pairs of letters. These are available in
Captures:

M.Groups(1).Captures(0).Value is 'ab'
M.Groups(1).Captures(1).Value is 'cd'
...
M.Groups(1).Captures(4).Value is 'ij'

This is where Captures might be useful--as a collection available from a
Group object.
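
A minimal sketch of the example above as a complete console program, in case anyone wants to run it. It assumes an ordinary .NET console project; the module name CaptureDemo is made up for illustration and is not from the book.

```vb
Imports System
Imports System.Text.RegularExpressions

Module CaptureDemo
    Sub Main()
        ' Regex.Match takes the input string first, then the pattern.
        Dim m As Match = Regex.Match("abcdefghijk", "^(..)+")

        ' The group itself reports only the last pair it captured.
        Console.WriteLine(m.Groups(1).Value)    ' ij

        ' Every iteration of (..) is kept in the group's Captures collection,
        ' in the order the pairs were matched.
        For Each c As Capture In m.Groups(1).Captures
            Console.WriteLine("{0} at index {1}", c.Value, c.Index)
        Next
    End Sub
End Module
```

The loop should list 'ab', 'cd', 'ef', 'gh' and 'ij' at indexes 0, 2, 4, 6 and 8. The trailing 'k' is not part of the match because (..) needs two characters.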

> .Captures always seems to contain only one entry when I match something, no
> matter if there are multiple groups.

A Match object also has a Captures property, which is the same as the
Captures property of the 0th group. The 0th group represents the entire
match, so nothing is iterated over as it was in the earlier example, and
the 0th group's Captures collection always has exactly one Capture.
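
A short sketch of that case, assuming a pattern with two groups and no repetition (the pattern and the "key=value" input are invented for illustration). It shows why Match.Captures has a single entry however many groups the pattern contains.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MatchCapturesDemo
    Sub Main()
        ' Two groups, neither of them inside a quantifier.
        Dim m As Match = Regex.Match("key=value", "^(\w+)=(\w+)$")

        Console.WriteLine(m.Groups.Count)               ' 3: group 0 plus the two explicit groups
        Console.WriteLine(m.Captures.Count)             ' 1: the whole match
        Console.WriteLine(m.Captures(0).Value)          ' key=value
        Console.WriteLine(m.Groups(0).Captures.Count)   ' 1: same collection contents as m.Captures
        Console.WriteLine(m.Groups(1).Captures.Count)   ' 1: the group matched only once
    End Sub
End Module
```

A group's Captures collection only grows past one entry when the group sits inside a quantifier such as + or *, as in the ^(..)+ example.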

(I recommend a Safari Library or Safari Bookshelf subscription if you
don't already have one. That makes books such as "Mastering Regular
Expressions" available to you online, which comes in really handy. And
it is not just O'Reilly books. They have a lot from other publishers,
too).


--
--Tim Smith

Erik Funkenbusch

4/12/2008 10:39:00 PM

On Sat, 12 Apr 2008 15:00:37 -0700, Tim Smith wrote:

> Here's an example from "Mastering Regular Expressions, 3rd Edition" by
> Friedl. Consider applying the regular expression ^(..)+ to the
> string 'abcdefghijk':
>
> Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")
>
> Then M.Groups(1) is 'ij'. But on the way to capturing 'ij', the (..)
> was applied to four other pairs of letters. These are available in
> Captures:

Ahh.. Of course, I should have realized that. My regexes are typically far
more specifically written and don't match multiple iterations (but can
often have multiple groups).

Thanks.

Chris Ahlstrom

4/13/2008 1:26:00 AM

* Erik Funkenbusch peremptorily fired off this memo:

> On Sat, 12 Apr 2008 15:00:37 -0700, Tim Smith wrote:
>
>> Here's an example from "Mastering Regular Expressions, 3rd Edition" by
>> Friedl. Consider applying the regular expression ^(..)+ to the
>> string 'abcdefghijk':
>>
>> Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")
>>
>> Then M.Groups(1) is 'ij'. But on the way to capturing 'ij', the (..)
>> was applied to four other pairs of letters. These are available in
>> Captures:
>
> Ahh.. Of course, I should have realized that. My regexes are typically far
> more specifically written and don't match multiple iterations (but can
> often have multiple groups).

The recent book "Beautiful Code" (O'Reilly Press, I believe) has, as its
first chapter, a simplified regexp utility using C code, recursion, and
pointers. Beautiful stuff.

Incidentally, Kernighan also mentions that Ken Thompson obtained the
first "software patent" for some code he wrote, a long time ago.

I'd get up and find the book, but I'm too lazy.

--
We are not even close to finishing the basic dream of what the PC can be.
-- Bill Gates

Matt

4/13/2008 2:57:00 PM

Linonut wrote:
> * Erik Funkenbusch peremptorily fired off this memo:
>
>> On Sat, 12 Apr 2008 15:00:37 -0700, Tim Smith wrote:
>>
>>> Here's an example from "Mastering Regular Expressions, 3rd Edition" by
>>> Friedl. Consider applying the regular expression ^(..)+ to the
>>> string 'abcdefghijk':
>>>
>>> Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")
>>>
>>> Then M.Groups(1) is 'ij'. But on the way to capturing 'ij', the (..)
>>> was applied to four other pairs of letters. These are available in
>>> Captures:
>> Ahh.. Of course, I should have realized that. My regexes are typically far
>> more specifically written and don't match multiple iterations (but can
>> often have multiple groups).
>
> The recent book "Beautiful Code" (O'Reilly Press, I believe) has, as its
> first chapter, a simplified regexp utility using C code, recursion, and
> pointers. Beautiful stuff.
>
> Incidentally, Kernighan also mentions that Ken Thompson obtained the
> first "software patent" for some code he wrote, a long time ago.


http://www.softwarehistory.org/preservation/martin...
(((((
> Abstract: Martin A. Goetz, a software industry pioneer, was a founder and past president of Applied Data Research (ADR). He was awarded the first software patent in 1968 for his sorting system program and was a longtime spokesperson for protecting software as intellectual property.
)))))

Maybe you're thinking of this:

http://en.wikipedia.org/w...
(((((
> The setuid bit was invented by Dennis Ritchie. His employer, AT&T, applied for a patent in 1972; the patent was granted in 1979 as patent number 4,135,240 "Protection of data file contents". The patent was later placed in the public domain.
)))))

Speaking of Ken Thompson and regular expressions:

Ken Thompson: Regular Expression Search Algorithm. Commun. ACM 11(6):
419-422 (1968)
> A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular expression as source language and produces an IBM 7094 program as object language. The object program then accepts the text to be searched as input and produces a signal every time an embedded string in the text matches the given regular expression. Examples, problems, and solutions are also presented.


> I'd get up and find the book, but I'm too lazy.


Go for it.

Doug Mentohl

4/13/2008 3:36:00 PM

On 12 Apr, 23:00, Tim Smith wrote:

> Here's an example from "Mastering Regular Expressions, 3rd Edition" by

I'm confused: are 'Match.Captures' and 'Match.Groups' unique to .NET
functions or are they generic to Regex? What is the difference between
'dotNET Regex' and 'generic Regex'?

"Match.Captures and Match.Groups are to .NET properties and are not a
function of the Regex itself", Eric F.

Chris Ahlstrom

4/13/2008 4:23:00 PM

* Matt peremptorily fired off this memo:

>> Incidentally, Kernighan also mentions that Ken Thompson obtained the
>> first "software patent" for some code he wrote, a long time ago.
>
> http://www.softwarehistory.org/preservation/martin...
> (((((
>> Abstract: Martin A. Goetz, a software industry pioneer, was a founder
>> and past president of Applied Data Research (ADR). He was awarded the
>> first software patent in 1968 for his sorting system program and was
>> a longtime spokesperson for protecting software as intellectual
>> property.
>)))))

Ahhhh, an idiot that pre-dates Bill Gates.

> Speaking of Ken Thompson and regular expressions:
>
> Ken Thompson: Regular Expression Search Algorithm. Commun. ACM 11(6):
> 419-422 (1968)
>> A method for locating specific character strings embedded in
>> character text is described and an implementation of this method
>> in the form of a compiler is discussed. The compiler accepts a
>> regular expression as source language and produces an IBM 7094
>> program as object language. The object program then accepts the
>> text to be searched as input and produces a signal every time an
>> embedded string in the text matches the given regular expression.
>> Examples, problems, and solutions are also presented.

That's the one in the book. Was granted in 1971.

>> I'd get up and find the book, but I'm too lazy.
>
> Go for it.

Nah, now I'm in the "computer room", and the book is in reach, so I
don't have to lift my Stentorian posterior to reach it.

--
Like almost everyone who uses e-mail, I receive a ton of spam every day. Much
of it offers to help me get out of debt or get rich quick. It would be funny if
it weren't so irritating.
-- Bill Gates, "Why I Hate Spam" in Microsoft PressPass (2003)

Erik Funkenbusch

4/13/2008 8:11:00 PM

On Sun, 13 Apr 2008 08:36:04 -0700 (PDT), Doug Mentohl wrote:

> On 12 Apr, 23:00, Tim Smith wrote:
>
>> Here's an example from "Mastering Regular Expressions, 3rd Edition" by
>
> I'm confused are 'Match.Captures' and 'Match.Groups' unique to .NET
> functions or are they generic to Regex? What is the difference between
> 'dotNET Regex' and 'generic Regex'?

Of course you're confused, because as usual, you have no idea of the
things that you criticize others for.

The .NET regex language is pretty similar to standard Perl 5 regex.
No two regex implementations are identical, so don't go
crowing about "similar" being incompatible.

The difference, in what I was posting about, was how .NET presents the
matches to the program, which is something that is unique to each
programming language (and oftentimes each implementation). How a program
accesses regex data is different, whether it's Perl, Python, PHP, or .NET.

But, if you knew anything at all about programming, you'd know this and
wouldn't continue to make yourself look like the biggest wanker on earth.

I'm really curious now about how long it will take you to realize just how
stupid you make yourself look by continuing to follow after me and post
things you *think* are somehow catching me in something, when in fact they
simply show how much of a moron you are.

Matt

4/14/2008 1:14:00 AM

Linonut wrote:

>> Ken Thompson: Regular Expression Search Algorithm. Commun. ACM 11(6):
>> 419-422 (1968)

> That's the one in the book. Was granted in 1971.


Thanks.


>>> I'd get up and find the book, but I'm too lazy.
>> Go for it.
>
> Nah, now I'm in the "computer room", and the book is in reach, so I
> don't have to lift my Stentorian posterior to reach it.


It may help to cut down on legumes and citrus fruits.