Asp Forum - Linear complete variation of MatchData#to_a, possible?

Trans

4/20/2005 4:03:00 PM

Hi--

I have a bit of a puzzle for Regexp engine hackers out there. The
#to_a method on MatchData gives a simple list of matching portions of
the matched text.

md = /(1)(2(3))(4)/.match "012345"
md.to_a
["1234", "1", "23", "3", "4"]

But I would like a method that produces a linearly segmented list of the
text itself, like this:

["0", "1", "2", [ "3" ], "4", "5" ]

Notice the array depth corresponds to the subexpression depth.

My first stab was:

def matchlist
  # good idea to cache?
  @matchset ||= pre_match + self[1..-1] + post_match
end

But obviously that does not work when there are subexpressions.
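
To make the failure concrete, here is a small sketch under one assumption: the flat list is built with a splat, because adding a String to an Array as in the stab above would raise a TypeError. The name `flat` is only illustrative. Since group 2 contains group 3, the text "3" appears twice and the pieces no longer join back into the original string.

```ruby
md = /(1)(2(3))(4)/.match("012345")

# Flat list: pre-match, every capture, post-match (illustrative name).
flat = [md.pre_match, *md.captures, md.post_match]
p flat            # => ["0", "1", "23", "3", "4", "5"]

# Nested groups repeat text, so the list is not a segmentation of the string.
p flat.join       # => "0123345"
p md.string       # => "012345"
```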

Is such a method even possible?

Thanks,
T.

8 Answers

Pit Capitain

4/20/2005 4:55:00 PM

Trans wrote:
>
> I have a bit of a puzzle for Regexp engine hackers out there. The
> #to_a method on MatchData gives a simple list of matching portions of
> the matched text.
>
> md = /(1)(2(3))(4)/.match "012345"
> md.to_a
> ["1234", "1", "23", "3", "4"]
>
> But I would like a method that produces a linearly segmented list of the
> text itself, like this:
>
> ["0", "1", "2", [ "3" ], "4", "5" ]
>
> Notice the array depth corresponds to the subexpression depth.

Do you really think that you want this nesting? I'd expect something like

[ "0", [ [ "1" ], [ "2", [ "3" ] ], [ "4" ] ], "5" ]

Could you try to specify the rules for your desired nesting and
splitting? I think it should be possible to implement something like this.

Regards,
Pit

Trans

4/20/2005 5:13:00 PM

Hi Capitain--

Pit Capitain wrote:
> Do you really think that you want this nesting? I'd expect something like
>
> [ "0", [ [ "1" ], [ "2", [ "3" ] ], [ "4" ] ], "5" ]
>
> Could you try to specify the rules for your desired nesting and
> splitting? I think it should be possible to implement something like this.

Your nesting will work too. I excluded a couple of layers where I felt it
wasn't significant. I think the pre- and post-matches can just be the
first and last elements respectively, so that extra level of nesting isn't
really needed (IMHO). But you may be quite right about the next layer. I
had thought it might be dropped without issue, but perhaps it is in
fact necessary to keep the nesting consistent with the subexpressions.
Nonetheless, yes, you have the idea. Can it be done? I'm not sure how
one would "query" the level of subexpression nesting.

Thanks,
T.

dblack

4/20/2005 5:24:00 PM

Pit Capitain

4/20/2005 5:29:00 PM

Trans wrote:
>
> Your nesting will work too. I excluded a couple of layers where I felt it
> wasn't significant. I think the pre- and post-matches can just be the
> first and last elements respectively, so that extra level of nesting isn't
> really needed (IMHO). But you may be quite right about the next layer. I
> had thought it might be dropped without issue, but perhaps it is in
> fact necessary to keep the nesting consistent with the subexpressions.
> Nonetheless, yes, you have the idea. Can it be done? I'm not sure how
> one would "query" the level of subexpression nesting.

Hi Tom,

I'd try using MatchData#begin and MatchData#end:

md = /(1)(2(3))(4)/.match( "012345" )

md.size.times do |i|
  puts( md.begin( i ) .. md.end( i ) )
end

The output is:

1..5
1..2
2..4
3..4
4..5

It should be possible to determine the splitting and the nesting out of
those numbers. If you're not successful, send me a mail and I'll try to
find some more time to help.
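
As a rough sketch of how the nesting could be read off those numbers: a group's depth can be taken as the count of lower-numbered groups whose span encloses it. This assumes every group took part in the match (an unmatched optional group would have nil offsets) and that no group is empty; the names `offsets` and `depths` are only illustrative.

```ruby
md = /(1)(2(3))(4)/.match("012345")

# [begin, end] pair for the whole match (index 0) and for each group.
offsets = (0...md.size).map { |i| md.offset(i) }

# Depth of group i = how many earlier groups contain its span.
depths = offsets.each_with_index.map do |(b, e), i|
  offsets[0...i].count { |ob, oe| ob <= b && e <= oe }
end

p offsets   # => [[1, 5], [1, 2], [2, 4], [3, 4], [4, 5]]
p depths    # => [0, 1, 1, 2, 1]
```

Group 3 comes out one level deeper than groups 1, 2 and 4, which matches the parenthesization of the pattern.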

Regards,
Pit

Trans

4/20/2005 8:56:00 PM

Amazing Pit Capitain! Your hunch was right on the money. Not only was
your suggestion key to the solution, but the output was just as you
expected. Here's the solution I found. It's not all that elegant, but
it appears to work okay. I am certain there are much better solutions
to be had, so if anyone has one to offer...

class MatchData

  def matchset
    # b[pos] / e[pos]: number of groups that begin / end at position pos
    b = Hash.new(0)
    e = Hash.new(0)

    self.captures.size.times do |i|
      b[self.begin(i)] += 1
      e[self.end(i)] += 1
    end

    a = self.string.split(//)
    c = ""       # current run of characters
    ca = []      # current (innermost) array
    stack = []   # enclosing arrays

    (a.size).times { |i|
      # end
      if e[i] and e[i] != 0
        ca << c
        e[i].times { ca = stack.pop }
        c = nil
      end
      # begin
      if b[i] and b[i] != 0
        ca << c if c
        b[i].times {
          stack << ca
          ca << []
          ca = ca.last
        }
        c = nil
      end
      # content
      c = "" unless c
      c << a[i] #if a[i]
    }
    ca << c

    return ca
  end

end #class MatchData

md = /(bb)(cc(dd))(ee)/.match "XXaabbccddeeffXX"
p md.to_a
p md.matchset
#=> ["XXaa", [["bb"], ["cc", ["dd"]], "ee"], "ffXX"]

Dominik Bathon

4/22/2005 1:51:00 AM

On Wed, 20 Apr 2005 22:59:33 +0200, Trans <transfire@gmail.com> wrote:

> Amazing Pit Capitain! Your hunch was right on the money. Not only was
> your suggestion key to the solution, but the output was just as you
> expected. Here's the solution I found. It's not all that elegant, but
> it appears to work okay. I am certain there are much better solutions
> to be had, so if anyone has one to offer...

Here is my recursive version:

class MatchData

  def matchtree(index=0)
    ret=[]
    b, e=self.begin(index), self.end(index)
    while (index+=1)<=length
      if index==length || (bi=self.begin(index))>=e
        # we are finished, if something is left, then add it
        ret << string[b, e-b] if e>b
        break
      else
        if bi>=b
          ret << string[b, bi-b] if bi>b
          ret << matchtree(index)
          b=self.end(index)
        end
      end
    end
    ret
  end

  def matchset
    [pre_match, matchtree, post_match]
  end

end

md = /(bb)(cc(dd))(ee)/.match "XXaabbccddeeffXX"
p md.to_a
p md.matchset
md.length.times { |i| p md.matchtree(i) }

Output:
["bbccddee", "bb", "ccdd", "dd", "ee"]
["XXaa", [["bb"], ["cc", ["dd"]], ["ee"]], "ffXX"]
[["bb"], ["cc", ["dd"]], ["ee"]]
["bb"]
["cc", ["dd"]]
["dd"]
["ee"]

Btw. your version seems to have a little problem:
> #=> ["XXaa", [["bb"], ["cc", ["dd"]], "ee"], "ffXX"]
There are no brackets around the "ee".
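
A likely cause, judging only from the quoted code: the loop runs over `captures.size.times`, which yields indices 0 through 3. That counts the whole match (index 0) but never reaches the last group. The sketch below just compares the two index ranges; `visited`, `all_groups` and `skipped` are illustrative names.

```ruby
md = /(bb)(cc(dd))(ee)/.match("XXaabbccddeeffXX")

# Indices the quoted loop visits: 0...captures.size
visited = md.captures.size.times.to_a
# Indices that exist: 0 (whole match) plus one per group
all_groups = md.size.times.to_a

skipped = all_groups - visited
p visited            # => [0, 1, 2, 3]
p skipped            # => [4]
p md[skipped.first]  # => "ee"
```

Iterating with `size.times` instead would also record the begin and end of the last group.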

Dominik

Trans

4/22/2005 3:17:00 PM

Dominik,

Much obliged and nicely done! A recursive algorithm actually makes a
lot of sense.

> Btw. your version seems to have a little problem:

Eek. I missed that --more like a big problem. But no matter, you saved
the day! If it's okay by you, I will include it in Ruby Facets (credit
given to you, of course).

Thanks,
T.

Dominik Bathon

4/23/2005 5:01:00 PM

On Fri, 22 Apr 2005 17:19:32 +0200, Trans <transfire@gmail.com> wrote:

> Dominik,
>
> Much obliged and nicely done! A recursive algorithm actually makes a
> lot of sense.
>
>> Btw. your version seems to have a little problem:
>
> Eek. I missed that --more like a big problem. But no matter, you saved
> the day! If it's okay by you, I will include it in Ruby Facets (credit
> given to you, of course).

Sure, use it for whatever you want :-)

comp.lang.ruby
