Asp Forum - Regexp/scan question

Peter Szinek

12/11/2006 9:28:00 AM

Hello,

I need to match a chunk of code like this:

....
....
#begin here
...}
......end
...}
......}
.....end
...
...

I need to match from the "#begin here" line up to the n-th closing token
(i.e. '}' or 'end'). n can be arbitrary (let's assume it is meaningful,
i.e. n is no larger than the total number of '}' and 'end' tokens).

Example
match_stuff(2):

#begin here
...}
......end

match_stuff(4):

#begin here
...}
......end
...}
......}

etc.

What's the best way to accomplish this? I have been trying with
scan() but have not succeeded yet.

TIA,
Peter

__
http://www.rubyra...

8 Answers

Carlos

12/11/2006 9:38:00 AM

Peter Szinek wrote:

> Hello,
>
> I need to match a chunk of code like this:
>
> ....
> ....
> #begin here
> ...}
> ......end
> ...}
> ......}
> .....end
> ...
> ...
>
> I need to match from "the #begin here" up to the n-th closing token
> (i.e. '}' or 'end'). n can be arbitrary (let's consider that it is
> meaningful, i.e. there are no more '}' + 'end's than n.

n = 4
text =~ /#begin(.*(\}|end)){#{n}}/m

?

(not tested).

Carlos

12/11/2006 9:43:00 AM

Robert Klemme

12/11/2006 9:48:00 AM

On 11.12.2006 10:37, Carlos wrote:
> Peter Szinek wrote:
>
>> Hello,
>>
>> I need to match a chunk of code like this:
>>
>> ....
>> ....
>> #begin here
>> ...}
>> ......end
>> ...}
>> ......}
>> .....end
>> ...
>> ...
>>
>> I need to match from "the #begin here" up to the n-th closing token
>> (i.e. '}' or 'end'). n can be arbitrary (let's consider that it is
>> meaningful, i.e. there are no more '}' + 'end's than n.
>
> n = 4
> text =~ /#begin(.*(\}|end)){#{n}}/m
>
> ?
>
> (not tested).

IMHO this does not work because of the greedy ".*". You could try the
reluctant form, i.e. ".*?". Also, the grouping does not capture the whole
sequence.
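
To make the greedy/reluctant difference concrete, here is a small sketch using the sample string that appears later in this thread. The variable names are only illustrative. String#[] with a regexp returns the whole matched text, so the capture groups do not get in the way here.

```ruby
text = '.... #begin aaaa end bbb } ccc end ddd'
n = 2

# Greedy: each ".*" grabs as much as it can, so the match is stretched
# towards the last closing token in the text.
greedy_match = text[/#begin(.*(\}|end)){#{n}}/m]

# Reluctant: each ".*?" stops at the nearest closing token, so the match
# ends at the n-th one after "#begin".
reluctant_match = text[/#begin(.*?(\}|end)){#{n}}/m]

puts greedy_match      # "#begin aaaa end bbb } ccc end"
puts reluctant_match   # "#begin aaaa end bbb }"
```

With n = 2 the greedy version still matches exactly two tokens, but it picks the two that make the overall match as long as possible.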

robert

Peter Szinek

12/11/2006 9:49:00 AM

Carlos wrote:
> Peter Szinek wrote:
>
>> Hello,
>>
>> I need to match a chunk of code like this:
>>
>> ....
>> ....
>> #begin here
>> ...}
>> ......end
>> ...}
>> ......}
>> .....end
>> ...
>> ...
>>
>> I need to match from "the #begin here" up to the n-th closing token
>> (i.e. '}' or 'end'). n can be arbitrary (let's consider that it is
>> meaningful, i.e. there are no more '}' + 'end's than n.
>
> n = 4
> text =~ /#begin(.*(\}|end)){#{n}}/m

Sorry, I need to 'scan' it. I have been playing around with similar
regexps, but they did not work out. E.g. also yours:

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
=> [[" ccc end", "end"]]

does not work with scan...

Cheers,
Peter

__
http://www.rubyra...

Peter Szinek

12/11/2006 9:55:00 AM

> IMHO this does not work because of the greedy ".*". You could try with
> reluctant, i.e. ".*?". Also the grouping does not catch the whole
> sequence.

Yeah, I tried to correct these problems but I am still not quite there...

Carlos' regexp, vol. 2 (with the reluctant ".*?"):

irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
=> ".... #begin aaaa end bbb } ccc end ddd"
irb(main):008:0> n = 2
=> 2
irb(main):009:0> text.scan(/#begin(.*?(\}|end)){#{n}}/m)
=> [[" bbb }", "}"]]

And I would like to get

[["#begin aaaa end bbb }"]]

OK, I know that I did not specify the problem correctly the first
time; maybe it is clearer now...

Cheers,
Peter

__
http://www.rubyra...

Carlos

12/11/2006 10:03:00 AM

Peter Szinek wrote:

> Carlos wrote:
>
>> Peter Szinek wrote:
>>
>>> Hello,
>>>
>>> I need to match a chunk of code like this:
>>>
>>> ....
>>> ....
>>> #begin here
>>> ...}
>>> ......end
>>> ...}
>>> ......}
>>> .....end
>>> ...
>>> ...
>>>
>>> I need to match from "the #begin here" up to the n-th closing token
>>> (i.e. '}' or 'end'). n can be arbitrary (let's consider that it is
>>> meaningful, i.e. there are no more '}' + 'end's than n.
>>
>>
>> n = 4
>> text =~ /#begin(.*(\}|end)){#{n}}/m
>
>
> Sorry, I need to 'scan' it. I have been playing around with similar
> regexps, but they did not work out. E.g. also yours:
>
> irb(main):007:0> text = '.... #begin aaaa end bbb } ccc end ddd'
> => ".... #begin aaaa end bbb } ccc end ddd"
> irb(main):008:0> n = 2
> => 2
> irb(main):009:0> text.scan(/#begin(.*(\}|end)){#{n}}/m)
> => [[" ccc end", "end"]]
>
> does not work with scan...

To make it work with scan just make the parens non-capturing:

irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
=> "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
=> ["#begin aaa end bbb }", "#begin ddd end eee end"]
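
The same idea can be wrapped in a helper for arbitrary n. This is only a sketch: the name match_stuff comes from the original question, but the (text, n) signature and the sample string are assumptions made for illustration. Note that the bare "end" alternative also matches inside longer words such as "send"; something like \bend\b may be preferable on real code.

```ruby
# Hypothetical helper: returns every chunk running from "#begin" up to
# the n-th closing token ('}' or 'end') that follows it.
def match_stuff(text, n)
  # With non-capturing groups (?:...), scan returns the whole matches
  # as plain strings instead of arrays of captured groups.
  text.scan(/#begin(?:.*?(?:\}|end)){#{Integer(n)}}/m)
end

# Sample modelled on the layout in the original question.
sample = "....\n#begin here\n...}\n......end\n...}\n......}\n.....end\n...\n"

two  = match_stuff(sample, 2)
four = match_stuff(sample, 4)

p two    # ["#begin here\n...}\n......end"]
p four   # ["#begin here\n...}\n......end\n...}\n......}"]
```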

Good luck.
--

Peter Szinek

12/11/2006 10:13:00 AM

> To make it work with scan just make the parens non-capturing:
>
> irb(main):001:0> text = "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
> => "#begin aaa end bbb } ccc } #begin ddd end eee end fff"
> irb(main):002:0> text.scan(/#begin(?:.*?(?:\}|end)){2}/m)
> => ["#begin aaa end bbb }", "#begin ddd end eee end"]

Ha! That was the trick I have been looking for! Muchas Gracias, Carlos.

Cheers,
Peter

__
http://www.rubyra...

Paul Lutus

12/11/2006 10:23:00 AM

Peter Szinek wrote:

> Hello,
>
> I need to match a chunk of code like this:
>
> ...
> ...
> #begin here
> ..}
> .....end
> ..}
> .....}
> ....end

This won't solve the entire problem, but it will give you an array of
the indices of the matching elements:

---------------------------------

#!/usr/bin/ruby -w

data = File.read("testdata.txt")

match_indices = []

# Record the starting offset of every '}' in the data.
data.scan(/\}/) do
  match_indices << Regexp.last_match.begin(0)
end

puts match_indices

---------------------------------

You could begin by scanning to your planned start mark, then scan for
matching elements using this code. Or you could segregate the block between
the start and end marks, then scan for matches in the substring using this
code.
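
The first suggestion (find the start mark, then collect token offsets after it) could be sketched roughly as follows. The helper name slice_to_nth_close, its parameters, and the literal "#begin" start mark are assumptions for illustration, not part of the code above. It also treats 'end' as a closing token, as the original question asks.

```ruby
# Hypothetical helper: returns the text from the start mark up to and
# including the n-th closing token, or nil if that is not possible.
def slice_to_nth_close(data, n, start_mark = "#begin")
  start = data.index(start_mark)
  return nil unless start

  # Collect the offset just past each closing token after the start mark.
  ends = []
  data[start..-1].scan(/\}|end/) do
    ends << start + Regexp.last_match.end(0)
  end

  return nil if n < 1 || ends.size < n
  data[start...ends[n - 1]]
end

data = '.... #begin aaaa end bbb } ccc end ddd'
result = slice_to_nth_close(data, 2)
puts result   # "#begin aaaa end bbb }"
```

Compared with a single regexp, this keeps the offsets around, which can be handy if the positions of the closing tokens are needed later.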

--
Paul Lutus
http://www.ara...

comp.lang.ruby
