Asp Forum - too greedy of a regexp

dave rose

11/9/2006 4:36:00 PM

I have a regexp, /(^BillHead(.*))(^Bill_End(.*))/m, that's too greedy for
processing a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....

...I get the whole file matched, but I just want each invoice.
It will eventually be in a one-liner like
a=File.read("billfile").scan(regexp)

So what is the non-greedy way for the above regexp to properly match
each invoice?

--
Posted via http://www.ruby-....
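A minimal sketch of why the original pattern matches the whole file, run against a small made-up two-invoice extract (the field contents are hypothetical; only the BillHead/Bill_End line prefixes come from the post). In Ruby, ^ matches at the start of every line regardless of flags, and /m makes . match newlines as well. So the greedy (.*) after BillHead runs to the end of the string and backtracks only as far as the last Bill_End line. The <<~ heredoc needs Ruby 2.3 or later.

```ruby
# Hypothetical two-invoice extract (field values are made up).
data = <<~EXTRACT
  BillHead 0001 ACME
  detail line A
  Bill_End 0001 total 10
  BillHead 0002 INITECH
  detail line B
  detail line C
  Bill_End 0002 total 20
EXTRACT

greedy = /(^BillHead(.*))(^Bill_End(.*))/m

# With /m, "." also matches "\n", so the first (.*) swallows the rest of the
# string and backtracks only to the LAST ^Bill_End; the second (.*) then
# consumes everything after it. scan therefore finds a single match.
matches = data.scan(greedy)
puts matches.size                         # => 1
puts matches[0][0].scan("BillHead").size  # => 2 (both invoices land in group 1)
```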

3 Answers

Jano Svitok

11/9/2006 5:21:00 PM

On 11/9/06, Dave Rose <bitdoger2@yahoo.com> wrote:
> so what is the non-greedy way for the above regexp to properly match
> each invoice...

try:

/(^BillHead(.*?))(^Bill_End(.*?))\n/m

or

/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m

notice the .*? instead of .*

*? has some peculiarities that were discussed here some time ago, so
perhaps you'd want to find them in the archives (search for 'greedy'
or 'regex'; I don't remember now).
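A quick check of the first suggestion, as a sketch on the same kind of made-up extract (hypothetical data, Ruby 2.3+ for <<~). The lazy .*? after BillHead stops at the first Bill_End line, and the lazy .*? after Bill_End stops at the first newline, so scan yields one four-element array per invoice. Groups 1 and 3 together give back the invoice text minus its final newline.

```ruby
# Hypothetical two-invoice extract (field values are made up).
data = <<~EXTRACT
  BillHead 0001 ACME
  detail line A
  Bill_End 0001 total 10
  BillHead 0002 INITECH
  detail line B
  detail line C
  Bill_End 0002 total 20
EXTRACT

# First suggested pattern: both .*? are lazy, so each match ends at the
# newline that terminates the nearest Bill_End line.
lazy = /(^BillHead(.*?))(^Bill_End(.*?))\n/m

invoices = data.scan(lazy)
puts invoices.size # => 2

invoices.each do |head_and_detail, _detail, end_line, _end_fields|
  # Group 1 runs from BillHead up to the Bill_End line; group 3 is that
  # line without its "\n" (the \n is matched outside the groups).
  puts head_and_detail + end_line
  puts "----"
end
```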

Robert Klemme

11/9/2006 6:19:00 PM

Jan Svitok wrote:
> try:
>
> /(^BillHead(.*?))(^Bill_End(.*?))\n/m
>
> or
>
> /(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
>
> notice the .*? instead of .*

I would also remove the last .*, because with /m the dot also matches
newlines, so it likely eats up the rest of the document. So that would be

/^BillHead(.*?)^Bill_End/m
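A small sketch of this pattern with scan, on hypothetical data. Because the pattern has one capture group, scan returns one-element arrays, so flatten yields just the text between BillHead and Bill_End. The second pattern is a variant of our own, not from the thread: with no capture groups scan returns each whole match, and [^\n]* (unlike .* under /m) cannot run past the end of the Bill_End line.

```ruby
# Hypothetical two-invoice extract (field values are made up).
data = <<~EXTRACT
  BillHead 0001 ACME
  detail line A
  Bill_End 0001 total 10
  BillHead 0002 INITECH
  detail line B
  Bill_End 0002 total 20
EXTRACT

# Captures everything between "BillHead" and the next "Bill_End".
details = data.scan(/^BillHead(.*?)^Bill_End/m).flatten
p details

# Assumed variant (not from the thread): no groups, so scan returns whole
# invoices ending with their Bill_End line; \n? tolerates a missing final newline.
whole = data.scan(/^BillHead.*?^Bill_End[^\n]*\n?/m)
p whole
```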

Another approach is to do

s.split(/^(Bill(?:Head|_End))/m)

and then go through the array.

irb(main):006:0> "BillHead\nfoo\nbar\nBillEnd".split(/^(Bill(?:Head|End))/m)
=> ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]
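One way to "go through the array" afterwards, as a sketch: with a capture group in the split pattern, the markers are kept as separate elements. After dropping whatever precedes the first BillHead (here the leading empty string), the array comes in groups of four: marker, text, marker, text. This assumes every BillHead is followed by exactly one Bill_End before the next BillHead; the data is made up, and the Bill_End spelling follows the original post.

```ruby
# Hypothetical two-invoice extract (field values are made up).
data = <<~EXTRACT
  BillHead 0001 ACME
  detail line A
  Bill_End 0001 total 10
  BillHead 0002 INITECH
  detail line B
  Bill_End 0002 total 20
EXTRACT

# The captured marker is kept in the result array.
parts = data.split(/^(Bill(?:Head|_End))/)

# parts[0] is whatever precedes the first BillHead ("" here); the rest
# alternates "BillHead", head+detail text, "Bill_End", rest of the end line.
invoices = parts.drop(1).each_slice(4).map(&:join)

invoices.each_with_index do |inv, i|
  puts "invoice #{i + 1}:"
  puts inv
end
```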

Kind regards

robert

dave rose

11/9/2006 7:05:00 PM

Robert Klemme wrote:
> Jan Svitok wrote:
>> /(^BillHead(.*?))(^Bill_End(.*?))\n/m
>>
>> or
>>
>> /(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
>>
>> notice the .*? instead of .*
>>
> => ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]
>
> Kind regards
>
> robert

I played around in irb with a shortened extract file and found that
b=File.read("drbilp.txt").scan(/(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m)
works: it separates each invoice into a sub-array of size 6,
in which b[x][0]+b[x][2] is the complete invoice. That does the job of reading and
scanning correctly, and puts everything in a Ruby 'container' that I can call each on.
Thanks,
dave
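A sketch of this final pattern on a made-up extract. The real Bill_End line layout isn't shown in the thread; the sample below simply assumes digits, then whitespace, then the literal UBPBILP1, which is what the pattern requires. Of the six groups, b[x][0] (group 1) holds the BillHead line and details and b[x][2] (group 3) holds the whole Bill_End line, so their concatenation is the complete invoice. The final (.*?) is lazy and sits at the end of the pattern, so it always matches the empty string.

```ruby
# Hypothetical extract; the "Bill_End<digits><spaces>UBPBILP1" layout is assumed.
data = <<~EXTRACT
  BillHead 0001 ACME
  detail line A
  Bill_End0001   UBPBILP1
  BillHead 0002 INITECH
  detail line B
  Bill_End0002   UBPBILP1
EXTRACT

re = /(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*?))/m
b = data.scan(re)

# Each element of b has 6 entries, one per capture group; group 1 + group 3
# (indexes 0 and 2) reassemble the full invoice text.
invoices = b.map { |m| m[0] + m[2] }
invoices.each { |inv| puts inv }
```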


comp.lang.ruby
