Robert Klemme
11/9/2006 6:19:00 PM
Jan Svitok wrote:
> On 11/9/06, Dave Rose <bitdoger2@yahoo.com> wrote:
>> i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
>> processing
>> a billing extract file containing:
>> BillHead...<<<much information here>>\n
>> <<one or more detail lines here\n>>
>> Bill_End...<<<much information here>>\n
>> BillHead...<<<much information here>>\n
>> <<one or more detail lines here\n>>
>> Bill_End...<<<much information here>>\n
>> ...etc.... to EOF....
>>
>> ..i get the whole file matched....i just want each invoice...
>> it will eventually be in a oneliner like
>> a=File.read("billfile").scan(regexp)
>>
>> so what is the non-greedy way for the above regexp to properly match
>> each invoice...
>
> try:
>
> /(^BillHead(.*?))(^Bill_End(.*?))\n/m
>
> or
>
> /(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
>
> notice the .*? instead of .*
>
> *? has some peculiarities that were discussed here some time ago, so
> perhaps you'd want to find them in the archives. (search for 'greedy'
> or 'regex' - I don't remember now)

I would also remove the last .* because with /m the dot matches newlines
too, so a greedy .* likely eats up the rest of the document. So that
would be

/^BillHead(.*?)^Bill_End/m
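
To illustrate, here is a minimal sketch of that pattern used with scan. The sample text and the variable names are made up, and I am assuming the markers are spelled BillHead and Bill_End as in the original question. Note that the match stops right at the Bill_End marker, so the rest of that line is not captured; the second pattern is one possible way to keep it, if that is what is wanted.

```ruby
# Hypothetical sample data; real invoices carry far more on each line.
text = "BillHead A\ndetail 1\nBill_End A\n" \
       "BillHead B\ndetail 2\ndetail 3\nBill_End B\n"

# Non-greedy body: each match stops at the first Bill_End that starts a line.
# scan returns one array per match, so flatten gives the bodies directly.
bodies = text.scan(/^BillHead(.*?)^Bill_End/m).flatten

# Variant (an assumption about the goal): also capture the remainder of the
# Bill_End line. [^\n]* cannot cross a newline, so it cannot run away.
with_tail = text.scan(/^BillHead(.*?)^Bill_End([^\n]*)/m)

bodies.each { |body| puts body.inspect }
```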

Another approach is to do

s.split(/^(Bill(?:Head|_End))/m)

and then go through the array. Because of the capturing group, split
keeps the markers in the result.

irb(main):006:0> "BillHead\nfoo\nbar\nBill_End".split(/^(Bill(?:Head|_End))/m)
=> ["", "BillHead", "\nfoo\nbar\n", "Bill_End"]

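
Going through the array could look roughly like this. It is only a sketch: the sample text, the variable names and the hash keys are made up, the marker is assumed to be spelled Bill_End as in the original question, and it assumes a well-formed file in which BillHead and Bill_End strictly alternate.

```ruby
# Hypothetical sample data with two invoices.
text = "BillHead A\nline 1\nBill_End A\nBillHead B\nline 2\nBill_End B\n"

# The capturing group makes split keep the markers in the result.
parts = text.split(/^(Bill(?:Head|_End))/)

# The first element is the (empty) text before the first marker; after it the
# array repeats as: head marker, head text, end marker, end text.
invoices = parts.drop(1).each_slice(4).map do |_head, body, _end_marker, tail|
  { body: body, tail: tail }
end

invoices.each { |inv| p inv }
```

If the file can contain stray or unbalanced markers, the slices of four would drift out of step, so checking the marker in each slice before using it would be safer.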
Kind regards
robert