James Gray
3/31/2005 9:02:00 PM
On Mar 31, 2005, at 2:49 PM, Paul Hanchett wrote:
> Why does this:
>
> text= "AA<X>BB<X>CC</X>DD</X>EE"
> regex = %r{(.*)<X>(.*)}
>
> t = text.sub( regex, "z" );
> print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"
>
> Return this:
>
> $1=AA<X>BB
> $2=CC</X>DD</X>EE
> $3=
> $4=
Because the construct .* means, "Zero or more non-newline characters,
but as many as I can get". We say the * operator is "greedy". Your
first (.*) therefore runs all the way to the last <X> in the string,
not the first one.
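If it helps to see that in isolation, here is a small sketch using the same text as your example. The variable names (greedy, greedy_first, greedy_rest) are just mine for illustration:

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# Greedy: the first (.*) grabs the whole string, then gives characters
# back one at a time until "<X>" can match. That leaves it sitting at
# the LAST "<X>" in the text, not the first.
greedy = text.match(/(.*)<X>(.*)/)

greedy_first = greedy[1]   # everything before the last <X>
greedy_rest  = greedy[2]   # everything after the last <X>

puts "first=#{greedy_first}"
puts "rest=#{greedy_rest}"
```

Note that "</X>" never matches the <X> in the pattern, so only the two opening tags are candidates.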
> Instead of:
>
> $1=AA
> $2=BB<X>CC</X>DD</X>EE
> $3=
> $4=
>
> And how would I fix it?
One way would be to switch from the greedy * to the non-greedy *?,
which matches as few characters as it can. That would have your Regexp
looking like this:
%r{(.*?)<X>(.*)}
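A quick sketch of that pattern against your text, again with variable names (lazy, lazy_first, lazy_rest) made up for the example:

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# Non-greedy: (.*?) takes as few characters as possible, so it stops
# at the FIRST "<X>". The trailing (.*) is still greedy and picks up
# everything that is left.
lazy = text.match(%r{(.*?)<X>(.*)})

lazy_first = lazy[1]   # text before the first <X>
lazy_rest  = lazy[2]   # text after the first <X>

puts "first=#{lazy_first}"
puts "rest=#{lazy_rest}"
```

Only the first quantifier needs to change here; the second one is meant to grab the rest of the string.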
Another way is to use split() with a limit:
irb(main):001:0> text= "AA<X>BB<X>CC</X>DD</X>EE"
=> "AA<X>BB<X>CC</X>DD</X>EE"
irb(main):002:0> first, rest = text.split(/<X>/, 2)
=> ["AA", "BB<X>CC</X>DD</X>EE"]
irb(main):003:0> first
=> "AA"
irb(main):004:0> rest
=> "BB<X>CC</X>DD</X>EE"
Hope that helps.
James Edward Gray II