comp.lang.ruby

Regex parsing question

Paul Hanchett

3/31/2005 8:28:00 PM

Why does this:

text= "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*)<X>(.*)}

t = text.sub(regex, "z")
print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"

Return this:

$1=AA<X>BB
$2=CC</X>DD</X>EE
$3=
$4=

Instead of:

$1=AA
$2=BB<X>CC</X>DD</X>EE
$3=
$4=

And how would I fix it?

Paul
4 Answers

James Gray

3/31/2005 9:02:00 PM


On Mar 31, 2005, at 2:49 PM, Paul Hanchett wrote:

> Why does this:
>
> text= "AA<X>BB<X>CC</X>DD</X>EE"
> regex = %r{(.*)<X>(.*)}
>
> t = text.sub( regex, "z" );
> print "$1=#{$1}\n$2=#{$2}\n$3=#{$3}\n$4=#{$4}\n"
>
> Return this:
>
> $1=AA<X>BB
> $2=CC</X>DD</X>EE
> $3=
> $4=

Because the construct .* means "zero or more non-newline characters,
but as many as I can get". We say the * operator is "greedy", so the
first (.*) takes everything up to the last <X> in the string.
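A small sketch of that greedy behaviour, using `String#match` and the returned `MatchData` instead of the `$1` globals (a stylistic choice for illustration, not something from the original post):

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# The greedy (.*) first runs to the end of the line, then backtracks one
# character at a time until the literal <X> can match. The first <X> it
# meets while backing up is the LAST one in the string.
m = text.match(%r{(.*)<X>(.*)})
puts "1=#{m[1]}"  # prints 1=AA<X>BB
puts "2=#{m[2]}"  # prints 2=CC</X>DD</X>EE
```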

> Instead of:
>
> $1=AA
> $2=BB<X>CC</X>DD</X>EE
> $3=
> $4=
>
> And how would I fix it?

One way would be to switch from the greedy * to the non-greedy (lazy) *?.
That would have your Regexp looking like this:

%r{(.*?)<X>(.*)}
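Dropped into the original script, the lazy pattern should give the split Paul expected. A minimal sketch that keeps the post's `text.sub(regex, "z")` call; note that `t` ends up as just `"z"` because the pattern consumes the whole string:

```ruby
text  = "AA<X>BB<X>CC</X>DD</X>EE"
regex = %r{(.*?)<X>(.*)}  # lazy first group, greedy second group

# As in the original post, sub performs the substitution and also sets
# $~, $1 and $2 for the match it made. Copy them before they change.
t = text.sub(regex, "z")
first, rest = $1, $2

puts "$1=#{first}"  # $1=AA
puts "$2=#{rest}"   # $2=BB<X>CC</X>DD</X>EE
puts t              # z (the pattern matched the whole string)
```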

Another way is to use split() with a limit:

irb(main):001:0> text= "AA<X>BB<X>CC</X>DD</X>EE"
=> "AA<X>BB<X>CC</X>DD</X>EE"
irb(main):002:0> first, rest = text.split(/<X>/, 2)
=> ["AA", "BB<X>CC</X>DD</X>EE"]
irb(main):003:0> first
=> "AA"
irb(main):004:0> rest
=> "BB<X>CC</X>DD</X>EE"
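The limit argument is what keeps the remainder intact. A short sketch comparing `split` with and without it, plus `String#partition` (a different method, available in later Ruby versions) that also stops at the first match:

```ruby
text = "AA<X>BB<X>CC</X>DD</X>EE"

# Without a limit, split breaks the string at EVERY <X>.
all_parts = text.split(/<X>/)
p all_parts            # ["AA", "BB", "CC</X>DD</X>EE"]

# With a limit of 2, it splits only at the first <X> and keeps the rest.
first, rest = text.split(/<X>/, 2)
p [first, rest]        # ["AA", "BB<X>CC</X>DD</X>EE"]

# String#partition splits at the first occurrence and also returns
# the separator itself as the middle element.
parts = text.partition("<X>")
p parts                # ["AA", "<X>", "BB<X>CC</X>DD</X>EE"]
```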

Hope that helps.

James Edward Gray II



dblack

3/31/2005 9:07:00 PM


Nikolai Weibull

3/31/2005 9:11:00 PM


* Paul Hanchett (Mar 31, 2005 23:00):
> text= "AA<X>BB<X>CC</X>DD</X>EE"
> regex = %r{(.*)<X>(.*)}

use

regex = %r{(.*?)<X>(.*)}

The greedy .* swallows the first <X> and gives back only the second,
just enough for the <X> part of the regex to match so that an overall
match can be made.
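The backtracking described here can be checked with `MatchData#end`, which reports the offset just past a capture group. A small sketch (variable names are illustrative) comparing where each version of group 1 stops with the positions of the first and last <X>:

```ruby
text   = "AA<X>BB<X>CC</X>DD</X>EE"
greedy = text.match(%r{(.*)<X>(.*)})
lazy   = text.match(%r{(.*?)<X>(.*)})

# end(1) is where capture group 1 stops, i.e. where the matched <X> begins.
puts greedy.end(1)      # 7: greedy group 1 stops at the last <X>
puts text.rindex("<X>") # 7
puts lazy.end(1)        # 2: lazy group 1 stops at the first <X>
puts text.index("<X>")  # 2
```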
nikolai



Paul Hanchett

3/31/2005 10:23:00 PM


Thanks, all, for the help. I understand better now.

Paul