comp.lang.ruby

confused by 'test'.gsub(/.*/,'x')

Wybo Dekker

4/2/2008 8:12:00 PM

Why do I get "xx" instead of "x" in the following:

$ irb
>> 'test'.gsub(/.*/,'x')
=> "xx"

and even more confusing (to me):

>> "x\n".gsub(/.*/,'y')
=> "yy\ny"

(I expected "y\n")
--
Wybo

13 Answers

Thomas Wieczorek

4/2/2008 8:36:00 PM

On Wed, Apr 2, 2008 at 10:12 PM, Wybo Dekker <wybo@servalys.nl> wrote:
> Why do I get "xx" instead of "x" in the following:
>
> $ irb
> >> 'test'.gsub(/.*/,'x')
> => "xx"
>

.* matches NO and ALL characters, so gsub() substitutes
''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'

> and even more confusing (to me):
>
> >> "x\n".gsub(/.*/,'y')
> => "yy\ny"
>

Same goes here as above. If you want to replace each character use
'test'.gsub(/./,'x') #=> 'xxxx'
or if you want to replace all characters in each line, use
"test\ntest".gsub(/.+/,'x') #=> "x\nx"
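
For reference, the three quantifier choices mentioned in this post can be compared side by side. This is only a small sketch using the strings from the thread; the variable names are made up for illustration.

```ruby
# Compare how the quantifier changes what gsub replaces.
per_char = 'test'.gsub(/./, 'x')          # every character is its own match
per_line = "test\ntest".gsub(/.+/, 'x')   # one non-empty match per line
star     = 'test'.gsub(/.*/, 'x')         # "test" plus a trailing empty match

p per_char  # "xxxx"
p per_line  # "x\nx"
p star      # "xx"
```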

Yossef Mendelssohn

4/2/2008 8:55:00 PM

On Apr 2, 3:35 pm, "Thomas Wieczorek" <wieczo...@googlemail.com>
wrote:
> On Wed, Apr 2, 2008 at 10:12 PM, Wybo Dekker <w...@servalys.nl> wrote:
> > Why do I get "xx" instead of "x" in the following:
>
> > $ irb
> > >> 'test'.gsub(/.*/,'x')
> > => "xx"
>
> .* matches NO and ALL characters, so gsub() substitutes
> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'

That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
should match [empty string]test[empty string] just once.

> > and even more confusing (to me):
>
> > >> "x\n".gsub(/.*/,'y')
> > => "yy\ny"

This makes sense because . doesn't normally match \n, so there's the
replacement before and after. Still, the double replacement when there
are actual characters is just weird.

> Same goes here as above. If you want to replace each character use
> 'test'.gsub(/./,'x') #=> 'xxxx'
> or if you want to replace all characters in each line, use
> "test\ntest".gsub(/.+/,'x') #=> "x\nx"

--
-yossef

Thomas Wieczorek

4/2/2008 9:00:00 PM

On Wed, Apr 2, 2008 at 10:55 PM, Yossef Mendelssohn <ymendel@pobox.com> wrote:
> On Apr 2, 3:35 pm, "Thomas Wieczorek" <wieczo...@googlemail.com>
> wrote:
>
> >
> > .* matches NO and ALL characters, so gsub() substitutes
> > ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'
>
> That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
> than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
> should match [empty string]test[empty string] just once.
>

Yeah, it confuses me too. I settled on that explanation when I read
it here once, but I'd also expect 'x' instead of 'xx'.

Jens Wille

4/2/2008 9:14:00 PM

Thomas Wieczorek [2008-04-02 22:59]:
> On Wed, Apr 2, 2008 at 10:55 PM, Yossef Mendelssohn
> <ymendel@pobox.com> wrote:
>> On Apr 2, 3:35 pm, "Thomas Wieczorek"
>> <wieczo...@googlemail.com> wrote:
>>> .* matches NO and ALL characters, so gsub() substitutes
>>> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get
>>> 'xx'
>> That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x'
>> more than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me
>> that the .* should match [empty string]test[empty string] just once.
> Yeah, it is confusing me, but I agreed on that explanation with
> myself, when I read it once here. I'd also expect 'x' instead of 'xx'

can't explain it either, i'm afraid. but you can see what it does
like so:

irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
"test"
""
=>"xx"

as soon as you anchor the regexp at the beginning of the string it
gives the expected result:

irb> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
"test"
=>"x"

or just do:

irb> 'test'.sub(/.*/) { |m| p m; 'x'}
"test"
=>"x"

;-)

cheers
jens
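
To see where each match sits in the string, the block form can also record offsets. This is a sketch only: `match_spans` is a hypothetical helper name, not something from Ruby or from this thread. It relies on `Regexp.last_match` being set inside the gsub block.

```ruby
# Record each match gsub hands to the block, with its start and end offsets.
# match_spans is a hypothetical helper, not part of Ruby.
def match_spans(str, re)
  spans = []
  str.gsub(re) do |m|
    md = Regexp.last_match
    spans << [m, md.begin(0), md.end(0)]
    m # put the match back so the string itself is unchanged
  end
  spans
end

p match_spans('test', /.*/)  # [["test", 0, 4], ["", 4, 4]]
p match_spans("x\n", /.*/)   # [["x", 0, 1], ["", 1, 1], ["", 2, 2]]
```

The second match for 'test' is an empty string sitting at offset 4, i.e. after the last character.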

Brian Adkins

4/2/2008 9:25:00 PM

On Apr 2, 5:13 pm, Jens Wille <jens.wi...@uni-koeln.de> wrote:
> Thomas Wieczorek [2008-04-02 22:59]:
> > On Wed, Apr 2, 2008 at 10:55 PM, Yossef Mendelssohn
> > <ymen...@pobox.com> wrote:
> >> On Apr 2, 3:35 pm, "Thomas Wieczorek"
> >> <wieczo...@googlemail.com> wrote:
> >>> .* matches NO and ALL characters, so gsub() substitutes
> >>> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get
> >>> 'xx'
> >> That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x'
> >> more than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me
> >> that the .* should match [empty string]test[empty string] just once.
> > Yeah, it is confusing me, but I agreed on that explanation with
> > myself, when I read it once here. I'd also expect 'x' instead of 'xx'
>
> can't explain it either, i'm afraid. but you can see what it does
> like so:
>
> irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
> "test"
> ""
> =>"xx"

That seems like a bug to me. The entire string is matched/consumed
by .*, so why try matching again? Or, if you are going to continue,
why stop with just one additional match? Is there code in gsub to
"only match one time after the string is consumed" ?

irb(main):001:0> 'test' =~ /(.*)(.*)(.*)/
=> 0
irb(main):002:0> $1
=> "test"
irb(main):003:0> $2
=> ""
irb(main):004:0> $3
=> ""
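
One way to picture why there is exactly one extra match is a simplified global-substitution loop. The sketch below is an assumption about the general approach, not Ruby's actual gsub source, and `naive_gsub` is a hypothetical name. The idea: after each match the search resumes where the match ended, and that position may be the very end of the string, where `.*` can still match emptily once.

```ruby
# A simplified model of a global-substitution loop (NOT Ruby's real source).
# naive_gsub is a hypothetical name used only for this illustration.
def naive_gsub(str, re, repl)
  out = ''.dup
  pos = 0
  while pos <= str.length
    m = re.match(str, pos)          # search from pos, which may equal str.length
    break unless m
    out << str[pos...m.begin(0)]    # copy the unmatched text before the match
    out << repl
    if m.end(0) == m.begin(0)
      # Empty match: copy one character and step past it,
      # otherwise the loop would find the same empty match forever.
      out << str[m.end(0), 1].to_s
      pos = m.end(0) + 1
    else
      pos = m.end(0)                # non-empty match: resume right after it
    end
  end
  out << str[pos..-1].to_s          # copy whatever is left after the last match
end

p naive_gsub('test', /.*/, 'x')    # "xx"
p naive_gsub("x\n", /.*/, 'y')     # "yy\ny"
p naive_gsub('test', /\A.*/, 'x')  # "x"
```

In this model nothing special stops the loop after one extra match; it simply runs out of positions to search from.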

Wybo Dekker

4/2/2008 9:39:00 PM

Jens Wille wrote:

> as soon as you anchor the regexp at the beginning of the string it
> gives the expected result:
>
> irb> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
> "test"
> =>"x"
>
> or just do:
>
> irb> 'test'.sub(/.*/) { |m| p m; 'x'}
> "test"
> =>"x"

sure, that works, and so does 'test'.gsub(/.+/,'x').
The point is that I don't understand why 'test'.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, i.e.: all of the string should be replaced with a single x,
as far as I can see.


--
Wybo

Januski, Ken

4/2/2008 10:08:00 PM

Seems wrong to me as well. If you do a destructive gsub and test for
individual letters, e.g. /s.*/,'x', you get 'tex' as you'd expect. Seems
wrong to get the double 'x' when you use your example. Of course my
background is Perl and I believe that's how it would work there.

irb(main):016:0> 'test'.gsub!(/.*/, 'x')
=> "xx"
irb(main):017:0> 'test'.gsub!(/e.*/, 'x')
=> "tx"
irb(main):018:0> 'test'.gsub!(/s.*/, 'x')
=> "tex"
irb(main):019:0> 'test'.gsub!(/t.*/, 'x')
=> "x"
irb(main):020:0> 'test'.gsub!(/st.*/, 'x')
=> "tex"

Ken
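
A possible way to see why the letter-anchored patterns behave differently is to list their matches with scan, which as far as I know walks the string the same way gsub does. A sketch with made-up variable names:

```ruby
# List the matches each pattern finds in 'test'.
star_matches = 'test'.scan(/.*/)   # the only pattern here that can match emptily
e_matches    = 'test'.scan(/e.*/)
s_matches    = 'test'.scan(/s.*/)
t_matches    = 'test'.scan(/t.*/)

p star_matches  # ["test", ""]
p e_matches     # ["est"]
p s_matches     # ["st"]
p t_matches     # ["test"]
```

Patterns that begin with a literal letter cannot match the empty string left at the end, so they yield a single replacement.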



Jens Wille

4/2/2008 11:04:00 PM

Januski, Ken [2008-04-03 00:08]:
> Of course my background is Perl and I believe that's how it would
> work there.

no, works the same way there:

sh> perl -e '$s = "test"; $s =~ s/.*/x/g; print "$s\n"'
xx

(only a lot more complicated ;-)

btw: python, php and javascript, too.

oh, and here's what oniguruma does:

irb> Oniguruma::ORegexp.new('.*').gsub('test', 'x')
=>"xx"

cheers
jens

Bilyk, Alex

4/3/2008 12:17:00 AM

Wybo Dekker wrote:
> The point is that I don't understand why 'test'.gsub(/.*/,'x') gives me
> 'xx', since .* means: zero or more of any character, except the newline
> character, ...

I would venture to say this is exactly what it does. It finds two
matches and replaces them both with 'x'. The first match is the full
string <or more>, while the second match is the empty string <zero>
left at the end.

Alex
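
The "zero" half of "zero or more" can be seen in isolation with inputs where only empty matches are possible. A small sketch, with illustrative variable names:

```ruby
# An empty string still offers one empty match.
empty_input = ''.gsub(/.*/, 'x')

# No 'x' occurs in 'test', so x* matches emptily at every position,
# including the one after the last character.
no_x = 'test'.gsub(/x*/, '-')

p empty_input  # "x"
p no_x         # "-t-e-s-t-"
```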



Zoltan Dezso

4/3/2008 1:13:00 AM

Perl, PHP:

perl -le '$str="test"; $str =~ s/.*?/x/g; print $str;'
xxxxxxxxx

preg_replace('/.*?/', 'x', 'test');
xxxxxxxxx

Ruby:
print 'test'.gsub(/.*?/, 'x')
xtxexsxtx

Zaki
--
Posted via http://www.ruby-....
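
The Ruby result above can be unpacked by listing the matches that the lazy pattern finds. This is a sketch with made-up variable names. The remark about Perl and PHP is my own reading, not something stated in the thread: after an empty match those engines appear to retry at the same position for a non-empty one, which would account for the nine x's.

```ruby
# In Ruby the lazy .*? only ever matches the empty string here,
# once at each of the five positions 0..4.
lazy_matches = 'test'.scan(/.*?/)

# Each empty match is replaced and the following character is copied through.
lazy_result = 'test'.gsub(/.*?/, 'x')

p lazy_matches  # ["", "", "", "", ""]
p lazy_result   # "xtxexsxtx"
```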