
comp.lang.ruby

regexp problem /[éê]/ || /é|ê/

Jonatas Paganini

3/6/2009 12:32:00 PM

Hi, I have a problem trying to replace accented characters like:

>irb
irb(main):001:0>
irb(main):002:0* name = "Fênix"
=> "F\303\252nix"
irb(main):003:0> name.gsub(/[éê]/,'e')
=> "Feenix"
irb(main):004:0> name.gsub(/é|ê/,'e')
=> "Fenix"

What's the difference between /[éê]/ and /é|ê/ ?

ps: ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]
--
Posted via http://www.ruby-....

3 Answers

lasitha

3/6/2009 2:13:00 PM

On Fri, Mar 6, 2009 at 6:02 PM, Jonatas Paganini <jonatasdp@gmail.com> wrote:
> Hi, I have a problem trying to replace accented characters like:
>
> irb(main):002:0* name = "Fênix"
> => "F\303\252nix"
> irb(main):003:0> name.gsub(/[éê]/,'e')
> => "Feenix"
> irb(main):004:0> name.gsub(/é|ê/,'e')
> => "Fenix"

Looks to me like an encoding problem. What source encoding are you working
in?

If you set $KCODE = 'UTF-8' or append /u to the regex literals does it
resolve the inconsistency?
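
A rough sketch of the /u suggestion, assuming the source file is saved as UTF-8 (the variable names are only illustrative). With /u the regexp is treated as UTF-8, so both spellings of the pattern should behave the same:

```ruby
# encoding: utf-8
# Sketch only: assumes this file is saved as UTF-8.
# On Ruby 1.8, setting $KCODE = 'UTF-8' is the alternative to the /u flag;
# later Ruby versions no longer use $KCODE, so it is left out here.

name = "Fênix"

# /u marks the regexp as UTF-8, so é and ê are each one character
with_class       = name.gsub(/[éê]/u, 'e')
with_alternation = name.gsub(/é|ê/u, 'e')

puts with_class
puts with_alternation
```

If the encoding is handled correctly, both lines print Fenix.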


> What's the difference between /[éê]/ and /é|ê/ ?

In that context there shouldn't be any difference. The union, |, can
be used for patterns longer than a single character, but the specific
patterns above look equivalent to me. But if the encoding isn't set
appropriately all bets are off!
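
To illustrate the point with plain ASCII, where encoding plays no part, here is a small sketch (the sample strings are made up for the example):

```ruby
# ASCII-only sketch, so the encoding question does not arise here.
word = "banana"

# For single characters, a class and an alternation match the same things
class_result = word.gsub(/[an]/, '-')
alt_result   = word.gsub(/a|n/, '-')

# Alternation can also list longer alternatives, which a class cannot express
long_result = "grey or gray".gsub(/grey|gray/, 'GRAY')

puts class_result
puts alt_result
puts long_result
```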

> ps: ruby -v
> ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]

ps: Unicode support has apparently been much improved in 1.9.

Cheers,
lasitha

-lim-

3/6/2009 3:35:00 PM

> > What's the difference between /[éê]/ and /é|ê/ ?
>
> In that context there shouldn't be any difference

If the source is in UTF-8, then Ruby 1.8 interprets [éê] as a choice
of 4 bytes: [195, 169, 195, 170]

Fênix is seen as:
[70, 195, 170, 110, 105, 120]

Bytes 195 and 170 are each replaced with "e", hence Feenix.
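
The byte-level explanation can be reproduced by hand. This is only a sketch that imitates byte-wise matching on a current Ruby; it does not run the 1.8 regexp engine, it assumes the file is saved as UTF-8, and the variable names are made up for illustration:

```ruby
# encoding: utf-8
# Sketch only: imitates byte-wise matching by hand. Assumes UTF-8 source.

name = "Fênix"

name_bytes  = name.bytes   # the bytes of Fênix
class_bytes = "éê".bytes   # the four bytes inside the brackets

# Character class seen as single bytes: any one byte from the set becomes "e"
class_like = name_bytes.map { |b| class_bytes.include?(b) ? "e".ord : b }.pack("C*")

# Alternation: each alternative is a whole two-byte sequence,
# so only the complete pair 195, 170 is replaced
alternation_like = name.b.gsub("é".b, "e").gsub("ê".b, "e")

p name_bytes
p class_bytes
puts class_like
puts alternation_like
```

The class-style replacement touches two separate bytes and yields Feenix, while the alternation-style replacement swaps the two-byte sequence as a unit and yields Fenix.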

Jonatas Paganini

3/6/2009 5:28:00 PM


>
> If you set $KCODE = 'UTF-8' or append /u to the regex literals does it
> resolve the inconsistency?

WORKS! setting $KCODE or using /u

interesting!!!

Thanks VERY MUCH!
