lasitha
3/6/2009 2:13:00 PM
On Fri, Mar 6, 2009 at 6:02 PM, Jonatas Paganini <jonatasdp@gmail.com> wrote:
> Hi, I have a problem trying to replace accented characters like:
>
> irb(main):002:0* name = "Fênix"
> => "F\303\252nix"
> irb(main):003:0> name.gsub(/[éê]/,'e')
> => "Feenix"
> irb(main):004:0> name.gsub(/é|ê/,'e')
> => "Fenix"
Looks to me like an encoding problem. What source encoding are you working in?
If you set $KCODE = 'UTF-8' or append /u to the regex literals, does it
resolve the inconsistency?
> What's the difference between /[éê]/ and /é|ê/ ?
In that context there shouldn't be any difference. The union, |, can
be used for patterns longer than a single character, but the specific
patterns above look equivalent to me. But if the encoding isn't set
appropriately all bets are off!
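
Here is a rough sketch of what I suspect is going on, assuming the regexes
are being matched byte by byte (which is what 1.8 does when $KCODE is unset
and there is no /u flag). It is written for a newer Ruby (1.9 or later),
where the binary-encoded copy `raw` and the /n flag only imitate that
byte-oriented behaviour; the variable names are mine, not from your session.
In UTF-8, é is the bytes C3 A9 and ê is C3 AA, so a byte-wise character
class ends up holding three separate bytes, while the alternation still
matches each two-byte sequence as a unit.

```ruby
name = "F\u00EAnix"   # "Fenix" with e-circumflex; that letter is bytes C3 AA

# Character-aware matching: the class holds two characters
# (e-acute and e-circumflex), so one character is replaced.
by_char = name.gsub(/[\u00E9\u00EA]/, 'e')

# Byte-oriented matching: the class holds the bytes C3, A9 and AA,
# so each of the two bytes of e-circumflex is replaced on its own.
raw = name.b          # same bytes, tagged as binary
by_byte_class = raw.gsub(/[\xC3\xA9\xAA]/n, 'e')

# Alternation matches whole two-byte sequences, so even byte-wise
# matching replaces the accented letter exactly once.
by_byte_alt = raw.gsub(/\xC3\xA9|\xC3\xAA/n, 'e')

puts by_char        # Fenix
puts by_byte_class  # Feenix
puts by_byte_alt    # Fenix
```

If that is what is happening, the two patterns really are equivalent once
the regex engine knows the text is UTF-8.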
> ps: ruby -v
> ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux]
ps: the Unicode support has apparently been much improved in 1.9.
Cheers,
lasitha