Asp Forum - accents and String#tr

Xavier Noria

2/20/2006 11:28:00 PM

I wrote this method

def self.normalize_for_sorting(s)
  return nil unless s
  norm = s.downcase
  norm.tr!('ÁÉÍÓÚ', 'aeiou')
  norm.tr!('ÀÈÌÒÙ', 'aeiou')
  norm.tr!('ÄËÏÖÜ', 'aeiou')
  norm.tr!('ÂÊÎÔÛ', 'aeiou')
  norm.tr!('áéíóú', 'aeiou')
  norm.tr!('àèìòù', 'aeiou')
  norm.tr!('äëïöü', 'aeiou')
  norm.tr!('âêîôû', 'aeiou')
  norm
end

to normalize strings for sorting. This script is UTF-8, everything is
UTF-8 in my application, $KCODE is 'u'.

But it does not work, examples:

Andrés -> andruos
López -> luupez
Pérez -> puorez

I tried to "force" it with Iconv.conv('UTF-8', 'ASCII', 'aeiou') to
no avail. Any ideas?

-- fxn

4 Answers

Robin Stocker

2/21/2006 1:54:00 PM

Xavier Noria wrote:
> I wrote this method
>
> def self.normalize_for_sorting(s)
>   return nil unless s
>   norm = s.downcase
>   norm.tr!('ÁÉÍÓÚ', 'aeiou')
>   norm.tr!('ÀÈÌÒÙ', 'aeiou')
>   norm.tr!('ÄËÏÖÜ', 'aeiou')
>   norm.tr!('ÂÊÎÔÛ', 'aeiou')
>   norm.tr!('áéíóú', 'aeiou')
>   norm.tr!('àèìòù', 'aeiou')
>   norm.tr!('äëïöü', 'aeiou')
>   norm.tr!('âêîôû', 'aeiou')
>   norm
> end
>
> to normalize strings for sorting. This script is UTF-8, everything is
> UTF-8 in my application, $KCODE is 'u'.
>
> But it does not work, examples:
>
> Andrés -> andruos
> López -> luupez
> Pérez -> puorez
>
> I tried to "force" it with Iconv.conv('UTF-8', 'ASCII', 'aeiou') to no
> avail. Any ideas?
>
> -- fxn

Hi,

My guess is that the "tr" method treats its arguments as a string of
bytes. And because characters with accents need more than 1 byte in
UTF-8, #tr doesn't do what you would expect it to. (It's not even tr's
fault: how is it supposed to know that two bytes actually represent a
single character?)
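
To make the byte-count point concrete, here is a small sketch that only inspects the strings involved. It assumes a Ruby where String#bytes and String#bytesize are available (1.8.7 or later) and a UTF-8 source file; the constant names are made up for illustration.

```ruby
# Each accented vowel occupies two bytes in UTF-8; a plain ASCII letter takes one.
E_ACUTE_BYTES = 'é'.bytes.to_a   # 0xC3 0xA9
PLAIN_E_BYTES = 'e'.bytes.to_a   # 0x65

# The "from" set therefore holds ten bytes while the "to" set holds five,
# so a byte-oriented tr cannot pair them up one-to-one.
FROM_SET_BYTESIZE = 'áéíóú'.bytesize
TO_SET_BYTESIZE   = 'aeiou'.bytesize

puts E_ACUTE_BYTES.inspect   # [195, 169]
puts PLAIN_E_BYTES.inspect   # [101]
puts FROM_SET_BYTESIZE       # 10
puts TO_SET_BYTESIZE         # 5
```

If the guess above is right, that mismatch would explain why every accented letter in the examples came out as two wrong letters instead of one.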

The solution is not to use #tr!, but #gsub!. It isn't as short, but at
least it's right ;)

norm.gsub!('ä', 'a')
norm.gsub!('ë', 'e')
# and so on...

And because that is against DRY (Don't Repeat Yourself), I would
recommend storing the mapping as a hash:

accents = { 'ä' => 'a', 'ë' => 'e', ... }
accents.each do |accent, replacement|
  norm.gsub!(accent, replacement)
end
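
For reference, the hash idea can be written out in full roughly as follows. This is only a sketch, assuming Ruby 1.9 or later (where gsub accepts a hash as its second argument) and a UTF-8 source file. The names ACCENTED_VOWELS, ACCENT_MAP and ACCENT_PATTERN are invented here, and the method is defined at top level rather than as the class method from the original post.

```ruby
# Plain vowel => all accented variants from the original post, both cases.
# Upper-case variants are listed because downcase does not lower-case
# non-ASCII letters on older Rubies.
ACCENTED_VOWELS = {
  'a' => 'áàäâÁÀÄÂ',
  'e' => 'éèëêÉÈËÊ',
  'i' => 'íìïîÍÌÏÎ',
  'o' => 'óòöôÓÒÖÔ',
  'u' => 'úùüûÚÙÜÛ'
}

# Flatten into a per-character lookup: 'é' => 'e', 'Ü' => 'u', ...
ACCENT_MAP = {}
ACCENTED_VOWELS.each do |plain, variants|
  variants.each_char { |c| ACCENT_MAP[c] = plain }
end
ACCENT_MAP.freeze

# One pattern matching any accented character in the map.
ACCENT_PATTERN = Regexp.union(ACCENT_MAP.keys)

def normalize_for_sorting(s)
  return nil unless s
  # A single pass: every match is replaced by its entry in ACCENT_MAP.
  s.downcase.gsub(ACCENT_PATTERN, ACCENT_MAP)
end

names = ['Pérez', 'Andrés', 'López']
puts names.sort_by { |n| normalize_for_sorting(n) }.inspect
```

Building a single pattern means the string is scanned once instead of once per accent, which should matter only for long inputs.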

Regards,
Robin Stocker

AGW Facts

8/29/2011 2:36:00 AM

On Sun, 28 Aug 2011 11:12:50 -0700, Captain Compassion
<daranc@NOSPAMcharter.net> wrote:

> On Sat, 27 Aug 2011 09:42:09 -0600, AGWFacts <AGWFacts@ipcc.org>
> wrote:
>
> >On Wed, 24 Aug 2011 12:37:20 -0700, Captain Compassion
> ><daranc@NOSPAMcharter.net> wrote:
> >
> >> The Americas, Not the Middle East, Will Be the World Capital of Energy
> >
> >Actually no, China will be the "world capital" on energy.
> >
> >> http://www.foreignpolicy.com/articles/2011/08/15/the_americas_not_the_middle_east_will_be_the_world_capital_of_ener...

> The Americas have more coal and shale.

Yes, and getting at it would be disastrous to the country.

Caravan

8/29/2011 2:44:00 AM

On 8/28/2011 10:35 PM, AGWFacts wrote:
> On Sun, 28 Aug 2011 11:12:50 -0700, Captain Compassion
> <daranc@NOSPAMcharter.net> wrote:
>
>> On Sat, 27 Aug 2011 09:42:09 -0600, AGWFacts<AGWFacts@ipcc.org>
>> wrote:
>>
>>> On Wed, 24 Aug 2011 12:37:20 -0700, Captain Compassion
>>> <daranc@NOSPAMcharter.net> wrote:
>>>
>>>> The Americas, Not the Middle East, Will Be the World Capital of Energy
>>>
>>> Actually no, China will be the "world capital" on energy.
>>>
>>>> http://www.foreignpolicy.com/articles/2011/08/15/the_americas_not_the_middle_east_will_be_the_world_capital_of_ener...
>
>> The Americas have more coal and shale.
>
> Yes, and getting at it would be disastrous to the country.
>

bwahahahahahaha!!!!!!!!!!!!!!!!!!!!!
good one.

Captain Compassion

8/29/2011 3:21:00 PM

On Sun, 28 Aug 2011 20:35:35 -0600, AGWFacts <AGWFacts@ipcc.org>
wrote:

>On Sun, 28 Aug 2011 11:12:50 -0700, Captain Compassion
><daranc@NOSPAMcharter.net> wrote:
>
>> On Sat, 27 Aug 2011 09:42:09 -0600, AGWFacts <AGWFacts@ipcc.org>
>> wrote:
>>
>> >On Wed, 24 Aug 2011 12:37:20 -0700, Captain Compassion
>> ><daranc@NOSPAMcharter.net> wrote:
>> >
>> >> The Americas, Not the Middle East, Will Be the World Capital of Energy
>> >
>> >Actually no, China will be the "world capital" on energy.
>> >
>> >> http://www.foreignpolicy.com/articles/2011/08/15/the_americas_not_the_middle_east_will_be_the_world_capital_of_ener...
>
>> The Americas have more coal and shale.
>
>Yes, and getting at it would be disastrous to the country.

And not getting it is national suicide. Realize this. All available
energy in the world will eventually be harvested. This includes
petroleum, coal and gas in all their forms. There will be renewables
as well. There are only two options. A dynamic growing world or a dark
world of ignorance and human suffering.

--
"We can't drive our SUVs and eat as much as we want and keep our
homes on 72 degrees at all times ... and then just expect that other
countries are going to say OK." -- Barack Obama

The object of life is not to be on the side of the majority but to
escape finding oneself in the ranks of the insane. -- Marcus Aurelius

"...the whole world, including the United States, including all that
we have known and cared for, will sink into the abyss of a new Dark
Age, made more sinister, and perhaps more protracted, by the lights
of perverted science." -- Sir Winston Churchill

Joseph R. Darancette
daranc@NOSPAMcharter.net

comp.lang.ruby
