comp.lang.ruby

transform non-english text

Hugo

1/19/2007 9:19:00 AM

Hello,
I have a Ruby application that deals with non-English text, and I want to
transform some of that text so that it contains only [a-zA-Z0-9].
Examples:
búsqueda -> busqueda
presenças -> presencas
für -> fur
avião1 -> aviao1

Can anyone help me?

Thanks.
Best regards,
Migrate

--
Posted via http://www.ruby-....

4 Answers

Martin Boese

1/19/2007 12:22:00 PM

I don't think there is a unified mapping table that transforms characters
outside [a-zA-Z0-9] into a specific one of them. But if you would consider
writing a map yourself, try something like:


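class String
  # Pairs of [pattern, ASCII replacement]; extend as needed.
  MAP = [[/ü/, 'u'],
         [/ö/, 'o']]

  # Returns a copy of the string with each mapped character replaced.
  def eng_char
    res = dup
    MAP.each { |pattern, replacement| res = res.gsub(pattern, replacement) }
    res
  end
end

s = "abücüöö"
puts s + " => " + s.eng_char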

----------
Will output:

abücüöö => abucuoo


Martin
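
A variation on the map approach above, offered as a sketch rather than a
drop-in: on Ruby 1.9 or later, String#gsub accepts a Hash as its second
argument, so one pass can replace every mapped character. The names ASCII_MAP,
ASCII_PATTERN and to_ascii are assumptions made for illustration, and the table
only covers the characters that appear in this thread's examples.

```ruby
# Hypothetical lookup table; extend it with the characters your text contains.
ASCII_MAP = {
  'ã' => 'a',
  'ç' => 'c',
  'ö' => 'o',
  'ú' => 'u',
  'ü' => 'u'
}.freeze

# One regexp that matches any key of the table.
ASCII_PATTERN = Regexp.union(ASCII_MAP.keys)

# Returns a copy of text with every mapped character replaced.
# Characters missing from the table are left untouched.
def to_ascii(text)
  text.gsub(ASCII_PATTERN, ASCII_MAP)
end

%w[búsqueda presenças für avião1].each do |word|
  puts "#{word} -> #{to_ascii(word)}"
end
```

Keeping the table in a Hash means adding a character is a one-line change, and
the string is scanned once instead of once per table entry.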




F. Senault

1/19/2007 1:05:00 PM

On 19 January 2007 at 10:19, Hu Ma wrote:

> Hello,
> I have a Ruby application that deals with non-English text and I want to
> transform some of that text to [a-zA-Z0-9].

You could try Iconv to convert from your encoding to ASCII. A quick
example:

>> require "iconv"
=> true
>> Iconv.iconv("ascii//translit", "iso-8859-1", "aéioù")
=> ["a'eio`u"]
>> Iconv.iconv("ascii//translit", "iso-8859-1", "aéiou")[0].tr('^a-z', '')
=> "aeiou"

Fred
--
Can you see your days blighted by darkness ?
Is it true you beat your fists on the floor ?
Stuck in a world of isolation
While the ivy grows over the door (Pink Floyd, Lost For Words)
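
A note on the cleanup step in the session above: tr('^a-z', '') removes
everything except lowercase letters, so it would also drop the digit in an input
such as avião1. A minimal sketch of a wider filter follows. It does not call
Iconv; it starts from the transliterated string shown in the session. The helper
name keep_alphanumerics and the second sample string are assumptions for
illustration, not real Iconv output.

```ruby
# Removes every character outside a-z, A-Z and 0-9.
# The leading ^ in the selector negates the set, as it does for String#tr.
def keep_alphanumerics(text)
  text.delete('^a-zA-Z0-9')
end

# Transliterated output copied from the iconv session above.
puts keep_alphanumerics("a'eio`u")

# Hypothetical transliteration residue that contains an uppercase letter and a digit.
puts keep_alphanumerics("Avia~o1")
```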

Hugo

1/20/2007 11:03:00 AM

Hello,

Thanks for your help.

I will try both approaches to see what fits best.

Best regards,
Migrate



--
Posted via http://www.ruby-....

Daniel DeLorme

1/23/2007 8:26:00 AM

F. Senault wrote:
> >> require "iconv"
> => true
> >> Iconv.iconv("ascii//translit", "iso-8859-1", "aéioù")
> => ["a'eio`u"]
> >> Iconv.iconv("ascii//translit", "iso-8859-1", "aéiou")[0].tr('^a-z', '')
> => "aeiou"

Iconv's translit is really nice... when it works. It works on our FreeBSD
server but not on my Ubuntu dev machine. Your mileage may vary.

Daniel
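
Since the transliteration result depends on the platform's iconv, here is a
sketch of a different technique that stays inside Ruby: Unicode normalization
instead of Iconv. It assumes Ruby 2.2 or later (for String#unicode_normalize)
and UTF-8 input. The method names strip_accents and ascii_alnum are made up for
illustration.

```ruby
# Decomposes each accented character into a base letter plus combining marks
# (NFD), then removes the combining marks (Unicode category Mn).
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Additionally drops whatever is still outside a-z, A-Z and 0-9.
def ascii_alnum(text)
  strip_accents(text).delete('^a-zA-Z0-9')
end

%w[búsqueda presenças für avião1].each do |word|
  puts "#{word} -> #{ascii_alnum(word)}"
end
```

Characters that have no canonical decomposition, such as ß or ø, pass through
strip_accents unchanged and are then removed by ascii_alnum, so they would still
need an explicit mapping if they should become letters.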