comp.lang.ruby

replacing diacritics by simple character

unbewusst.sein

9/25/2007 4:26:00 PM

Do you know of a way to replace diacritics with plain characters (e.g. é
-> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables, perhaps?



--
Une Bévue
9 Answers

F. Senault

9/25/2007 4:50:00 PM

On 25 September at 18:25, Une Bévue wrote:

(Hello again... :) )

> do u know of a way to replace diacritics by simple character (ie. : é
> -o-> e)
>
> the same with ligatures (ie. : Æ -o-> AE )
>
> using tables ?

Iconv can do that for you:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>
>> i.iconv("aéouï Æ")
=> "a'eou\"i AE"
>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"
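
For readers on a current Ruby: Iconv left the standard library in Ruby 2.0, so the session above needs the iconv gem there. Below is a minimal sketch of a table-based alternative, assuming Ruby 2.2+ (for String#unicode_normalize) and UTF-8 input. The LIGATURES table and the strip_diacritics name are illustrative, not something from this thread.

```ruby
# Letters with no canonical decomposition need an explicit table.
# Illustrative and far from complete.
LIGATURES = {
  "Æ" => "AE", "æ" => "ae",
  "Œ" => "OE", "œ" => "oe",
  "ß" => "ss"
}.freeze

def strip_diacritics(str)
  str.unicode_normalize(:nfd)                     # "é" becomes "e" + combining acute
     .gsub(/\p{Mn}/, "")                          # drop the combining marks
     .gsub(Regexp.union(LIGATURES.keys), LIGATURES) # table lookup for the rest
end

puts strip_diacritics("aéouï Æ")   # aeoui AE
```

Unlike the //TRANSLIT route, this leaves no stray quote characters behind, so the cleanup gsub is not needed for accents; anything missing from the table passes through unchanged.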

Fred
--
I've found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker
brings you said instrument. Suddenly, no more paper jams.
(Kai Henningsen in the SDM)

Michal Suchanek

9/25/2007 6:13:00 PM

> --
> I've found an axe can do a lot for a paper-mangling printer. Especially
> if you shout for one at the top of your voice, and then a cow orker
--------------------------------------------------------------------------------------^
???
> brings you said instrument. Suddenly, no more paper jams.
> (Kai Henningsen in the SDM)
>
>

:D

unbewusst.sein

9/25/2007 6:57:00 PM

F. Senault <fred@lacave.net> wrote:

> IConv can do that for you :
>
> >> require "iconv"
> => true
> >> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
> => #<Iconv:0x84d4448>
> >> i.iconv("aéouï Æ")
> => "a'eou\"i AE"
> >> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
> => "aeoui AE"

Fine, thanks a lot Fred, à c't'heure ;-)

Have a good wine cellar ;-)

It also works with UTF-8.
--
Une Bévue

Petite Abeille

9/25/2007 7:26:00 PM

On Sep 25, 2007, at 18:55, F. Senault wrote:

>> do u know of a way to replace diacritics by simple character (ie. : é
>> -o-> e)
>>
>> the same with ligatures (ie. : Æ -o-> AE )
>>
>> using tables ?
>
> IConv can do that for you :

An alternative approach is something like Sean M. Burke's
Text::Unidecode:

http://interglacial.com/~sburke/tpj/as_html/...
http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Un...


Here is an example of an implementation of Unidecode in Lua [1]:

local Unidecode = require( 'Unidecode' )

print( Unidecode( '??????´' ) )
print( Unidecode( '??' ) )
print( Unidecode( '?????' ) )
print( Unidecode( '??' ) )
print( Unidecode( '??' ) )
print( Unidecode( '???' ) )
print( Unidecode( '?????' ) )
print( Unidecode( '???????????-?????' ) )
print( Unidecode( '???? ??????? ??????' ) )
print( Unidecode( '?????' ) )
print( Unidecode( 'Géometrie Différentielle' ) )

> Moskva
> beijing
> Athena
> seoul
> dongjing
> jingdushi
> nepaal
> te'labiyb-yapvo
> tal 'abiyb yaafaa
> thran
> Geometrie Differentielle
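
The Unidecode idea is a per-character lookup table with a fallback for anything unknown. A rough Ruby sketch of that idea follows, with a deliberately tiny table; TRANSLIT and unidecode_sketch are hypothetical names, and a real Unidecode table covers most of Unicode.

```ruby
# Tiny illustrative table (a few Cyrillic and Latin letters only).
TRANSLIT = {
  "М" => "M", "о" => "o", "с" => "s", "к" => "k", "в" => "v", "а" => "a",
  "é" => "e", "Æ" => "AE"
}.freeze

# Map each character through the table; ASCII passes through untouched,
# characters missing from the table become the `unknown` placeholder.
def unidecode_sketch(str, unknown = "?")
  str.each_char.map { |ch|
    ch.ascii_only? ? ch : TRANSLIT.fetch(ch, unknown)
  }.join
end

puts unidecode_sketch("Москва")      # Moskva
puts unidecode_sketch("Géometrie")   # Geometrie
puts unidecode_sketch("北京")         # ?? (not in this tiny table)
```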

Cheers,

PA.

[1] http://dev.alt.textdrive.com/browser/HTTP/Uni...

F. Senault

9/25/2007 7:36:00 PM

On 25 September at 20:12, Michal Suchanek wrote:

>> --
>> I've found an axe can do a lot for a paper-mangling printer. Especially
>> if you shout for one at the top of your voice, and then a cow orker
> --------------------------------------------------------------------------------------^
> ???

It's intentional. "Cow orker" was probably a typo in the olden days, but
it has entered the mainstream since then. Just ask Google: "Results 1 -
10 of about 37,200 for "cow orker". (0.19 seconds)" :)

Fred
--
I feel it move across my skin. I'm reaching up and reaching out, I'm
reaching for the random or what ever will bewilder me. And following
our will and wind we may just go where no one's been. We'll ride the
spiral to the end and may just go where no one's been. (Tool, Lateralus)

Daniel DeLorme

9/25/2007 10:00:00 PM

F. Senault wrote:
> IConv can do that for you :
>
>>> require "iconv"
> => true
>>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
> => #<Iconv:0x84d4448>
>>> i.iconv("aéouï Æ")
> => "a'eou\"i AE"
>>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
> => "aeoui AE"

That doesn't work on all platforms. For me:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
=> #<Iconv:0xb7cf28e0>
>> i.iconv("aéouï Æ")
=> "a?ou? AE"

:-(
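
Since //TRANSLIT depends on the iconv library behind it, one way to get the same result everywhere is to supply the replacements yourself. A sketch, assuming a modern Ruby where String#encode accepts a :fallback object; the FALLBACK table and to_ascii name are made up for illustration:

```ruby
# Hypothetical replacement table; extend it for the characters you expect.
FALLBACK = {
  "é" => "e", "ï" => "i", "Ê" => "E", "ê" => "e",
  "Æ" => "AE", "ß" => "ss"
}.freeze

def to_ascii(str)
  # The lambda is called for each character that US-ASCII cannot represent.
  str.encode("US-ASCII", fallback: ->(ch) { FALLBACK.fetch(ch, "?") })
end

puts to_ascii("aéouï Æ")   # aeoui AE
puts to_ascii("Ça")        # ?a (Ç is not in the table)
```

The "?" default mirrors the output above: a character without a known transliteration is replaced rather than dropped silently.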

unbewusst.sein

9/25/2007 10:25:00 PM

Daniel DeLorme <dan-ml@dan42.com> wrote:

>
> That doesn't work on all platforms. For me:
>
> >> require "iconv"
> => true
> >> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
> => #<Iconv:0xb7cf28e0>
> >> i.iconv("aéouï Æ")
> => "a?ou? AE"
>
> :-(

Are you sure about the encoding of "aéouï Æ"?

Because I did it with UTF-8, and it works:

-- the script ----------------------------------------------------------
#! /usr/bin/env ruby

require "iconv"

i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

p i.iconv("aéouï Æ")
# => "a'eou\"i AE"

p i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
# => "aeoui AE"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?").gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du?").gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"
------------------------------------------------------------------------
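
The gsub chain in the script can be tried without Iconv by feeding it a string shaped like the transliterated output. In this sketch the `translit` value is an assumption about what Iconv returns on the poster's machine (accents turned into ^, ' and "), and to_identifier is a hypothetical name:

```ruby
# Same three steps as the script above, applied to an already-ASCII string.
def to_identifier(ascii)
  ascii.gsub(/[^a-zA-Z0-9' ]/, '')   # drop punctuation, incl. ^ and " left by translit
       .gsub(/[' ]/, '_')            # apostrophes and spaces become underscores
       .gsub(/(.*)_$/, '\1')         # strip one trailing underscore
end

# Assumed Iconv output, not captured from a real run.
translit = "^Etre ou ne pas ^etre, c'est la question. a'eou\"i AE, wie heiss du ?"

puts to_identifier(translit)
# Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du
```

This also shows where the "a_eoui" in the result comes from: the apostrophe that stands in for the acute accent survives the first gsub and becomes an underscore in the second.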
--
Une Bévue

Daniel DeLorme

9/25/2007 10:59:00 PM

Une Bévue wrote:
> Daniel DeLorme <dan-ml@dan42.com> wrote:
>
>> That doesn't work on all platforms. For me:
>>
>> >> require "iconv"
>> => true
>> >> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
>> => #<Iconv:0xb7cf28e0>
>> >> i.iconv("aéouï Æ")
>> => "a?ou? AE"
>>
>> :-(
>
> Are u sure about the encoding of "aéouï Æ" ?

yep.

>> str = "aéouï Æ"
=> "a\303\251ou\303\257 \303\206" #(that's UTF-8 all right)
>> i.iconv(str)
=> "a?ou? AE"

but like I said, translit doesn't work the same on all platforms (I'm on
ubuntu btw)
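
On a Ruby with encoding-aware strings (1.9 and later), the same check can be made explicit. A small sketch, assuming the script file itself is saved as UTF-8; the variable names are only for illustration:

```ruby
str = "aéouï Æ"                               # literal in a UTF-8 source file
utf8_bytes = "a\303\251ou\303\257 \303\206"   # the octal escapes from the session above

puts str.encoding            # UTF-8
puts str.valid_encoding?     # true
puts str.b == utf8_bytes.b   # true: same byte sequence
puts str.bytesize            # 10 (three 2-byte characters + four ASCII bytes)

# The same text in ISO-8859-15 is one byte per character
# and is not valid UTF-8.
latin = str.encode("ISO-8859-15")
puts latin.bytesize                                      # 7
puts latin.dup.force_encoding("UTF-8").valid_encoding?   # false
```

If the declared source encoding passed to the converter does not match the actual bytes, the conversion fails or yields garbage whatever the platform.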

Daniel

unbewusst.sein

9/26/2007 12:54:00 AM

Daniel DeLorme <dan-ml@dan42.com> wrote:

> but like I said, translit doesn't work the same on all platforms (I'm on
> ubuntu btw)

I'm running Mac OS X 10.4.10...
--
Une Bévue