comp.lang.ruby

A question about Iconv arguments

Axel Etzold

6/9/2007 7:12:00 PM

Dear all,

I need to convert some accented text, and I would like to know
what arguments I have to give Iconv to produce the desired output.
E.g., the Italian word for Friday is "venerdì", whose final
"i" carries a grave accent (small i with grave accent).
If you type this into Wikipedia search in Italian
(which I believed to be in utf-8 encoding),
it will load:

http://it.wikipedia.org/wiki/Ve... ,

yet this syntax:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me "venerd\303\254" when I convert from latin1 encoding.

What arguments do I have to use?

Thank you,

Best regards,

Axel





6 Answers

Alex Young

6/9/2007 10:34:00 PM

Axel Etzold wrote:
> Dear all,
>
> I need to convert some accented text, and I would like to know
> what arguments I have to give Iconv to produce the desired output.
> E.g., the Italian word for Friday is "venerdì", whose final
> "i" carries a grave accent (small i with grave accent).
> If you type this into Wikipedia search in Italian
> (which I believed to be in utf-8 encoding),
> it will load:
>
> http://it.wikipedia.org/wiki/Ve... ,
>
> yet this syntax:
>
> converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)
>
> gives me "venerd\303\254" when I convert from latin1 encoding.
That looks right to me - if I write that into a UTF-8 HTML document, it
displays correctly. What are you expecting?

--
Alex
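
For readers checking this on a current Ruby: the escapes \303\254 are octal notation for the two bytes 0xC3 0xAC, which is how UTF-8 stores "ì", so the conversion in the question had already worked. A minimal sketch of that check follows. It assumes Ruby 1.9 or later, where String#encode stands in for Iconv (Iconv was removed from the standard library in Ruby 2.0). The variable names are illustrative only.

```ruby
# "ì" is the single byte 0xEC (octal \354) in Latin-1.
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)

# Stand-in for Iconv.new('utf-8', 'latin1').iconv(doc) on modern Ruby.
utf8 = latin1.encode(Encoding::UTF_8)

# Show non-ASCII bytes as octal escapes, the way Ruby 1.8 displayed them.
octal_dump = utf8.bytes.map { |b| b < 128 ? b.chr : format("\\%03o", b) }.join

puts octal_dump
puts utf8
```

The first line printed shows the raw bytes and the second shows the same string as text. Both are the same correctly converted value.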

Axel Etzold

6/10/2007 7:52:00 AM

Dear Alex,

thank you for responding.
If I try to get a webpage that has accents in its address,
like

> require "rubygems"
> require "rio"
> require 'iconv'
> output_encoding = 'utf-8'
> doc="Venerdì"
> converted_doc = Iconv.new(output_encoding, 'latin1').iconv(doc)
> rio("http://www.wikipedia.org/w... + converted_doc)>rio("a.html")

I get an error message:

/usr/local/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.wikipedia.org/wiki/venerd&#... (URI::InvalidURIError)
from /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/withpath.rb:285:in `uri_from_string_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:74:in `arg0_info_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:83:in `init_from_args_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:56:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/builder.rb:111:in `build'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/factory.rb:412:in `create_state'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:65:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `rio'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/kernel.rb:42:in `rio'


This doesn't happen if I type in:

rio("http://www.wikipedia.org/wiki/Venerd%C...)>rio("a.html")

So I need to know what conversion arguments I need to give Iconv to
turn "Venerdì" into "Venerd%C3%AC".

Best regards,

Axel
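
A note on the error above: it is not an Iconv problem. A URI may contain only ASCII characters, so the UTF-8 bytes of "ì" must additionally be percent-encoded, with each byte written as "%" plus two hex digits. The sketch below shows the difference. It assumes a current Ruby with the standard uri library, and example.org is a placeholder host, not the address from the post.

```ruby
require "uri"

raw     = "http://example.org/wiki/Venerd\u00EC" # contains a non-ASCII character
escaped = "http://example.org/wiki/Venerd%C3%AC" # UTF-8 bytes C3 AC, percent-encoded

# Parsing the raw string is expected to fail; capture the exception.
raw_error =
  begin
    URI.parse(raw)
    nil
  rescue URI::InvalidURIError => e
    e
  end

# The percent-encoded form parses normally.
parsed = URI.parse(escaped)

puts raw_error.class
puts parsed.path
```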

Axel Etzold

6/10/2007 9:06:00 AM

I've managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'


def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10
      binary<<'A%'
    elsif digit==11
      binary<<'B%'
    elsif digit==12
      binary<<'C%'
    elsif digit==13
      binary<<'D%'
    elsif digit==14
      binary<<'E%'
    elsif digit==15
      binary<<'F%'
    end
    number=(number-digit)/16
  end
  return binary.reverse.gsub(/%([A-F])%([A-F])/,'%\1\2')
end

class String
  def wiki_addr
    converted_doc = Iconv.new('utf-8', 'latin1').iconv(self)
    res=''
    converted_doc.split(//).each{|x|
      if /[a-zA-Z0-9\_ ]/.match(x)
        res<<x
      else
        res<<to_hex(x[0])
      end
    }
    return res
  end
end


doc ="venerdì"
doc.wiki_addr
rio("http://it.wikipedia.org/wi... doc.wiki_addr)>rio("a.html")

Best regards,

Axel

Stefan Rusterholz

6/10/2007 4:37:00 PM

Axel Etzold wrote:
> I've managed to solve this problem like this:
>
> require "rubygems"
> require "rio"
> require 'iconv'
>
>
> def to_hex(number)
> number=number.abs
> binary=''
> while number>0
> digit=number%16
> if digit<10
> binary<<digit.to_s
> elsif digit==10
> ...

I guess you're aware of neither:
1234.to_s(16)
nor:
"%x" % 1234

For situations like the above, even a lookup-array or a case/when would
be better.

Regards
Stefan
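
To make the suggestion concrete, here is one way the hand-written to_hex could shrink to a format string. This is only a sketch. percent_encode_bytes is a made-up helper name, and it assumes Ruby 2.4 or later with input that can be transcoded to UTF-8. Unlike the original wiki_addr, it leaves only ASCII letters, digits and underscores unescaped, so spaces are escaped too.

```ruby
# Both forms mentioned above give lowercase hex without padding.
hex_a = 1234.to_s(16)
hex_b = "%x" % 1234

# Hypothetical helper: percent-encode every byte that is not an ASCII
# letter, digit or underscore. "%%%02X" produces "%" followed by two
# uppercase, zero-padded hex digits.
def percent_encode_bytes(str)
  str.encode(Encoding::UTF_8).bytes.map do |b|
    char = b.chr if b < 128
    char && char.match?(/[A-Za-z0-9_]/) ? char : format("%%%02X", b)
  end.join
end

puts hex_a
puts hex_b
puts percent_encode_bytes("venerd\u00EC")
```

Working byte by byte with a fixed-width format avoids any post-processing of the digits, which is what the gsub in the longer version was for.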


Axel Etzold

6/10/2007 6:53:00 PM

Dear Stefan,

thank you for bringing this to my attention!
(Slightly varying Voltaire: I might have been able to write
a shorter program had I had more leisure and more knowledge.)
I'll try your suggestion.
Best regards,

Axel

Nobuyoshi Nakada

6/11/2007 4:47:00 AM

Hi,

At Sun, 10 Jun 2007 18:05:49 +0900,
Axel Etzold wrote in [ruby-talk:254981]:
> I've managed to solve this problem like this:

$ ruby -riconv -rcgi -e 'puts CGI.escape(Iconv.conv("utf-8", "latin1", "venerd\354"))'
venerd%C3%AC

--
Nobu Nakada
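
On Ruby 2.0 and later the iconv library is no longer bundled, so the one-liner above will not load there as written. A roughly equivalent sketch for a current Ruby follows. String#encode replaces Iconv.conv, and URI.encode_www_form_component from the standard uri library is swapped in for CGI.escape. Both use form-style escaping, which turns a space into "+".

```ruby
require "uri"

# Latin-1 input: "ì" is byte 0xEC (octal \354), as in the one-liner.
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)

# Step 1: transcode Latin-1 to UTF-8 (the Iconv part).
utf8 = latin1.encode(Encoding::UTF_8)

# Step 2: percent-encode the UTF-8 bytes (the CGI.escape part).
escaped = URI.encode_www_form_component(utf8)

puts escaped
```

Keeping the two steps separate makes it clear that the character-set conversion and the URL escaping are independent operations. The arguments to Iconv were never the missing piece.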