comp.lang.ruby

Help with Iconv needed

Marcus Strube

11/29/2007 12:51:00 PM

Can someone tell me what it is that I'm getting wrong here with "iconv"?
I either get "IllegalSequence" or "äöüß" are not encoded properly when
using Iconv.conv, while it looks good using backticks. ("IllegalSequence"
right now with the second. ÄÖü with the first anytime...)

require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require "iconv"

#source = "http://www.sueddeutsche.de/app/service/rss/alles/rss...
source = "http://www.welt.de/vermischtes/?service...

content = ""
open(source) { |s| content = s.read }
rss = RSS::Parser.parse(content, false)

rss.items.each do |item|
  converted = `'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
  puts(Iconv.conv('ISO-8859-1', 'UTF-8', item.title)); puts " "
end
--
Posted via http://www.ruby-....

1 Answer

MonkeeSage

11/30/2007

On Nov 29, 6:50 am, Marcus Strube <marcus.str...@gmx.net> wrote:
> Can someone tell me what it is that I'm getting wrong here with "iconv"?
> I either get "IllegalSequence" or "äöüß" are not encoded properly when
> using Iconv.conv while it looks good using backticks. ("IllegalSequence
> right now with the second. ÄÖü with the first anytime...)
>
> require 'rss/1.0'; require 'rss/2.0'; require 'open-uri'; require
> "iconv"
>
> #source = "http://www.sueddeutsche.de/app/service/rss/alles/rss...
> source = "http://www.welt.de/vermischtes/?service...
>
> content = ""; open(source) { |s| content = s.read }; rss =
> RSS::Parser.parse(content, false)
>
> rss.items.each do |item|
> converted = `'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
> puts(Iconv.conv('ISO-8859-1', 'UTF-8', item.title)); puts " "
> end
> --
> Posted via http://www.ruby-....

I'm not sure about the IllegalSequence error, but I see two issues.
First, this line is a mistake, because nothing is written to the pipe...

`'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`

I think you meant to echo the value to the pipe...

`echo -n '#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
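
One caveat with interpolating the title between single quotes: a title that
contains an apostrophe ends the quoted argument early. A minimal sketch of one
way around that, using Shellwords.escape from Ruby's standard library; the
helper name iconv_command and the sample title are made up for illustration,
and the command line itself is the one from above:

```ruby
require 'shellwords'

# Hypothetical helper (not from the original post): build the shell
# pipeline for one title. Shellwords.escape backslash-escapes every
# character the shell could treat specially, so an apostrophe in the
# title cannot terminate the argument.
def iconv_command(title)
  "echo -n #{Shellwords.escape(title)} | iconv -c -f ISO-8859-1 -t UTF8"
end

# Made-up sample title containing apostrophes.
puts iconv_command("Rock'n'Roll in Berlin")
```

The resulting string can be run with backticks or %x{} exactly like the
hand-written command.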

Second, ISO-8859-1 to UTF-8 doesn't appear to be the proper conversion
for this feed. The following string...

Düsseldorf: Prominentengedrängel bei der Bambi-Verleihung

...is encoded as...

"D\303\203\302\274sseldorf: Prominentengedr\303\203\302\244ngel bei der Bambi-Verleihung"

...by iconv from the command prompt. But it should be...

"D\303\274sseldorf: Prominentengedr\303\244ngel bei der Bambi-Verleihung"

I'm not good with encodings and UTF-8, so I can't tell you the
problem. I just know that "umlaut u" (ü) should be the two bytes
0xC3 0xBC (\303\274) in UTF-8, but that's not what comes out.

Regards,
Jordan