comp.lang.ruby

Reading Files: how do I specify the encoding?

Claus Hausberger

5/14/2007 2:39:00 PM

Hello

I have a lot of XML and Java files which have German umlauts and other
non-ASCII characters in them.

I want to read the files and convert them to UTF-8 using a Ruby script.

I convert the strings with this code:

  def to_utf8(str)
    str.unpack('U*').map do |c|
      if c < 0x80
        c.chr
      else
        '( u%04X )' % c
      end
    end.join
  end

(taken from "The Ruby Way" by Hal Fulton).

Sometimes it works, sometimes I get this error:
"malformed UTF-8 character"

I thought this might happen because the file is encoded in ISO-8859-1
(it was written with Eclipse set to ISO-8859-1 for text encoding).

How can I read a file with Ruby and specify that it is read with
ISO-8859-1 encoding (similar to Java's BufferedReader, where I can
specify the encoding)?

Any help is welcome. Best wishes,

Claus

--
Posted via http://www.ruby-....

2 Answers

Alex Young

5/14/2007 2:48:00 PM

Claus Hausberger wrote:
> Hello
>
> I have a lot of XML and Java files which have German umlauts and other
> non-ASCII characters in them.
>
> I want to read the files and convert them to UTF-8 using a Ruby script.
>
> I convert the strings with this code:
>
> def to_utf8(str)
> str.unpack('U*').map do |c|
I'd be surprised if this was right - you're telling it that you're
expecting the string to be UTF-8 already with that unpack format.
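
To illustrate the point about the unpack format, here is a minimal sketch, assuming a current Ruby (2.0 or later) and an input that really is ISO-8859-1. The sample string and the variable names are made up for the example. It shows 'U*' rejecting Latin-1 bytes, and 'C*' followed by pack('U*') as one possible way to get UTF-8, which works only because ISO-8859-1 byte values coincide with Unicode code points.

```ruby
# Bytes for "Grüße" as ISO-8859-1 (ü = 0xFC, ß = 0xDF); .b marks them as raw binary.
# This sample string is an assumption for illustration, not data from the thread.
latin1 = "Gr\xFC\xDFe".b

# 'U*' decodes the bytes as UTF-8, so Latin-1 bytes above 0x7F are rejected.
begin
  latin1.unpack('U*')
  unpack_error = nil
rescue ArgumentError => e
  unpack_error = e.message
end

# 'C*' reads one unsigned byte per element. ISO-8859-1 byte values equal
# Unicode code points, so packing them with 'U*' yields a UTF-8 string.
utf8 = latin1.unpack('C*').pack('U*')
```

Pure ASCII input passes through 'U*' unchanged, which would explain why the original function only fails on some files.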

<snip>
> how can I read a file with Ruby and specify that it is read with
> ISO-8859-1 encoding (similar to Java's BufferedReader where I can
> specify the encoding).

Investigate Iconv in the standard library. It does what you need.
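
Iconv was part of the standard library when this was written, but it was removed in Ruby 2.0 (it lives on as a separate gem). On Ruby 1.9 and later the built-in encoding support can do the same job. The following is a sketch under that assumption; the helper name read_latin1_as_utf8 and the temporary demo file are hypothetical and not from the thread.

```ruby
require 'tempfile'

# Hypothetical helper (not from the thread): read a file stored as
# ISO-8859-1 and return its contents as a UTF-8 string.
def read_latin1_as_utf8(path)
  # "external:internal" tells Ruby to decode ISO-8859-1 and transcode to UTF-8.
  File.read(path, encoding: 'ISO-8859-1:UTF-8')
end

# Demo: write Latin-1 bytes for "Grüße" to a temporary file, then read them back.
demo_text = Tempfile.create('latin1_demo') do |f|
  f.binmode
  f.write("Gr\xFC\xDFe".b)
  f.flush
  read_latin1_as_utf8(f.path)
end
```

This is the closest analogue to passing a charset to Java's BufferedReader: the encoding is declared when the file is opened, so no separate conversion function is needed afterwards.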

--
Alex

Enrique Comba Riepenhausen

5/14/2007 2:49:00 PM

Hello Claus,

You could use jcode:

$KCODE = 'UTF8'
require 'jcode'

Cheers,

Enrique Comba Riepenhausen