comp.lang.ruby

Iconv problem - not handling \r correctly

Louise Rains

10/26/2008 9:49:00 PM

I have an XML file that I need to process. I'm working in the Windows
environment. Here is the head of the file:

0000000000 377 376 < \0 A \0 u \0 t \0 o \0 S \0 t \0
0000000020 a \0 t \0 \0 J \0 a \0 v \0 a \0 C \0
0000000040 l \0 a \0 s \0 s \0 = \0 " \0 c \0 o \0
0000000060 m \0 . \0 a \0 u \0 t \0 o \0 s \0 i \0
0000000100 m \0 . \0 a \0 s \0 t \0 . \0 a \0 u \0
0000000120 t \0 o \0 m \0 o \0 d \0 . \0 A \0 M \0
0000000140 H \0 e \0 a \0 d \0 e \0 r \0 " \0 > \0
0000000160 \r \0 \n \0 \t \0 < \0 B \0 a \0 s \0 e \0
0000000200 A \0 u \0 t \0 o \0 S \0 t \0 a \0 t \0
0000000220 \0 J \0 a \0 v \0 a \0 C \0 l \0 a \0

Notice the character sequence \r \0 \n \0.

I need to edit some of the text elements in this file. I have used both
REXML and Hpricot to edit the file successfully, after converting to
UTF-8. Here is the head of the UTF-8 file:

0000000000 < A u t o S t a t J a v a C l
0000000020 a s s = ' c o m . a u t o s i m
0000000040 . a s t . a u t o m o d . A M H
0000000060 e a d e r ' > \r \n \t < B a s e A
0000000100 u t o S t a t J a v a C l a s
0000000120 s = ' c o m . a u t o s i m . a
0000000140 s t . a u t o m o d . A M H e a
0000000160 d e r ' S a v e F i l e V e r
0000000200 s i o n = ' 1 . 3 ' > \r \n \t \t <
0000000220 P r o p e r t i e s J a v a C

Notice that \r \n shows up in the next to last line.

Now in order for the edited XML file to work with my original
application, I need to convert back to UTF-16. Here is the code that I
use:

file = File.read("sta_utf8.xml")
conv = Iconv.new("UTF-16LE", "UTF-8")
result = conv.iconv(file);
result= 0xFF.chr << 0xFE.chr << result
file = File.new("sta_utf16.xml", "w")
file.write(result)
file.close

The resulting file (sta_utf16.xml) looks like:

0000000000 377 376 < \0 A \0 u \0 t \0 o \0 S \0 t \0
0000000020 a \0 t \0 \0 J \0 a \0 v \0 a \0 C \0
0000000040 l \0 a \0 s \0 s \0 = \0 ' \0 c \0 o \0
0000000060 m \0 . \0 a \0 u \0 t \0 o \0 s \0 i \0
0000000100 m \0 . \0 a \0 s \0 t \0 . \0 a \0 u \0
0000000120 t \0 o \0 m \0 o \0 d \0 . \0 A \0 M \0
0000000140 H \0 e \0 a \0 d \0 e \0 r \0 ' \0 > \0
0000000160 \r \n \0 \t \0 < \0 B \0 a \0 s \0 e \0 A
0000000200 \0 u \0 t \0 o \0 S \0 t \0 a \0 t \0
0000000220 \0 J \0 a \0 v \0 a \0 C \0 l \0 a \0 s


Notice that the \r does not have a \0 following it. This means that
every other line in my sta_utf16.xml file is in the wrong byte order and
I get garbled results:

<AutoStat
JavaClass='com.autosim.ast.automod.AMHeader'>਍à¤?ã°?ä??æ??ç??æ??ä??ç??ç?æ¼?å??ç?æ??ç?â??ä¨?æ??ç??æ??ä??æ°?æ??ç??ç??ã´?â??æ??æ¼?æ´?â¸?æ??ç??ç?æ¼?ç??æ¤?æ´?â¸?æ??ç??ç?â¸?æ??ç??ç?æ¼?æ´?æ¼?æ?â¸?ä??ä´?ä ?æ??æ??æ?æ??ç??â??â??å??æ??ç??æ??ä??æ¤?æ°?æ??å??æ??ç??ç??æ¤?æ¼?æ¸?ã´?â??ã??â¸?ã??â??ã¸?à´?

Is this a defect in Iconv?

Thanks,
LG
--
Posted via http://www.ruby-....

1 Answer

Louise Rains

10/27/2008 11:12:00 AM

It looks like IO#binmode does the same thing as well:

file = File.new("sta_utf16.xml", "wb")
file.binmode
file.write(result)
file.close

Thanks!

Heesob Park wrote:

> No, it's a defect not in Iconv but in Windows.
>
> Use binary flag for file handling like this:
>
> file = File.open("sta_utf8.xml","rb").read
> conv = Iconv.new("UTF-16LE", "UTF-8")
> result = conv.iconv(file);
> result= 0xFF.chr << 0xFE.chr << result
> file = File.new("sta_utf16.xml", "wb")
> file.write(result)
> file.close
>
>
> Regards,
>
> Park Heesob
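
For anyone who wants to see why the binary flag matters, here is a rough sketch that imitates Windows text-mode translation in pure Ruby, so it can be run on any platform. The helper names and the sample fragment are made up for illustration, and String#encode stands in for Iconv. It assumes the usual text-mode behavior: CRLF is collapsed to LF on read, and every LF byte is expanded to CRLF on write.

```ruby
# Sketch only: these helpers imitate Windows text-mode translation in pure
# Ruby. The names are hypothetical, not part of Ruby or Iconv.

# Text-mode read: CRLF pairs are collapsed to a single LF.
def simulate_text_mode_read(bytes)
  bytes.b.gsub("\r\n".b, "\n".b)
end

# Text-mode write: every 0x0A byte is expanded to 0x0D 0x0A, with no
# awareness of UTF-16's two-byte code units.
def simulate_text_mode_write(bytes)
  bytes.b.gsub("\n".b, "\r\n".b)
end

# Read as text, convert UTF-8 -> UTF-16LE, write as text.
def text_mode_round_trip(utf8_bytes)
  text = simulate_text_mode_read(utf8_bytes).force_encoding("UTF-8")
  simulate_text_mode_write(text.encode("UTF-16LE"))
end

# Same conversion with both files treated as binary (no translation).
def binary_mode_round_trip(utf8_bytes)
  utf8_bytes.b.force_encoding("UTF-8").encode("UTF-16LE").b
end

# Render a byte string as space-separated hex for inspection.
def hex(bytes)
  bytes.bytes.map { |b| format("%02x", b) }.join(" ")
end

sample = ">\r\n\t<" # made-up fragment shaped like the XML in the post
puts hex(text_mode_round_trip(sample))   # 3e 00 0d 0a 00 09 00 3c 00
puts hex(binary_mode_round_trip(sample)) # 3e 00 0d 00 0a 00 09 00 3c 00
```

The text-mode result has the same \r \n \0 sequence as the dump in the question and an odd number of bytes, so every code unit after the first line break is shifted by one byte. The binary-mode result keeps \r \0 \n \0 intact.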

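Iconv was removed from Ruby's standard library in Ruby 2.0, so on a current Ruby the code in this thread will not load as written. A minimal sketch of the same conversion using String#encode with File.binread and File.binwrite might look like the following. The helper name is hypothetical, and it assumes the input really is valid UTF-8 (String#encode raises otherwise).

```ruby
# Sketch only: convert a UTF-8 file to UTF-16LE with a leading BOM,
# reading and writing in binary mode so no newline translation occurs.
# The method name is hypothetical, chosen for this example.
def convert_utf8_file_to_utf16le(src_path, dst_path)
  # binread returns raw bytes; label them as UTF-8 before transcoding.
  utf8 = File.binread(src_path).force_encoding("UTF-8")
  utf16 = utf8.encode("UTF-16LE")

  # Prepend the UTF-16LE byte order mark (FF FE), as the original code does.
  File.binwrite(dst_path, "\xFF\xFE".b + utf16.b)
end
```

With the file names from the question, the call would be convert_utf8_file_to_utf16le("sta_utf8.xml", "sta_utf16.xml").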