comp.lang.ruby
Iconv problem - not handling \r correctly
Louise Rains
10/26/2008 9:49:00 PM
I have an XML file that I need to process. I'm working in the Windows
environment. Here is the head of the file:
0000000000 Â â? < \0 A \0 u \0 t \0 o \0 S \0 t \0
0000000020 a \0 t \0 \0 J \0 a \0 v \0 a \0 C \0
0000000040 l \0 a \0 s \0 s \0 = \0 " \0 c \0 o \0
0000000060 m \0 . \0 a \0 u \0 t \0 o \0 s \0 i \0
0000000100 m \0 . \0 a \0 s \0 t \0 . \0 a \0 u \0
0000000120 t \0 o \0 m \0 o \0 d \0 . \0 A \0 M \0
0000000140 H \0 e \0 a \0 d \0 e \0 r \0 " \0 > \0
0000000160 \r \0 \n \0 \t \0 < \0 B \0 a \0 s \0 e \0
0000000200 A \0 u \0 t \0 o \0 S \0 t \0 a \0 t \0
0000000220 \0 J \0 a \0 v \0 a \0 C \0 l \0 a \0
Notice the character sequence \r \0 \n \0.
I need to edit some of the text elements in this file. I have used both
REXML and Hpricot to edit the file successfully, after converting to
UTF-8. Here is the head of the UTF-8 file:
0000000000 < A u t o S t a t J a v a C l
0000000020 a s s = ' c o m . a u t o s i m
0000000040 . a s t . a u t o m o d . A M H
0000000060 e a d e r ' > \r \n \t < B a s e A
0000000100 u t o S t a t J a v a C l a s
0000000120 s = ' c o m . a u t o s i m . a
0000000140 s t . a u t o m o d . A M H e a
0000000160 d e r ' S a v e F i l e V e r
0000000200 s i o n = ' 1 . 3 ' > \r \n \t \t <
0000000220 P r o p e r t i e s J a v a C
Notice that \r \n shows up in the next to last line.
Now in order for the edited XML file to work with my original
application, I need to convert back to UTF-16. Here is the code that I
use:
require 'iconv'

file = File.read("sta_utf8.xml")
conv = Iconv.new("UTF-16LE", "UTF-8")
result = conv.iconv(file)
result = 0xFF.chr << 0xFE.chr << result
file = File.new("sta_utf16.xml", "w")
file.write(result)
file.close
The resulting file (sta_utf16.xml) looks like:
0000000000 Â â? < \0 A \0 u \0 t \0 o \0 S \0 t \0
0000000020 a \0 t \0 \0 J \0 a \0 v \0 a \0 C \0
0000000040 l \0 a \0 s \0 s \0 = \0 ' \0 c \0 o \0
0000000060 m \0 . \0 a \0 u \0 t \0 o \0 s \0 i \0
0000000100 m \0 . \0 a \0 s \0 t \0 . \0 a \0 u \0
0000000120 t \0 o \0 m \0 o \0 d \0 . \0 A \0 M \0
0000000140 H \0 e \0 a \0 d \0 e \0 r \0 ' \0 > \0
0000000160 \r \n \0 \t \0 < \0 B \0 a \0 s \0 e \0 A
0000000200 \0 u \0 t \0 o \0 S \0 t \0 a \0 t \0
0000000220 \0 J \0 a \0 v \0 a \0 C \0 l \0 a \0 s
Notice that the \r does not have a \0 following it. This means that
every other line in my sta_utf16.xml file is in the wrong byte order and
I get garbled results:
<AutoStat
JavaClass='com.autosim.ast.automod.AMHeader'>à¨à¤?ã°?ä??æ??ç??æ??ä??ç??ç?æ¼?å??ç?æ??ç?â??ä¨?æ??ç??æ??ä??æ°?æ??ç??ç??ã´?â??æ??æ¼?æ´?â¸?æ??ç??ç?æ¼?ç??æ¤?æ´?â¸?æ??ç??ç?â¸?æ??ç??ç?æ¼?æ´?æ¼?æ?â¸?ä??ä´?ä ?æ??æ??æ?æ??ç??â??â??å??æ??ç??æ??ä??æ¤?æ°?æ??å??æ??ç??ç??æ¤?æ¼?æ¸?ã´?â??ã??â¸?ã??â??ã¸?à´?
Is this a defect in Iconv?
Thanks,
LG
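
For readers following along, here is a minimal sketch of how a dump like the one above can arise. It assumes the cause is Windows text-mode newline translation (CR LF collapsed to LF on read, a CR byte inserted before every LF byte on write) and simulates that translation on raw bytes so it behaves the same on any platform. The helpers text_mode_read and text_mode_write are hypothetical names for illustration, and String#encode stands in for Iconv, which newer Ruby versions no longer ship.

```ruby
# Simulates Windows text-mode newline translation on raw bytes.
# Both helper names are hypothetical and exist only for this sketch.

# Text-mode read: each CR LF pair collapses to a single LF.
def text_mode_read(bytes)
  bytes.gsub("\r\n".b, "\n".b)
end

# Text-mode write: every 0x0A byte gets a 0x0D byte inserted before it,
# regardless of what encoding the bytes are in.
def text_mode_write(bytes)
  bytes.gsub("\n".b, "\r\n".b)
end

# A tiny stand-in for the UTF-8 file, with a CR LF line ending.
utf8_on_disk = "<a>\r\n\t<b>".b

in_memory = text_mode_read(utf8_on_disk).force_encoding("UTF-8")
utf16     = in_memory.encode("UTF-16LE").b
on_disk   = text_mode_write(utf16)

hex = on_disk.unpack("C*").map { |b| format("%02X", b) }.join(" ")
puts hex
# prints: 3C 00 61 00 3E 00 0D 0A 00 09 00 3C 00 62 00 3E 00
```

The sequence 0D 0A 00 corresponds to the `\r \n \0` in the dump: the CR is a single inserted byte with no 00 after it, so everything that follows is shifted by one byte and is decoded with its byte pairs misaligned.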
1 Answer
Louise Rains
10/27/2008 11:12:00 AM
It looks like IO#binmode does the same thing as well:
file = File.new("sta_utf16.xml", "wb")
file.binmode
file.write(result)
file.close
Thanks!
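
A small sketch of the binmode variant, assuming a reasonably recent Ruby: it switches an already-open handle to binary mode and checks that the bytes on disk match the bytes written. Tempfile is used only to keep the demo self-contained, and the sample payload is made up for illustration.

```ruby
require "tempfile"

# "a\n" in UTF-16LE is the four bytes 61 00 0A 00.
payload = "a\n".encode("UTF-16LE").b

written = Tempfile.create("binmode_demo") do |f|
  f.binmode            # no newline translation from here on
  f.write(payload)
  f.flush              # make sure the bytes reach the file before reading
  File.binread(f.path) # read the raw bytes back, also untranslated
end

puts written == payload
```

With binmode set, the 0A byte is written as-is, so no stray 0D can slip in between the UTF-16 byte pairs.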
>
Heesob Park wrote:
> No, the problem is not in Iconv but in how Windows handles files opened in text mode.
>
> Use binary flag for file handling like this:
>
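> require 'iconv'
>
> # Read and write in binary mode so no newline translation takes place.
> file = File.open("sta_utf8.xml", "rb") { |f| f.read }
> conv = Iconv.new("UTF-16LE", "UTF-8")
> result = conv.iconv(file)
> # Prepend the UTF-16LE byte order mark (FF FE).
> result = 0xFF.chr << 0xFE.chr << result
> file = File.new("sta_utf16.xml", "wb")
> file.write(result)
> file.close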
>
>
> Regards,
>
> Park Heesob
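
For anyone reading this on a Ruby version that no longer bundles Iconv, the same fix can be sketched with String#encode plus binary reads and writes. This is an assumption-laden equivalent, not the code from the thread: the helper name utf8_file_to_utf16le is hypothetical, and the sample XML and temporary directory are there only so the sketch runs on its own.

```ruby
require "tmpdir"

# Hypothetical helper: converts a UTF-8 file to UTF-16LE with a BOM,
# using binary I/O on both sides so line endings pass through untouched.
def utf8_file_to_utf16le(src, dst)
  bom   = "\xFF\xFE".b
  utf8  = File.binread(src).force_encoding("UTF-8")
  utf16 = utf8.encode("UTF-16LE")
  File.binwrite(dst, bom + utf16.b)
end

sample = "<a>\r\n\t<b/>\r\n</a>"

out = Dir.mktmpdir do |dir|
  src = File.join(dir, "sta_utf8.xml")
  dst = File.join(dir, "sta_utf16.xml")
  File.binwrite(src, sample)
  utf8_file_to_utf16le(src, dst)
  File.binread(dst)
end

# The output starts with the BOM and keeps CR and LF as full 16-bit units.
puts out[0, 2] == "\xFF\xFE".b
puts out.include?("\r\0\n\0".b)
```

Because neither the read nor the write goes through text mode, the CR LF pairs in the source survive as `\r \0 \n \0` in the output, which is what the original UTF-16 file contained.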