Asp Forum - Unicode string conversion

Alexandre Rosenfeld

5/5/2007 12:01:00 AM

I'm reading a binary file in my program. It contains strings in the
Windows Unicode format, which the specification says is stored as
little-endian. I'm loading them and trying to convert using Iconv, but
I'm getting an invalid character exception on every string. For now I'm
just stripping the \000 characters and it works, but I know it's not
an ideal solution and it only works in some cases.
So, how can I get the strings into a format Ruby can understand? By the
way, I'll load these strings in GTK (with Ruby bindings); does anyone
know if it can show Unicode strings?
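
To illustrate why stripping \000 only works in some cases, here is a minimal sketch. The sample bytes are made up for illustration and are not taken from the poster's file. For characters up to U+007F the leftover byte happens to be valid ASCII, but anything from U+0080 to U+00FF leaves a lone Latin-1 byte, and characters from U+0100 upward have a non-zero high byte that is not stripped at all.

```ruby
# "Aé" encoded as UTF-16LE without a BOM: 41 00 E9 00 (illustrative data)
raw = [0x41, 0x00, 0xE9, 0x00].pack("C*")

# Deleting the NUL bytes leaves 41 E9. For "A" that happens to be the
# right byte, but E9 on its own is Latin-1 "é", not a valid UTF-8 sequence.
stripped = raw.delete("\x00")
looks_like_utf8 = stripped.dup.force_encoding("UTF-8").valid_encoding?

puts stripped.unpack("C*").inspect
puts looks_like_utf8
```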

--
Posted via http://www.ruby-....

3 Answers

John Joyce

5/5/2007 4:51:00 AM

Are you stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually
better with no BOM.
The first thing you should check for, though, is whether a BOM is
present, and read it if it is.
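
A minimal sketch of such a check, assuming the data has been read as raw bytes. The helper name utf16_bom is hypothetical, and it only looks at the two-byte UTF-16 marks (FF FE for little-endian, FE FF for big-endian).

```ruby
# Hypothetical helper: returns the byte order implied by a leading
# UTF-16 BOM, or nil when the data does not start with one.
def utf16_bom(bytes)
  case bytes.byteslice(0, 2).unpack("C*")
  when [0xFF, 0xFE] then :little_endian
  when [0xFE, 0xFF] then :big_endian
  end
end

with_bom    = [0xFF, 0xFE, 0x41, 0x00].pack("C*")
without_bom = [0x41, 0x00].pack("C*")

puts utf16_bom(with_bom).inspect
puts utf16_bom(without_bom).inspect
```

If a BOM is found, the first two bytes can be dropped before the remaining data is converted.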

Alexandre Rosenfeld

5/6/2007 2:06:00 PM

John Joyce wrote:
> Stripping the BOM? (byte order mark)
> Should be fine. Unicode works just as well w/ no BOM, actually better
> with no BOM.
> The first thing you should check for though is the presence of the
> BOM and read the BOM.

There is no BOM. The specification clearly states it "uses UTF-16,
little endian, and the Byte-Order Marker (BOM) character is not present".

What confuses me is why Iconv couldn't convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which would make the byte order
explicit?
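
For comparison, a minimal sketch of decoding BOM-less UTF-16LE with an explicitly named byte order. It assumes Ruby 1.9 or later and uses String#encode rather than Iconv (Iconv was removed from the standard library in Ruby 2.0). The helper name utf16le_to_utf8 and the even-length check are illustrative assumptions, not something the thread establishes as the cause of the exception.

```ruby
# Hypothetical helper: decode BOM-less UTF-16LE bytes into a UTF-8 string.
def utf16le_to_utf8(bytes)
  # UTF-16 code units are two bytes wide, so an odd length means the
  # slice taken from the file is off by one somewhere.
  raise ArgumentError, "UTF-16 data must have an even byte length" if bytes.bytesize.odd?

  # Tag the raw bytes as UTF-16LE, then transcode. No BOM is required
  # because the byte order is named explicitly.
  bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

# "Olá" as UTF-16LE: 4F 00 6C 00 E1 00 (illustrative data)
sample = [0x4F, 0x00, 0x6C, 0x00, 0xE1, 0x00].pack("C*")
puts utf16le_to_utf8(sample)
```

String#encode raises an exception on malformed input such as an unpaired surrogate, so a failure on every string would more likely point at how the bytes are sliced out of the file than at a missing BOM.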

--
Posted via http://www.ruby-....

Nobuyoshi Nakada

5/8/2007 11:52:00 PM

Hi,

At Sun, 6 May 2007 23:05:37 +0900,
Alexandre Rosenfeld wrote in [ruby-talk:250503]:
> What confuses me is why Iconv couldn't convert it. Does Iconv expect
> a BOM even when I specify UTF16LE, which would make the byte order
> explicit?

The BOM is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) at the beginning of
a text. Almost any iconv(3) implementation should be able to deal with
it. Can you show minimal data that reproduces the error?
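
One way to produce such minimal data is to post a hex dump of the exact bytes handed to the converter. A small sketch follows; the helper name hex_dump and the sample bytes are illustrative only.

```ruby
# Hypothetical helper: render raw bytes as space-separated hex pairs so
# they can be pasted into a message without being mangled.
def hex_dump(bytes)
  bytes.unpack("C*").map { |b| format("%02x", b) }.join(" ")
end

# "Hi" as UTF-16LE without a BOM (illustrative data)
sample = [0x48, 0x00, 0x69, 0x00].pack("C*")
puts hex_dump(sample)  # prints "48 00 69 00"
```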

--
Nobu Nakada

comp.lang.ruby
