ciapecki
12/6/2006 5:56:00 PM
> On Wed, 06 Dec 2006 12:01:37 -0000, ciapecki <ciapecki@gmail.com> wrote:
> > David Vallner wrote:
> >> Ross Bamford wrote:
> >> > On Mon, 2006-12-04 at 22:40 +0900, ciapecki wrote:
> >> >> Is there a way in Ruby to:
> >> >> - open a file encoded in ucs-2le,
> >> >> - replace every occurrence of '\t' (X'0009') with ',' (X'002c'),
> >> >> - and save it back in ucs-2le, without losing any content?
> >> > But that strikes me as unnecessary when you could just do:
> >> >
> >> > newdata = File.read('test').tr("\t", ',')
> >> > # => "a\000b\000c\000,\000\273\006,\0001\000"
> >> >
> >>
> >> Um. Other way around. *Old* data is in UCS-2LE, not in UTF-8, so it's
> >> not ASCII-transparent. Your iconv approach could work if you swapped
> >> around the encoding names, except you'd probably also have to involve a
> >> $KCODE = 'u' and require 'jcode' to avoid clobbering the possible cases
> >> where in UTF8, 0x09 and 0x2c are part of a multibyte sequence.
> >>
> >
> > Thanks Ross for the try, but it is not working,
> > tried for:
> >
> > "\377\376B\001\363\000|\001k\000o\000\t\000k\000s\000i\000\005\001|\001k\000a\000\t\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000\t\000\t\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
> > which is:
> >
> > lózko ksiazka czlowiek
> > laka zdzblo
> >
> > -> (the same :))
> >
> > the conversion should be:
> > lózko,ksiazka,czlowiek
> > laka,,zdzblo
> >
> > but with the Iconv try:
> > lózko,ksiazka,czlowiek
> > ???????????????
> >
> > after swapping utf-8 and ucs-2le in both iconv conversions, I get an
> > error message:
> > `iconv': "\377\376B\001¾ |?k\000o\000\t\000k\000"...
> > (Iconv::IllegalSequence)
> >
> >
> > Any other suggestions highly appreciated.
> >
>
> I think David is confusing the order of the 'from' and 'to' arguments to
> Iconv.iconv - they go: (to, from, data). My short example was
> ill-conceived, though - this might be safer:
>
> $ irb -riconv
>
> s = <the string you show above>
>
> s.gsub(/\t\000(?!\000)/, ",\000")
> # =>
> "\377\376B\001\363\000|\001k\000o\000,\000k\000s\000i\000\005\001|\001k\000a\000,\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000,\000,\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
>
> (This is:
>
> lózko,ksiazka,czlowiek
> laka,,zdzblo
> )
>
> But I'm not totally sure, so you might be better with iconv anyway:
>
> Iconv.iconv('ucs-2le', 'utf-8', Iconv.iconv('utf-8','ucs-2le',
> s).first.gsub(/\t/u, ',')).first
> # =>
> "\377\376B\001\363\000|\001k\000o\000,\000k\000s\000i\000\005\001|\001k\000a\000,\000c\000z\000B\001o\000w\000i\000e\000k\000\r\000\n\000B\001\005\001k\000a\000,\000,\000|\001d\000z\001b\000B\001o\000\r\000\n\000"
>
> (This too is:
>
> lózko,ksiazka,czlowiek
> laka,,zdzblo
> )
>
> Unless I missed something, this seems to work fine here. Does it work for
> you?
>
> --
> Ross Bamford - rosco@roscopeco.remove.co.uk
Thanks Ross,
Your code works. The mistake was mine: I forgot to open the output file
in binary mode ("wb"); before, I had only "w".
Thanks again for your help
chris
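
For anyone finding this thread later, here is a minimal sketch of the whole round trip, assuming a Ruby with built-in String#encode (1.9 or later) instead of Iconv, and treating the UCS-2LE data as UTF-16LE. The helper names tabs_to_commas_ucs2le and convert_file are hypothetical, not from the thread. The "wb" mode is the important part: on Windows, text mode "w" most likely expands each "\n" byte to "\r\n", which shifts every 16-bit unit after the first line break and would explain the row of question marks above.

```ruby
# Sketch only: helper names are hypothetical, and UCS-2LE input is
# assumed to be valid UTF-16LE (true for text within the BMP).

# Takes raw UCS-2LE bytes and returns raw bytes with every TAB replaced by a comma.
def tabs_to_commas_ucs2le(bytes)
  # Transcode to UTF-8 so tr works on whole characters, not on single bytes.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Replace, transcode back, and tag the result as binary for writing.
  utf8.tr("\t", ",").encode(Encoding::UTF_16LE).force_encoding(Encoding::BINARY)
end

# Reads src in binary mode and writes the converted bytes to dst in binary mode.
def convert_file(src, dst)
  data = File.binread(src)
  # "wb" keeps the bytes untouched; plain "w" may translate newlines on Windows.
  File.open(dst, "wb") { |f| f.write(tabs_to_commas_ucs2le(data)) }
end
```

Usage would be something like convert_file("in.txt", "out.txt"), where both file names are placeholders. A leading byte order mark (\377\376) survives the round trip, because it is transcoded like any other character.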