Paul Battley
7/6/2007 3:55:00 PM
Hi,
On 05/07/07, johan556@gmail.com <johan556@gmail.com> wrote:
> The problematic files are stored with a name that is a 16-bit
> character string in NTFS (what I called Unicode in my earlier mail,
> perhaps one should call it "almost UTF-16" or UCS-2, I don't know the
> finer details). Anyway, I don't think setting KCODE solves my problem.
I haven't used Windows for a long while, but unless something has
changed in the newest releases, Ruby talks to the system through the
Windows legacy (ANSI) code page, which defaults to Windows-1252 on
English systems, Shift_JIS (code page 932) on Japanese systems, etc.
Internally, Windows is all Unicode, as is NTFS (I think it's UTF-16,
but that's not really important for this discussion), but applications
using legacy code pages can't communicate strings outside that code
page to the OS.
That means that if you set the legacy code page to Shift_JIS, you can
read and write Japanese file names, but not Arabic ones. If you set it
to Windows-1252, you can use acute accents, but can't touch Japanese
files.
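As a rough illustration of that limitation, here is a minimal sketch
that checks whether a file name survives conversion to a given code
page. It assumes the String#encode API from Ruby 1.9 and later (not
available in the 1.8 series this thread is about), and the helper name
representable? is made up for the example. It only does in-memory
conversion and does not touch the Windows file APIs.

```ruby
# Hypothetical helper: true if every character of +name+ exists in +code_page+.
def representable?(name, code_page)
  name.encode(code_page)
  true
rescue Encoding::UndefinedConversionError
  false
end

japanese = "\u65E5\u672C\u8A9E.txt"   # Japanese characters plus ".txt"
arabic   = "\u0645\u0644\u0641.txt"   # Arabic characters plus ".txt"
accented = "caf\u00E9.txt"            # e with acute accent

puts representable?(japanese, "Shift_JIS")     # true
puts representable?(arabic,   "Shift_JIS")     # false
puts representable?(accented, "Windows-1252")  # true
puts representable?(japanese, "Windows-1252")  # false
```

A name that comes back false here is one that an application limited
to that code page could not pass to the OS intact.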
I am led to believe that there is a UTF-8 code page in Windows (code
page 65001), and that it is possible to set the legacy code page on an
application-by-application basis, at least on XP (though you might
need a separate Power Toy or similar to do it). If you can get that to
work, it might be possible to manipulate files via the UTF-8
representation of their names. I've never seen it done, though, so this
is entirely hypothetical.
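To make "the UTF-8 representation of their names" concrete, here is a
small sketch comparing the two encodings of one name. It again assumes
Ruby 1.9 or later for String#encode, and it says nothing about whether
Windows will actually accept the UTF-8 bytes; it only shows that the
UTF-16 form and the UTF-8 form carry the same characters and convert
back and forth without loss.

```ruby
name = "\u65E5\u672C\u8A9E.txt"     # three Japanese characters plus ".txt"

utf16 = name.encode("UTF-16LE")     # 16-bit code units, roughly what NTFS stores
utf8  = name.encode("UTF-8")        # what a UTF-8 code page would expose

puts utf16.bytesize                 # 14
puts utf8.bytesize                  # 13
puts utf16.encode("UTF-8") == name  # true
```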
Paul.