Timo Hoepfner
1/13/2006 9:40:00 AM
>> How can I convert the string to UTF8?
>
> You have got a correct UTF-8 string. Unlike Windows XP, Mac OS X
> decomposes character components as much as possible (Sorry I forgot
> the correct term for this policy). So what you got:
>
>> # => ["a", "", "o", "", "u", "", "ß", "A", "", "O", "", "U", ""]
>
> is decomposed form of your string, a+umlaut, o+umlaut, etc.
Hi Matz, Austin and A.
Thanks for the clarification. Unicode is more complex than it seems at
first...
Nevertheless that doesn't solve my current problem. What I'm trying
to do is to organize files within a directory into subfolders based
on the first N characters of the file name. Here's my code (w/o error
handling), which works fine for 8-bit characters, but doesn't work for
e.g. umlauts:
$KCODE = 'UTF8'
require 'jcode'
require 'pathname'
require 'fileutils'

# ARGV[0]: directory to organize, ARGV[1]: prefix length N
wd, len = Pathname.new(ARGV[0]), ARGV[1].to_i
files = wd.children.reject { |f| f.directory? }
files.each do |f|
  # With $KCODE set to UTF8, split(//) splits per character, not per byte
  dir = wd + Pathname.new(f.basename.to_s.split(//)[0..len-1].join)
  dir.mkdir unless dir.exist?
  FileUtils.mv f, dir
end
I guess I have to recompose the decomposed filename somehow. Are
there any tools for that in the standard library or somewhere else?
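To make concrete what I mean by recomposing: the following is only a sketch, and it assumes a Ruby that provides String#unicode_normalize (Ruby 2.2 or later), which is not available in the 1.8 setup used above. The helper name prefix_dir_name is made up for illustration.

```ruby
# Sketch only: String#unicode_normalize requires Ruby 2.2+ (an assumption;
# it does not exist in the Ruby 1.8 environment of the script above).
# prefix_dir_name is a hypothetical helper, not part of any library.
def prefix_dir_name(file_name, len)
  # NFC recomposes "a" + combining diaeresis (U+0308) into one "ä" (U+00E4)
  composed = file_name.unicode_normalize(:nfc)
  # Take the first len characters of the recomposed name
  composed.chars.first(len).join
end

# A decomposed name, as Mac OS X would report it
decomposed = "a\u0308rger.txt"
puts prefix_dir_name(decomposed, 1)
```

Without the normalization step, the first "character" of the decomposed name would be a bare "a", with the umlaut left behind as a separate combining character.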
Thanks for your help,
Timo