Dido Sevilla
10/17/2006 6:45:00 AM
On 10/16/06, Minkoo <minkoo.seo@gmail.com> wrote:
> I'm afraid that I'm not used to character encodings. Does Ruby use UTF-8 by
> default?
>
As of Ruby 1.8, it doesn't, but see the other responses.
> In other words, suppose that I've launched irb and fired
> "foo".levenshtein("foobar").
> In that case, is the string "foo" encoded as utf-8?
The code makes that assumption, correct or not. If you're one of those
ignorant yokels who believes that characters are bytes, it's also a
safe assumption. ;) Now, if you're one of those people who does i18n,
l10n, and m17n before breakfast, you'll have alarm bells going off
inside your head immediately, as such assumptions are in general very
dangerous to make. A cardinal rule in this kind of programming is that
strings are meaningless without an attached encoding.
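
To make the bytes-versus-characters distinction concrete, here is a small sketch. It assumes the string holds UTF-8 data; the sample word and the variable names are mine, not from the code discussed earlier in the thread.

```ruby
# "café" spelled out as explicit UTF-8 bytes, so the example does not
# depend on the encoding of the source file or terminal.
s = "caf\xC3\xA9"

# "C*" reads the string one byte at a time.
bytes = s.unpack("C*")        # [99, 97, 102, 195, 169]

# "U*" decodes the same bytes as UTF-8, one integer per code point.
codepoints = s.unpack("U*")   # [99, 97, 102, 233]

puts "bytes: #{bytes.length}, code points: #{codepoints.length}"
```

The same five bytes are four characters only if you agree to read them as UTF-8; under a different encoding they would mean something else entirely.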
> Do I always have to unpack the string like
> the code shown above?
It would be better; at the very least, it keeps your options open when
you attempt to internationalize your program. You wind up supporting
Unicode, and that makes the transition a bit easier, since Unicode is
a reasonable choice of character set for anyone who wishes to do
internationalization.
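
As a rough illustration of why unpacking first helps, here is a minimal sketch of an edit distance computed over code points rather than bytes. It is not the code posted earlier in the thread: `levenshtein_distance` is a hypothetical standalone method, and it assumes both arguments are valid UTF-8.

```ruby
# Hypothetical helper: Levenshtein distance over Unicode code points.
# Assumes both strings are valid UTF-8.
def levenshtein_distance(a, b)
  s = a.unpack("U*")   # arrays of code points, not bytes
  t = b.unpack("U*")
  return t.length if s.empty?
  return s.length if t.empty?

  # prev[j] is the distance between the first i code points of s
  # and the first j code points of t, for the previous row i.
  prev = (0..t.length).to_a
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = (sc == tc) ? 0 : 1
      curr << [prev[j + 1] + 1,   # deletion
               curr[j] + 1,       # insertion
               prev[j] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_distance("foo", "foobar")        # 3
puts levenshtein_distance("caf\xC3\xA9", "cafe")  # 1
```

Compared byte by byte, the second pair would come out as 2, because the accented letter occupies two bytes in UTF-8; working on code points gives the answer a reader would expect.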