Jan Dvorak
9/30/2008 5:16:00 PM
On Monday 29 September 2008 16:18:53 Bob Marley wrote:
> How can I get a tally of how many characters in a Unicode string are
> Japanese (hiragana, katakana, kanji)? When I unpack a string, each
> character comes out like \xE3\x81\x95, but I am trying to check if it's
> in the range 3040-309F (Hiragana) and I don't understand how to convert
> between the 3-byte representation and that range...
You can look up the Unicode mapping online, but you would have to write a
separate decoding function for each possible encoding (UTF-8, UTF-16LE, ...).
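For UTF-8 specifically, a 3-byte sequence carries 4 + 6 + 6 payload bits, so
the conversion is a few masks and shifts. A rough sketch, assuming the input
is exactly one valid 3-byte UTF-8 sequence (decode3 is a made-up helper name
and does no validation):

```ruby
# Hypothetical helper: decodes one 3-byte UTF-8 sequence into a code point.
# Assumes the bytes are valid UTF-8 of the form 1110xxxx 10xxxxxx 10xxxxxx.
def decode3(bytes)
  b0, b1, b2 = bytes
  ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

raw = "\xE3\x81\x95"                 # the bytes from the question
cp  = decode3(raw.bytes.to_a)
puts cp.to_s(16)                     # code point in hex

# String#unpack with the "U*" directive does the same UTF-8 decoding
# for sequences of any length.
puts raw.unpack("U*").inspect
```

Here the result falls inside 3040-309F, so the character is hiragana.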
Or, with Ruby 1.9, you can iterate over the string by characters (not bytes)
and use the .ord method to get each character's Unicode code point:
mystr.each_char do |ch|
  puts ch.ord
end
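To get the tally you asked for, you could compare each code point against
the block ranges. This is only a sketch: tally_japanese and JAPANESE_RANGES
are names I made up, the string is assumed to be UTF-8, and the katakana and
kanji ranges are my assumption (basic Katakana block and CJK Unified
Ideographs, which are shared with Chinese); only the hiragana range comes
from your mail.

```ruby
# Hypothetical range table. Only the hiragana range is from the question;
# the other two are assumed: the basic Katakana block and the main
# CJK Unified Ideographs block (which also covers Chinese characters).
JAPANESE_RANGES = {
  :hiragana => 0x3040..0x309F,
  :katakana => 0x30A0..0x30FF,
  :kanji    => 0x4E00..0x9FFF
}.freeze

# Counts characters per script; characters outside all ranges are ignored.
# Assumes str is UTF-8 encoded so that ord returns the Unicode code point.
def tally_japanese(str)
  counts = Hash.new(0)
  str.each_char do |ch|
    cp = ch.ord
    JAPANESE_RANGES.each do |name, range|
      counts[name] += 1 if range.include?(cp)
    end
  end
  counts
end

sample = "\u3055\u30AB\u6F22abc"     # one hiragana, one katakana, one kanji
p tally_japanese(sample)
```

Summing the values of the returned hash gives the overall count of Japanese
characters.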
Jan