comp.lang.ruby

Detect whether unicode string is Japanese

Bob Marley

9/29/2008 2:19:00 PM

How can I get a tally of how many characters in a Unicode string are
Japanese (hiragana, katakana, kanji)? When I unpack a string, each
character comes out like \xE3\x81\x95, but I am trying to check if it's
in the range 3040-309F (Hiragana) and I don't understand how to convert
between the 3-byte representation and that range...
--
Posted via http://www.ruby-....

1 Answer

Jan Dvorak

9/30/2008 5:16:00 PM

On Monday 29 September 2008 16:18:53 Bob Marley wrote:
> How can I get a tally of how many characters in a Unicode string are
> Japanese (hiragana, katakana, kanji)? When I unpack a string, each
> character comes out like \xE3\x81\x95, but I am trying to check if it's
> in the range 3040-309F (Hiragana) and I don't understand how to convert
> between the 3-byte representation and that range...

You can look up the Unicode mapping on Google, but you will have to write a new
function for each possible encoding (UTF-8, UTF-16LE, ...).
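For UTF-8, such a function just reverses the encoding. The high bits of the leading byte say how many bytes the character uses, and each following byte contributes its low 6 bits. Here is a minimal sketch, assuming well-formed UTF-8 input. The helper name `utf8_codepoints` is hypothetical, and nothing is validated. For UTF-8 specifically, Ruby's built-in `unpack("U*")` already does the same decoding:

```ruby
# Hypothetical helper: decode UTF-8 bytes into Unicode code points by hand.
# Assumes well-formed UTF-8; invalid or truncated sequences are not detected.
def utf8_codepoints(str)
  bytes = str.bytes.to_a   # .to_a because Ruby 1.9 returns an Enumerator here
  points = []
  i = 0
  while i < bytes.length
    lead = bytes[i]
    if lead < 0x80               # 0xxxxxxx: 1 byte (ASCII)
      cp, len = lead, 1
    elsif lead >> 5 == 0b110     # 110xxxxx: 2 bytes
      cp, len = lead & 0x1F, 2
    elsif lead >> 4 == 0b1110    # 1110xxxx: 3 bytes
      cp, len = lead & 0x0F, 3
    else                         # 11110xxx: 4 bytes
      cp, len = lead & 0x07, 4
    end
    # Each continuation byte (10xxxxxx) adds its low 6 bits.
    bytes[i + 1, len - 1].each { |b| cp = (cp << 6) | (b & 0x3F) }
    points << cp
    i += len
  end
  points
end

cp = utf8_codepoints("\xE3\x81\x95").first
printf("U+%04X\n", cp)                # prints U+3055
puts((0x3040..0x309F).include?(cp))   # prints true (inside the Hiragana range)
p "\xE3\x81\x95".unpack("U*")         # prints [12373], the built-in equivalent
```

So the bytes \xE3\x81\x95 from the question decode to U+3055 ("さ"), which falls within 3040-309F.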

Or, with Ruby 1.9, you can iterate over the string by characters (not bytes)
and use the .ord method to get the Unicode code point:

mystr.each_char do |ch|   # one character at a time, not one byte
  puts ch.ord             # Unicode code point, e.g. 12373 (0x3055) for "さ"
end
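Building on that, the tally from the question can be sketched by checking each code point against block ranges. This assumes a UTF-8 string and uses simplified, hypothetical ranges: Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF, and the main CJK Unified Ideographs block U+4E00-U+9FFF for kanji. Chinese text shares that last block. Half-width katakana and the rarer CJK extension blocks are not counted:

```ruby
# encoding: utf-8
# Hypothetical helper: tally Japanese characters by Unicode block.
# The ranges are a simplification (no half-width katakana, no CJK extensions).
JAPANESE_RANGES = {
  :hiragana => 0x3040..0x309F,
  :katakana => 0x30A0..0x30FF,
  :kanji    => 0x4E00..0x9FFF   # CJK Unified Ideographs, shared with Chinese
}

def japanese_tally(str)
  tally = Hash.new(0)
  str.each_char do |ch|
    # Find the block containing this character's code point, if any.
    name, _range = JAPANESE_RANGES.find { |_, range| range.include?(ch.ord) }
    tally[name] += 1 if name
  end
  tally
end

counts = japanese_tally("ひらがなとカタカナと漢字, abc")
puts counts[:hiragana]                              # 6
puts counts[:katakana]                              # 4
puts counts[:kanji]                                 # 2
puts counts.values.inject(0) { |sum, n| sum + n }   # 12
```

On Ruby 1.9 and later, the regexp engine also understands script properties. As a result, `str.scan(/\p{Hiragana}/).size` (and likewise `\p{Katakana}` and `\p{Han}`) gives a similar count. Scripts and blocks do not line up exactly, though. For example, the long-vowel mark ー sits in the Katakana block but belongs to the Common script.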

Jan