Asp Forum - Detect whether unicode string is Japanese

Bob Marley

9/29/2008 2:19:00 PM

How can I get a tally of how many characters in a Unicode string are
Japanese (hiragana, katakana, kanji)? When I unpack a string, each
character comes out like \xE3\x81\x95, but I am trying to check if it's
in the range 3040-309F (Hiragana) and I don't understand how to convert
between the 3-byte representation and that range...
--
Posted via http://www.ruby-....

1 Answer

Jan Dvorak

9/30/2008 5:16:00 PM

On Monday 29 September 2008 16:18:53 Bob Marley wrote:
> How can I get a tally of how many characters in a Unicode string are
> Japanese (hiragana, katakana, kanji)? When I unpack a string, each
> character comes out like \xE3\x81\x95, but I am trying to check if it's
> in the range 3040-309F (Hiragana) and I don't understand how to convert
> between the 3-byte representation and that range...

You can look up the Unicode mapping on Google, but you will have to write a
new function for each possible encoding (UTF-8, UTF-16LE, ...).
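
For UTF-8 specifically, the conversion asked about comes down to bit masking. Here is a minimal sketch, assuming a well-formed three-byte sequence; the helper name decode_utf8_3byte is made up for illustration and does no validation:

```ruby
# Hypothetical helper: decode one well-formed 3-byte UTF-8 sequence
# (bit layout 1110xxxx 10xxxxxx 10xxxxxx) into its Unicode code point.
def decode_utf8_3byte(b0, b1, b2)
  ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)
end

cp = decode_utf8_3byte(0xE3, 0x81, 0x95)
puts format("U+%04X", cp)                  # U+3055
in_hiragana = (0x3040..0x309F).include?(cp)
puts in_hiragana                           # true

# String#unpack with the "U*" directive decodes a whole UTF-8 string
# into code points, so the hand-written helper is only for illustration.
p "\xE3\x81\x95".unpack("U*")              # [12373], i.e. 0x3055
```

So the bytes \xE3\x81\x95 from the question are U+3055, which falls inside the Hiragana range 3040-309F.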

Or, with Ruby 1.9, you can iterate over the string by characters (not bytes)
and use the .ord method to get each character's Unicode code point:

mystr.each_char do |ch|
  puts ch.ord  # integer code point, e.g. 12373 (0x3055) for \xE3\x81\x95
end
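
Building on each_char and ord, the tally from the question could be sketched roughly as follows. The ranges are the Unicode blocks Hiragana (U+3040-309F), Katakana (U+30A0-30FF) and CJK Unified Ideographs (U+4E00-9FFF). Treating that last block as "kanji" is an assumption, since it is shared with Chinese, and the names JAPANESE_RANGES and count_japanese are invented for this example:

```ruby
# encoding: utf-8
# Code point ranges by Unicode block. The :kanji range is the basic
# CJK Unified Ideographs block, which Chinese text uses as well.
JAPANESE_RANGES = {
  :hiragana => 0x3040..0x309F,
  :katakana => 0x30A0..0x30FF,
  :kanji    => 0x4E00..0x9FFF
}

# Hypothetical helper: count characters per script in a UTF-8 string.
# Returns a hash that only contains the scripts actually found.
def count_japanese(str)
  counts = Hash.new(0)
  str.each_char do |ch|
    cp = ch.ord
    JAPANESE_RANGES.each do |name, range|
      counts[name] += 1 if range.include?(cp)
    end
  end
  counts
end

# Expect 3 hiragana, 4 katakana and 2 kanji; spaces and "abc" are ignored.
p count_japanese("さくら カタカナ 漢字 abc")
```

Summing the values of the returned hash gives the overall number of Japanese characters. Less common kanji in the CJK extension blocks are not covered by this sketch.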

Jan

comp.lang.ruby