comp.lang.ruby

regexp to match CJK characters

Cafe Babe

10/28/2006 9:25:00 AM

How can I write a regexp to match CJK characters?
Thanks in advance:)

--
Posted via http://www.ruby-....

9 Answers

Paul Lutus

10/28/2006 4:24:00 PM

Cafe Babe wrote:

> How can I write a regexp to match CJK characters?
> Thanks in advance:)

print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer post.

--
Paul Lutus
http://www.ara...

David Vallner

10/28/2006 4:34:00 PM

Paul Lutus wrote:
> Cafe Babe wrote:
>
>> How can I write a regexp to match CJK characters?
>> Thanks in advance:)
>
> print "Yes!" if varname =~ /^CJK$/
>
> If this is not what you wanted, you will simply have to write a longer post.
>

CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David Vallner
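[Editor's sketch] To make the point about encodings concrete: the same ideograph is stored as different bytes depending on the encoding, so a byte-level pattern written for one encoding will not match text in another. This is why the encoding has to be pinned down before writing the regexp. The sketch assumes Ruby 1.9 or later (strings carry their own encoding, which postdates this thread) and a UTF-8 source file; `bytes_in` is a hypothetical helper made up for this example.

```ruby
# Sketch, assuming Ruby 1.9+ and a UTF-8 source file.
# One character, three encodings, different byte sequences.
SUN = "日" # U+65E5

# Hypothetical helper for this example: transcode, then list the raw bytes.
def bytes_in(str, encoding)
  str.encode(encoding).bytes
end

%w[UTF-8 Shift_JIS EUC-JP].each do |enc|
  hex = bytes_in(SUN, enc).map { |b| format("%02X", b) }.join(" ")
  puts "#{enc.ljust(10)} #{hex}"
end
```

UTF-8 needs three bytes for this character while Shift_JIS and EUC-JP need two each, and the two Japanese encodings do not agree on which two.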

Cafe Babe

10/28/2006 5:05:00 PM

David Vallner wrote:
> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
>
> David Vallner

Yes, so how can I write the regexp? Thanks a lot.


Josef 'Jupp' Schugt

10/28/2006 8:11:00 PM

Cafe Babe wrote:
| David Vallner wrote:
|> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp

Cafe Babe

10/29/2006 1:57:00 AM

Josef 'Jupp' Schugt wrote:
> Which encoding?

UTF-8

and

$KCODE='u'
require_dependency 'jcode'

thanks
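[Editor's sketch] For UTF-8 text, one common approach is a character class built from Unicode code-point ranges. The sketch assumes Ruby 1.9 or later, where `\uXXXX` escapes work inside regexp literals and `$KCODE` is no longer needed. Which blocks count as "CJK" is an assumption: the list below covers Hiragana, Katakana, CJK Unified Ideographs (plus Extension A) and Hangul Syllables, and deliberately omits Extension B and later, compatibility ideographs, CJK punctuation and fullwidth forms.

```ruby
# Sketch, assuming Ruby 1.9+ and UTF-8 strings.
# ASSUMPTION: "CJK" approximated by these blocks (not exhaustive):
#   3040-309F Hiragana, 30A0-30FF Katakana,
#   3400-4DBF CJK Unified Ideographs Ext. A, 4E00-9FFF CJK Unified Ideographs,
#   AC00-D7AF Hangul Syllables
CJK_CHAR = /[\u3040-\u309F\u30A0-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]/

# Whole string must consist of CJK characters (\A and \z anchor the string).
CJK_ONLY = /\A#{CJK_CHAR}+\z/

def contains_cjk?(str)
  !!(str =~ CJK_CHAR)
end

def cjk_only?(str)
  !!(str =~ CJK_ONLY)
end

puts contains_cjk?("Ruby は楽しい")
puts cjk_only?("日本語")
puts cjk_only?("日本語 text")
```

`\A` and `\z` are used instead of `^` and `$` because the latter match at line boundaries, so `/^...$/` can accept a multi-line string where only one line is CJK.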


Dido Sevilla

10/29/2006 3:27:00 PM

On 10/29/06, Cafe Babe <0xcafebabe@163.com> wrote:
> UTF-8
>
> and
>
> $KCODE='u'
> require_dependency 'jcode'

You may need to use the Oniguruma patch. I believe this is necessary
to give regular expressions support for character sets other than
plain ASCII.

http://www.geocities.jp/kosako3/...

If you're using Gentoo, all you need to do is remerge Ruby with the
cjk use flag turned on. For other systems, you may need to download
and apply the patch manually. See the Oniguruma site for more details.
If you're using a 1.9 Ruby, Oniguruma is already built-in.
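[Editor's sketch] On Ruby 1.9, whose regexp engine is Oniguruma (Ruby 2.0 and later use its fork Onigmo), Unicode script properties such as `\p{Han}` are available, so the code-point ranges do not have to be written out by hand. The choice of scripts below (Han, Hiragana, Katakana, Hangul) is an assumption about what "CJK" should mean here; `cjk_runs` is a name made up for this example.

```ruby
# Sketch, assuming Ruby 1.9+ (Oniguruma/Onigmo script properties).
# ASSUMPTION: "CJK" = the Han, Hiragana, Katakana and Hangul scripts.
CJK_RUN = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+/

# Return every maximal run of CJK characters in the string.
def cjk_runs(str)
  str.scan(CJK_RUN)
end

p cjk_runs("Hello 世界, 안녕, カタカナ!")
```

Script properties follow the Unicode data shipped with the interpreter, so they also cover ideographs outside the blocks a hand-written range list happens to include. Note that some characters common in Japanese text, such as the prolonged sound mark ー, belong to the Common script and are not matched by this class.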

Yukihiro Matsumoto

10/30/2006 2:30:00 AM

Hi,

In message "Re: regexp to match CJK characters"
on Mon, 30 Oct 2006 00:26:49 +0900, "Dido Sevilla" <dido.sevilla@gmail.com> writes:

|You may need to use the Oniguruma patch. I believe this is necessary
|to give regular expressions support for character sets other than
|plain ASCII.

The regular expression engine that comes with 1.8 does support UTF-8.

matz.
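[Editor's sketch] In 1.8, the `-Ku` switch or `$KCODE='u'` (or the `/u` flag on an individual regexp) decided whether a pattern such as `/./` consumed a whole UTF-8 character or a single byte. From 1.9 on, that decision moved onto each string's own encoding. The sketch below assumes Ruby 2.0 or later (for `String#b`) and does not run on 1.8; it only illustrates the character-versus-byte distinction on the same two ideographs. `dot_matches` is a hypothetical name.

```ruby
# Sketch, assuming Ruby 2.0+ (String#b returns an ASCII-8BIT copy).
# The same bytes, viewed as UTF-8 text and as raw binary.
def dot_matches(str)
  {
    utf8:   str.scan(/./m).size,   # /./ consumes one UTF-8 character
    binary: str.b.scan(/./m).size  # /./ consumes one byte
  }
end

p dot_matches("日本")
```

"日本" is two characters but six bytes in UTF-8, which is exactly the difference the 1.8 `$KCODE` setting controlled globally.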

Kev Jackson

10/30/2006 3:33:00 AM

> The regular expression engine that comes with 1.8 does support UTF-8.

Does this mean, though, that you must match on an escaped character
(\u1234) or on a 'real' character?

Kev

Yukihiro Matsumoto

10/30/2006 3:42:00 AM

Hi,

In message "Re: regexp to match CJK characters"
on Mon, 30 Oct 2006 12:33:08 +0900, "Kevin Jackson" <foamdino@gmail.com> writes:

|> The regular expression engine that comes with 1.8 does support UTF-8.
|
|Does this mean, though, that you must match on an escaped character
|(\u1234) or on a 'real' character?

You don't have to escape if you specify -Ku or $KCODE='u'.

matz.
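[Editor's sketch] On current Ruby, a literal ideograph in the pattern and its `\u` escape produce patterns that match the same text, so either style works. The sketch assumes Ruby 1.9 or later and a UTF-8 source file. To our knowledge Ruby 1.8 had no `\u` escape at all, so there the literal form together with `-Ku` or `$KCODE='u'`, as matz describes, was the way to go.

```ruby
# Sketch, assuming Ruby 1.9+ and a UTF-8 source file.
LITERAL = /日本/
ESCAPED = /\u65E5\u672C/ # U+65E5 is 日, U+672C is 本

text = "日本語のテキスト"

# =~ returns the index of the first match (or nil); both print 0 here.
puts text =~ LITERAL
puts text =~ ESCAPED
```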