comp.lang.ruby

problem matching accented chars on OS X

Alex Fenton

6/11/2005 11:09:00 AM

Hi

I'm finding words within strings in Western European languages, so I need to
account for accented characters, such as ê (e circumflex) and à (a grave). On
Ruby 1.8.2 MSW the following works for me (simplified):

WORD_PATTERN = /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/s

\w gets me a-z and A-Z (plus digits and underscore); the hex ranges are the
positions of the accented characters in the ISO-8859-1 encoding. This seems to
work, but when I run the same code on OS X, I get:
.../lib/weft/backend/sqlite.rb:533: mismatch multibyte code length in
char-class range: /^[\w\xC0-\xD6\xD8-\xF6\xF8-\xFF]+$/ (SyntaxError)
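
For reference, the three hex ranges in the pattern can be checked by transcoding
each byte from ISO-8859-1. This is only a sketch, and it assumes a newer Ruby
(1.9 or later) with encoding-aware strings, not the 1.8.2 setup described above.
LATIN1_RANGES and latin1_letters are names made up for the illustration.

```ruby
# Assumption: Ruby 1.9+ (Integer#chr with an encoding, String#encode).
# LATIN1_RANGES and latin1_letters are hypothetical names for this sketch.
LATIN1_RANGES = [0xC0..0xD6, 0xD8..0xF6, 0xF8..0xFF]

# Returns each byte in the ranges as a UTF-8 character,
# interpreting the byte value as an ISO-8859-1 code.
def latin1_letters
  LATIN1_RANGES.flat_map(&:to_a).map do |byte|
    byte.chr(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  end
end

# Prints the accented letters covered by the character class.
puts latin1_letters.join
```

The two gaps in the ranges, 0xD7 and 0xF7, are the multiplication and division
signs in ISO-8859-1, which is why the pattern skips them.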

Any pointers? I'm not sure what is going wrong.

Is there a library that can help me match letter characters (ideally in a
variety of codesets)? The [:alpha:] regex class seemed to be synonymous with
\w, which doesn't match enough.
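
To make the goal concrete, here is a hedged sketch of letter matching on a newer
Ruby (1.9 or later) with UTF-8 strings, where the regex engine understands
Unicode properties such as \p{L}. It does not apply to 1.8.2 and does not
explain the error above. LETTERS_ONLY, WORD_LIKE and the two helper methods are
hypothetical names.

```ruby
# Assumption: Ruby 1.9+ with UTF-8 source and UTF-8 strings.
# All constant and method names here are hypothetical.

# Any run of Unicode letters, accented or not.
LETTERS_ONLY = /\A\p{L}+\z/

# Closer to the original pattern: \w plus the same Latin-1 letters,
# written as code points instead of raw bytes.
WORD_LIKE = /\A[\w\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+\z/

# True if text consists only of letters.
def letters_only?(text)
  !(text =~ LETTERS_ONLY).nil?
end

# True if text consists only of \w characters and Latin-1 accented letters.
def word_like?(text)
  !(text =~ WORD_LIKE).nil?
end

puts letters_only?("f\u00EAte")
puts word_like?("d\u00E9j\u00E0_1")
```

The first pattern is stricter than the original (no digits or underscore), while
the second keeps the original's shape.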

cheers
alex