comp.lang.ruby

Premature end of regular expression with non-ASCII character

Nick Snels

1/29/2006 8:54:00 PM

Hi,

I'm trying to get regular expressions to work with a string that
contains letters with accents. I have the following sentence:

De kiné weet één van hun patiënten te overtuigen om gekke dingen te
doen.

The regexp /patiënten/ matches the word patiënten. However, when I do the
regexp /kiné/, I get the error 'premature end of regular expression:
/kiné/ (SyntaxError)'. Can anybody tell me what is going on? Another
issue with the same sentence: when I use the regexp /\s/ to highlight
all the spaces, the space between 'kiné weet' is not highlighted as a
space. It seems like regular expressions can't handle non-ASCII
characters at the end of a string.

Kind regards,

Nick

--
Posted via http://www.ruby-....


5 Answers

Matthew Smillie

1/29/2006 9:14:00 PM

> I'm trying to get regular expressions to work with a string that
> contains letters with accents. I have the following sentence:
>
> De kiné weet één van hun patiënten te overtuigen om gekke dingen te
> doen.
>
> The regexp /patiënten/ matches the word patiënten. However, when I do the
> regexp /kiné/, I get the error 'premature end of regular expression:
> /kiné/ (SyntaxError)'. Can anybody tell me what is going on? Another
> issue with the same sentence: when I use the regexp /\s/ to highlight
> all the spaces, the space between 'kiné weet' is not highlighted as a
> space. It seems like regular expressions can't handle non-ASCII
> characters at the end of a string.


I believe this is a character encoding problem, which is fixed in 1.9
by the inclusion of a new regular expression engine, Oniguruma (which
you can also download and use with 1.8):
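For comparison, here is a minimal sketch of Nick's sentence on Ruby 1.9 or later, where Oniguruma is the built-in engine and every string knows its encoding. It assumes the source file is saved as UTF-8 (the default source encoding from Ruby 2.0 on); the bracket "highlighting" is only an illustrative stand-in for whatever markup the original code used.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and a UTF-8 source file.
sentence = "De kiné weet één van hun patiënten te overtuigen om gekke dingen te doen."

p sentence =~ /kiné/        # => 3 (a character offset, not a byte offset)
p sentence[/patiënten/]     # => "patiënten"
p sentence.scan(/\s/).size  # => 13, including the space right after "kiné"

# Illustrative "highlight": wrap every whitespace character in brackets.
highlighted = sentence.gsub(/\s/) { |ws| "[#{ws}]" }
puts highlighted
```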

http://www.geocities.jp/kosako3/...

Best of luck.
matt.


Dave Burt

1/30/2006 12:16:00 AM

Nick Snels asked:
> I'm trying to get regular expressions to work with a string that
> contains letters with accents. ...
>
> The regexp /patiënten/ matches the word patiënten. However when I do the
> regexp /kiné/, I get the error 'premature end of regular expression:
> /kiné/ (SyntaxError)'. Can anybody tell me what is going on?

You might avoid the syntax error by setting $KCODE = "u" at the start of
your program.
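A sketch of where the setting goes, assuming a script that may also be run on newer interpreters: $KCODE is only honoured by Ruby 1.8 and is ignored from 1.9 on, hence the version guard. The `u` option on a single literal (`/kiné/u`) is the per-regexp way to ask for UTF-8 and exists on both 1.8 and later versions; the variable name `re` is purely illustrative, and the `encoding` call needs 1.9+.

```ruby
# encoding: utf-8
# $KCODE only affects Ruby 1.8; on 1.9+ the guard skips the assignment.
$KCODE = "u" if RUBY_VERSION < "1.9"

# Per-literal alternative: the u option marks this one regexp as UTF-8.
re = /kiné/u

p re.encoding           # UTF-8 (Regexp#encoding is 1.9+ only)
p "De kiné weet" =~ re  # => 3
```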

> Another
> issue with the same sentence: when I use the regexp /\s/ to highlight
> all the spaces, the space between 'kiné weet' is not highlighted as a
> space. It seems like regular expressions can't handle non-ASCII
> characters at the end of a string.

Ruby strings are made up of bytes, not characters. That's the cause of the
issues you're having. There are a couple of recent plugins for Ruby to help
improve the situation (see
http://redhanded.hobix.com/inspect/unicodeLibForR...) but they're far
from perfect.
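To make the bytes-versus-characters point concrete, here is a short sketch assuming Ruby 1.9.3 or later (for `byteslice`) and a UTF-8 source file. On 1.8, `String#length` itself counts bytes, so "kiné" reports a length of 5 there. Cutting the string on a byte boundary leaves only the lead byte of "é", the kind of incomplete character a byte-oriented engine can trip over at the end of a pattern; this illustrates the idea and does not reproduce the 1.8 SyntaxError itself.

```ruby
# encoding: utf-8
word = "kiné"

p word.length    # => 4 characters
p word.bytesize  # => 5 bytes, because "é" (U+00E9) is two bytes in UTF-8
p word.bytes     # => [107, 105, 110, 195, 169]

# Taking the first 4 *bytes* keeps only the lead byte 0xC3 of "é".
cut = word.byteslice(0, 4)
p cut.bytes            # => [107, 105, 110, 195]
p cut.valid_encoding?  # => false: the last character is incomplete
```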

I hope $KCODE can clear up most of your problems, though.

Cheers,
Dave


Logan Capaldo

1/30/2006 4:11:00 AM


On Jan 29, 2006, at 3:53 PM, Nick Snels wrote:

> Hi,
>
> I'm trying to get regular expressions to work with a string that
> contains letters with accents. I have the following sentence:
>
> De kiné weet één van hun patiënten te overtuigen om gekke dingen te
> doen.
>
> The regexp /patiënten/ matches the word patiënten. However, when I do the
> regexp /kiné/, I get the error 'premature end of regular expression:
> /kiné/ (SyntaxError)'. Can anybody tell me what is going on? Another
> issue with the same sentence: when I use the regexp /\s/ to highlight
> all the spaces, the space between 'kiné weet' is not highlighted as a
> space. It seems like regular expressions can't handle non-ASCII
> characters at the end of a string.
>
> Kind regards,
>
> Nick
>
> --
> Posted via http://www.ruby-....
>

Are you using $KCODE="u" at the top of your script?



Nick Snels

1/30/2006 8:18:00 PM

Thank you both very much for the suggestions. First off, I have
$KCODE="u" in config/environment.rb (Rails). I have also tried adding it
to the class itself, but the error remained.
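Since $KCODE did not help, one thing worth ruling out is that the text (or the source file itself) is not UTF-8 at all. The sketch below assumes, purely hypothetically, that the string arrives as ISO-8859-1 (Latin-1) bytes, as some Windows editors save files, and it runs on Ruby 1.9 or later. There "é" is the single byte 0xE9; newer Rubies refuse to match a UTF-8 pattern containing "é" against such a string, and transcoding with `String#encode` removes the mismatch.

```ruby
# encoding: utf-8
# Hypothetical input: the same words as ISO-8859-1 (Latin-1) bytes.
latin1 = "kin\xE9 weet".dup.force_encoding("ISO-8859-1")

p latin1.bytes.first(5)  # => [107, 105, 110, 233, 32]: "é" is one byte here

# A UTF-8 regexp containing "é" refuses to run against a Latin-1 string.
mismatch = false
begin
  latin1 =~ /kiné/
rescue Encoding::CompatibilityError => e
  mismatch = true
  puts "mismatch: #{e.message}"
end

# Transcode to UTF-8 first; then the pattern and \s behave as expected.
utf8 = latin1.encode("UTF-8")
p utf8                  # => "kiné weet"
p utf8 =~ /kiné/        # => 0
p utf8.scan(/\s/).size  # => 1
```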

Secondly, I looked at Oniguruma and I must say it looks promising.
Unfortunately for me and my Windows (Cygwin) machine, I have to compile
it into Ruby 1.8.2-1.8.4, and I can't get that to work. I can't get 1.8.2
to compile: I solve one error, then hit yet another, and so on.
Hopeless. I managed to compile 1.8.4, but when I start Ruby I get an
error that a file is missing. I'm using the Windows one-click Ruby
installer, if anybody is wondering how on earth I managed to get Ruby
working :). I could use 1.9.0 because it includes Oniguruma; the only
problem is that I don't know whether Rails works with it. I have
contacted the author of Oniguruma; maybe he can say conclusively
whether or not Oniguruma solves my problem. When I get a response I'll
post it here. In the meantime, if anybody has any other suggestions,
please let me know. Thanks.

Kind regards,

Nick




Dave Burt

1/30/2006 10:11:00 PM

Nick Snels wrote:
> Thank you both very much for the suggestions. First off I have
> $KCODE="u" in config/environment.rb (Rails). I have also tried to add it
> into the class. But the error remained.

I haven't had the issues you're talking about, because I'm only doing apps
in English, but here are a couple of places you might start to look for
solutions:

http://wiki.rubyonrails.com/rails/pages/HowToUseUnic...

http://redhanded.hobix.com/inspect/unicodeLibForR...

> I could use 1.9.0 because this includes oniguruma. The only
> problem here is that I don't know if Rails works with it.

Don't. 1.9.0 isn't for production, really; it's an experimental version
which is growing some features that may become part of Ruby 2.0.

Cheers,
Dave