Tim Bray
7/31/2006 3:11:00 PM
On Jul 31, 2006, at 7:52 AM, Alex Young wrote:
>>> First, Oniguruma[2] is a regular expression engine. It supports
>>> Unicode regular
>>> expressions under many encodings; it's very handy. If all you
>>> want to do is
>>> search strings for Unicode text, then great, use it.
>> Er uh well it doesn't do unicode properties so you can't use
>> things like \p{L}
>
> Off topic, what does/would that do? Match a lower-case symbol?
Unicode characters have named properties. "L" means it's a letter.
There are sub-properties like Lu and Ll for upper and lower case.
There are many more properties for things like being a number, being
white-space, being a combining form, particular properties of Asian
characters, and so on. Tremendously useful in regexes, particularly
for those of us round-eye gringos who are prone to write [a-zA-Z] and
think we're matching letters, which we're not. If you don't support
properties, you don't support Unicode.
 -Tim
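
To make the \p{L} versus [a-zA-Z] point concrete, here is a minimal sketch. It assumes a Ruby whose regex engine understands \p{...} property classes (Ruby 1.9 and later do), which is not the engine build discussed above. The sample strings and variable names are made up for illustration.

```ruby
# Sketch only: assumes a Ruby whose regex engine supports \p{...}.
# Sample words: "naive", "naïve" (with U+00EF), "Ωmega" (with U+03A9), "abc123".
words = ["naive", "na\u00EFve", "\u03A9mega", "abc123"]

ascii_only = /\A[a-zA-Z]+\z/  # ASCII letters only
any_letter = /\A\p{L}+\z/     # any character with the Letter property

# Words made up entirely of "letters" under each definition.
ascii_hits  = words.select { |w| w =~ ascii_only }
letter_hits = words.select { |w| w =~ any_letter }

# Sub-properties: Lu is upper-case letter, Ll is lower-case letter.
sample = "\u00C9t\u00E9"      # "Été"
upper  = sample.scan(/\p{Lu}/)
lower  = sample.scan(/\p{Ll}/)

puts "ASCII class matched:  #{ascii_hits.inspect}"
puts "\\p{L} matched:        #{letter_hits.inspect}"
puts "Upper-case in sample: #{upper.inspect}"
puts "Lower-case in sample: #{lower.inspect}"
```

The ASCII class accepts only the first word, while \p{L} also accepts the words containing accented and Greek letters; neither accepts "abc123", because digits are not letters.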