Asp Forum - Unicode - comp.lang.ruby

Zephyr Pellerin

9/15/2007 2:00:00 AM

I hate to discuss something related to the development timeline, and I
know it's tenable, but when will it be reasonable to expect Unicode
support from Ruby?

22 Answers

James Gray

9/15/2007 2:29:00 AM

On Sep 14, 2007, at 9:05 PM, Zephyr Pellerin wrote:

> I hate to discuss something related to the development timeline, and
> I know it's tenable, but when will it be reasonable to expect Unicode
> support from Ruby?

Ruby has some UTF-8 support today. Support will increase with the
m17n (multilingualization) work, though.

See the last question and answer here:

http://blog.grayproductions.net/articles/the_ruby_vm_...

James Edward Gray II
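The m17n work James mentions shipped in Ruby 1.9, where every String carries an encoding tag and methods such as length and reverse work on characters rather than bytes. A minimal sketch, assuming Ruby 1.9 or later (it will not run on 1.8.6); the variable name `word` is purely illustrative:

```ruby
# Assumes Ruby 1.9+ (m17n strings); not valid on 1.8.x.
word = "h\u00e9llo"        # "héllo"; the \u escape yields a UTF-8 string

puts word.encoding         # UTF-8: the encoding tag attached to the string
puts word.length           # 5: counted in characters
puts word.bytesize         # 6: "é" takes two bytes in UTF-8
puts word.reverse          # olléh: characters are reversed, not bytes
```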

Phlip

9/15/2007 2:40:00 AM

Zephyr Pellerin wrote:

> I hate to discuss something related to the development timeline, and
> I know it's tenable, but when will it be reasonable to expect Unicode
> support from Ruby?

"Unicode" is not an encoding. Are you asking for UTF-8, UTF-16, or something
else?

--
Phlip
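Phlip's distinction can be made concrete: a single Unicode code point is stored as different bytes depending on the encoding chosen. A small sketch, assuming Ruby 1.9+ for String#encode; the `hex` lambda is a hypothetical convenience, not a library method:

```ruby
# Assumes Ruby 1.9+ (String#encode). One code point, three encodings.
e_acute = "\u00e9"                         # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Hypothetical helper: show a string's raw bytes as hex pairs.
hex = ->(str) { str.bytes.map { |b| format("%02X", b) }.join(" ") }

puts format("U+%04X", e_acute.ord)         # U+00E9: the abstract character
puts hex.(e_acute.encode("UTF-8"))         # C3 A9
puts hex.(e_acute.encode("UTF-16BE"))      # 00 E9
puts hex.(e_acute.encode("UTF-16LE"))      # E9 00
```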

Todd Burch

9/17/2007 12:44:00 PM

Zephyr Pellerin wrote:
> I hate to discuss something related to the development timeline, and
> I know it's tenable, but when will it be reasonable to expect Unicode
> support from Ruby?

I was just looking at the source code for 1.8.6 this weekend. The C
syntax being used is pre-ANSI C (which means that even in 1988 it was
"old" syntax).

Rotsa Ruck.

Todd
--
Posted via http://www.ruby-....

Phlip

9/17/2007 12:53:00 PM

>> I hate to discuss something related to the development timeline, and
>> I know it's tenable, but when will it be reasonable to expect Unicode
>> support from Ruby?
>
> I was just looking at the source code for 1.8.6 this weekend. The C
> syntax being used is pre-ANSI C (which means that even in 1988 it was
> "old" syntax).

Apples and oranges. Unicode libraries like iconv use C linkage, so they can
link with most C implementations regardless of how standards-compliant those
are. (C linkage is very weak and simplistic.) Every C compiler can handle
8-bit strings and can be programmed to use 16-bit strings, which is all that
UTF-8 and UTF-16 require.

Like the implementations of most languages, Ruby's source is written in a
primitive form of C to maximize the number of compilers, and hence the number
of platforms and hardware, that it runs on. I would suspect, unless Matz is an
even greater genius than average, that Ruby's C style was carefully
retrofitted after the language passed its first few version ticks.

> Rotsa Ruck.

Racial slur noted.

--
Phlip
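Phlip's point that UTF-8 needs only 8-bit strings and UTF-16 needs 16-bit ones is about code units. The sketch below, assuming Ruby 1.9+, unpacks one character into each kind of unit; it also shows that a character outside the Basic Multilingual Plane takes two 16-bit units (a surrogate pair) in UTF-16, so 16-bit storage does not mean one unit per character.

```ruby
# Assumes Ruby 1.9+. Compare UTF-8 (8-bit) and UTF-16 (16-bit) code units.
clef = "\u{1D11E}"   # U+1D11E MUSICAL SYMBOL G CLEF, beyond U+FFFF

utf8_units  = clef.encode("UTF-8").unpack("C*")     # unsigned 8-bit units
utf16_units = clef.encode("UTF-16BE").unpack("n*")  # big-endian 16-bit units

puts utf8_units.map  { |u| format("%02X", u) }.join(" ")  # F0 9D 84 9E
puts utf16_units.map { |u| format("%04X", u) }.join(" ")  # D834 DD1E (surrogate pair)
```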

Todd Burch

9/17/2007 1:50:00 PM

Phlip wrote:

> Racial slur noted.

You got a problem with Scooby Doo?

For the record, this was NOT intended to slur anything. It was not my
intent, nor is my nature, to slur. However, reading this in hindsight,
it certainly could be taken this way. Please accept my apologies.

Now, I'll rephrase.

Lotsa luck getting something like Unicode implemented when Ruby's
underlying C code uses such outdated syntax.

But, as Phlip implies, it's just a simple matter of programming.

Todd
--
Posted via http://www.ruby-....

Phlip

9/17/2007 2:48:00 PM

Todd Burch wrote:

> For the record, this was NOT intended to slur anything. It was not my
> intent, nor is my nature, to slur. However, reading this in hindsight,
> it certainly could be taken this way. Please accept my apologies.

Oh, my apologies too; Scooby Doo is quite over my head. All I could
imagine was Matz in a kimono serving sake.

--
Phlip

Michal Suchanek

9/21/2007 9:20:00 AM

On 15/09/2007, Zephyr Pellerin <ztz@nxvr.org> wrote:
> I hate to discuss something related to the development timeline, and
> I know it's tenable, but when will it be reasonable to expect Unicode
> support from Ruby?

Ruby has Unicode support. Sort of. Regexes work in UTF-8 when $KCODE
is set to "U" (and the default is "N" even in UTF-8 locales, and if
you specify the -K option in the .rb file it overrides the option
specified on the command line, heh).
The non-regex methods do not work, but you can convert the string with
str.scan(/./)[0] or str.unpack "U*", and use stuff like each, reverse,
[], ...
You have to remember to convert the string back, though.

Thanks

Michal
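Michal's unpack/pack round trip can be sketched as follows. It is written for a current Ruby, where $KCODE no longer has any effect, but String#unpack("U*") and Array#pack("U*") behave as he describes: the string becomes an Array of integer code points, Array methods then see whole characters, and pack turns the result back into a string.

```ruby
# Works on the UTF-8 bytes directly; no $KCODE setting is involved.
word = "h\303\251llo"                  # "héllo" spelled as raw UTF-8 bytes

code_points = word.unpack("U*")        # decode UTF-8 into integer code points
reversed    = code_points.reverse      # Array#reverse now moves whole characters
back        = reversed.pack("U*")      # re-encode to a UTF-8 string (the step to remember)

p code_points                          # [104, 233, 108, 108, 111]
puts back                              # olléh
```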

Jimmy Kofler

9/22/2007 11:04:00 AM

> Michal Suchanek wrote:
> On 15/09/2007, Zephyr Pellerin <ztz@nxvr.org> wrote:
>> I hate to discuss something related to the development timeline, and
>> I know it's tenable, but when will it be reasonable to expect Unicode
>> support from Ruby?
>
> Ruby has Unicode support. Sort of. Regexes work in UTF-8 when $KCODE
> is set to "U" (and the default is "N" even in UTF-8 locales, and if
> you specify the -K option in the .rb file it overrides the option
> specified on the command line, heh).
> The non-regex methods do not work, but you can convert the string with
> str.scan(/./)[0] or str.unpack "U*", and use stuff like each, reverse,
> [], ...
> You have to remember to convert the string back, though.
>
> Thanks
>
> Michal

... or you may use the /re/u regex option to handle UTF-8 encoded
strings (cf. http://snippets.dzone.com/posts... ).

Cheers,

j.k.
--
Posted via http://www.ruby-....
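Jimmy's /u flag marks a regexp as UTF-8, so "." matches one character instead of one byte. A small sketch, assuming a current Ruby, where the flag still exists and fixes the regexp's encoding to UTF-8:

```ruby
# Assumes a UTF-8 string; /./u matches one character at a time.
text  = "caf\u00e9"         # "café"

chars = text.scan(/./u)     # per-character split
puts chars.length           # 4 characters
puts text.bytesize          # 5 bytes, because "é" is two bytes in UTF-8
puts chars.last             # é
```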

Felipe Contreras

9/28/2007 9:49:00 PM

On 9/21/07, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 15/09/2007, Zephyr Pellerin <ztz@nxvr.org> wrote:
> > I hate to discuss something related to the development timeline, and
> > I know it's tenable, but when will it be reasonable to expect Unicode
> > support from Ruby?
>
> Ruby has Unicode support. Sort of. Regexes work in UTF-8 when $KCODE
> is set to "U" (and the default is "N" even in UTF-8 locales, and if
> you specify the -K option in the .rb file it overrides the option
> specified on the command line, heh).
> The non-regex methods do not work, but you can convert the string with
> str.scan(/./)[0] or str.unpack "U*", and use stuff like each, reverse,
> [], ...
> You have to remember to convert the string back, though.

What about UTF-16?

http://blogs.gnome.org/sudaltsov/2007/09/22/r...

--
Felipe Contreras
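On the UTF-16 question: with the m17n strings of Ruby 1.9 and later, String#encode converts between UTF-8 and UTF-16, taking over the kind of conversion iconv was used for in 1.8. A minimal sketch, assuming Ruby 1.9+; it converts to UTF-16LE and back and checks that no code point is lost:

```ruby
# Assumes Ruby 1.9+. Round-trip UTF-8 -> UTF-16LE -> UTF-8.
original = "na\u00efve \u{1D11E}"      # "naïve" plus a character beyond U+FFFF

utf16 = original.encode("UTF-16LE")    # 2 bytes per BMP character, 4 for the clef
back  = utf16.encode("UTF-8")

puts utf16.encoding                    # UTF-16LE
puts utf16.bytesize                    # 16
puts original.bytesize                 # 11
puts back == original                  # true: the round trip is lossless
puts utf16.codepoints == original.codepoints  # true: same characters either way
```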

John Joyce

9/29/2007 1:07:00 AM

On Sep 28, 2007, at 4:49 PM, Felipe Contreras wrote:

> On 9/21/07, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 15/09/2007, Zephyr Pellerin <ztz@nxvr.org> wrote:
>>> I hate to discuss something related to the development timeline, and
>>> I know it's tenable, but when will it be reasonable to expect Unicode
>>> support from Ruby?
>>
>> Ruby has Unicode support. Sort of. Regexes work in UTF-8 when $KCODE
>> is set to "U" (and the default is "N" even in UTF-8 locales, and if
>> you specify the -K option in the .rb file it overrides the option
>> specified on the command line, heh).
>> The non-regex methods do not work, but you can convert the string with
>> str.scan(/./)[0] or str.unpack "U*", and use stuff like each, reverse,
>> [], ...
>> You have to remember to convert the string back, though.
>
> What about UTF-16?
>
> http://blogs.gnome.org/sudaltsov/2007/09/22/r...
>
> --
> Felipe Contreras
>
Go to unicode.org.
There you can read a full (or a brief) explanation of why you don't
need to worry about UTF-16: UTF-8 is all you need.
Unicode is something everyone needs to read up on at some point.
I have to read up on it every now and then because my brain leaks.
