comp.lang.ruby

Re: Unicode roadmap?

Ivan Mashchenko

5/31/2007 8:30:00 PM

Hello, everyone. I am sorry, I was a bit embarrassed by the quantity of
text in this discussion and I may not have read it carefully enough to
figure out the answer, and the discussion itself seems to be a year
old, so I've decided to ask:

So, is there convenient support for Unicode in Ruby yet? Or, if not,
when will there be?

I am going to develop an international website (with pages in several
European languages, including those using non-Latin alphabets). I think
it would be a good idea to build such a website entirely in
Unicode (probably UTF-16), without using any legacy encodings at all.
The DBMS I am going to use is Oracle 10g (Express Edition until I run
into its limitations).

I would also like to ask when the next Ruby release is planned. If
it comes this year, I should probably try nightly builds, as it seems
wise to start a new project targeting an early-access version of the
next release.

Thanks in advance.

--
Posted via http://www.ruby-....

7 Answers

Austin Ziegler

5/31/2007 10:31:00 PM


On 5/31/07, Ivan Mashchenko <ivan.mashchenko@gmail.com> wrote:
> Hello, everyone. I am sorry, I was a bit embarrassed by the quantity of
> text in this discussion and I may not have read it carefully enough to
> figure out the answer, and the discussion itself seems to be a year
> old, so I've decided to ask:

> So, is there convenient support for Unicode in Ruby yet? Or, if not,
> when will there be?

There are a lot of answers to that question, and I strongly suggest
you search as this is a hotly debated discussion.

Google is more useful for searching this than ruby-forum.com. You will
find out when there will be a new release, and the current state of
Unicode.

-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
* austin@halostatue.ca * http://www.halo...feed/
* austin@zieglers.ca

_why

6/1/2007 6:15:00 AM


On Fri, Jun 01, 2007 at 05:29:31AM +0900, Ivan Mashchenko wrote:
> So, is there convenient support for Unicode in Ruby yet? Or, if not,
> when will there be?

Well, Ruby 1.9 (which is due in December) will have some Unicode
support. (So you'll have a `chars` method on strings, like with
Rails.) Matz is working on it right now even, as he posted that he
was tooling around with string.c earlier this week on his blog.

That is, nothing's been checked in yet. Because he wants it to be
good, you see?

_why

Erik Hollensbe

6/1/2007 6:24:00 AM


On 2007-05-31 15:30:50 -0700, "Austin Ziegler" <halostatue@gmail.com> said:

> On 5/31/07, Ivan Mashchenko <ivan.mashchenko@gmail.com> wrote:
>> Hello, everyone. I am sorry, I was a bit embarrassed by the quantity of
>> text in this discussion and I may not have read it carefully enough to
>> figure out the answer, and the discussion itself seems to be a year
>> old, so I've decided to ask:
>
>> So, is there convenient support for Unicode in Ruby yet? Or, if not,
>> when will there be?
>
> There are a lot of answers to that question, and I strongly suggest
> you search as this is a hotly debated discussion.
>
> Google is more useful for searching this than ruby-forum.com. You will
> find out when there will be a new release, and the current state of
> Unicode.

If it helps any, I've moved ~2000 web pages in an internal work project
that had mixed UTF-8/cp-1252 (in the content, not just between pages)
and Ruby handled it very gracefully. I was using 1.8.5-p12 and Hpricot
(but not Hpricot's encoding features, which last I checked are broken)
for the process.

While I'm certainly not an authority on the subject, I've thoroughly
battle-tested this and I have a high degree of confidence that it works.
It is certainly better than Perl and libxml2, which was our original
implementation.
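
For readers wondering what cleaning up mixed UTF-8/cp-1252 content can look like, here is a minimal sketch. It is not the code from the project described above, and it assumes a current Ruby (1.9 or later) rather than the 1.8.5 mentioned in the post. The helper names `to_utf8` and `normalize_lines` are made up for illustration, and the rule "anything that is not valid UTF-8 is cp-1252" is an assumption that will not hold for every data set.

```ruby
# Hypothetical helper (not from the post): returns a UTF-8 copy of `raw`,
# treating anything that is not valid UTF-8 as Windows-1252 (cp-1252).
def to_utf8(raw)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Bytes that cp-1252 leaves undefined (e.g. 0x81) are replaced with "?".
  raw.dup.force_encoding(Encoding::Windows_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Decide line by line, since a single page may mix both encodings.
def normalize_lines(raw)
  raw.b.each_line.map { |line| to_utf8(line) }.join
end

# First line is UTF-8 ("café"), second line is cp-1252 ("naïve").
mixed = "caf\xC3\xA9\n".b + "na\xEFve\n".b
puts normalize_lines(mixed)
```

Working line by line is a design choice for the "mixed within one page" case; a cp-1252 line that happens to also be valid UTF-8 would be misread, so real data needs spot checks.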

Richard Conroy

6/1/2007 9:51:00 AM


On 5/31/07, Ivan Mashchenko <ivan.mashchenko@gmail.com> wrote:
> So, is there convenient support for Unicode in Ruby yet? Or, if not,
> when will there be?

It depends on your definition of 'convenient'.

The short answer is that Unicode applications can be made in Ruby,
particularly web apps. It is not especially difficult, but it is not
'for free' or seamless. You generally have to use an encoding-aware
string type, or modify the existing string class to support multi-byte
characters.

A longer answer would contain references to the fact that there are
multiple options here, that web apps (Rails in particular) are ahead
of pure Ruby in terms of Unicode, and that there are actually a lot
of projects to investigate.

The hardest part of Ruby and Unicode is that not all of the libraries
support it, or that some of the meta-hackery to the string class
could break libraries that expect chars.length to equal bytes.length
(there are other examples). Some of the more popular libraries are
like this, or they inherit the encoding from your O/S settings and
cannot be driven from an API.
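
To make the characters-versus-bytes assumption concrete, here is a small sketch. It assumes a current Ruby (1.9 or later), where `String#length` counts characters; in the 1.8 series discussed in this thread, `String#length` counted bytes, which is exactly what such libraries relied on.

```ruby
word = "na\u00EFve"   # "naïve": five characters, one of them non-ASCII

puts word.length      # number of characters
puts word.bytesize    # number of bytes in the UTF-8 encoding
puts word.chars.length == word.bytes.length   # false for non-ASCII text

# Code that slices by byte offsets can cut a multi-byte character in half:
broken = word.byteslice(0, 3)
puts broken.valid_encoding?   # false: the slice ends inside "ï"
```

Any library that computes offsets, padding or truncation from byte counts will misbehave in the same way on text like this.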

> I am going to develop an international website (with pages in several
> European languages, including those using non-Latin alphabets). I think
> it would be a good idea to build such a website entirely in
> Unicode (probably UTF-16), without using any legacy encodings at all.

Well yes, but I would use UTF-8 instead. It's the Unicode encoding best
suited to the web (and UTF-16 is a bit weird in some ways - there are at
least 3 kinds of UTF-16 that I am aware of).
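
The three kinds are presumably UTF-16LE, UTF-16BE and plain UTF-16 with a byte order mark; that reading is an assumption, as the post does not name them. A minimal sketch of how the same character comes out in each, assuming a current Ruby (1.9 or later):

```ruby
text = "\u00E9"   # "é", code point U+00E9

utf8    = text.encode("UTF-8").bytes      # two bytes: C3 A9
utf16le = text.encode("UTF-16LE").bytes   # little-endian, no byte order mark
utf16be = text.encode("UTF-16BE").bytes   # big-endian, no byte order mark
utf16   = text.encode("UTF-16").bytes     # byte order mark, then the character

# Small formatter so the byte sequences are easy to compare by eye.
hex = ->(bytes) { bytes.map { |b| format("%02X", b) }.join(" ") }

puts "UTF-8:    #{hex.call(utf8)}"
puts "UTF-16LE: #{hex.call(utf16le)}"
puts "UTF-16BE: #{hex.call(utf16be)}"
puts "UTF-16:   #{hex.call(utf16)}"
```

A reader of UTF-16 data has to know or detect which of these it is, whereas UTF-8 has a single byte order.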

Rails 1.2, the latest release, introduced some pretty impressive support
for Unicode. All of the major i18n plugins should be compatible with
these changes by now.

> I would also like to ask when the next Ruby release is planned. If
> it comes this year, I should probably try nightly builds, as it seems
> wise to start a new project targeting an early-access version of the
> next release.

AFAIK there is no release schedule. YARV is basically Ruby 1.9, and it
is scheduled for release around the end of the year. However, there is no
firm commitment to make it the next Ruby version. Also, Ruby 1.9 is going
to break/deprecate stuff - I wouldn't develop against it; it will be a
rough experience.
Ruby 1.9 is kind of a staging release: migrating from 1.8 -> 1.9 is going
to be tricky, but 1.9 -> 2.0 should be a drop-in. That's the intention -
isolate the biggest changes to the 1.9 release.

If you are moving to Ruby 1.9, do it with a complete working application.
Or better still, develop against Rails versions, not Ruby versions. Let the
Rails team figure out the best Ruby migration strategy for you.

Ivan Mashchenko

6/1/2007 11:29:00 AM


Richard Conroy wrote:

> It depends on your definition of 'convenient'.

IMHO convenient is as in C#. There I don't have to bother about how
strings are stored in memory; they just work and are international.

> Well yes, but I would use UTF-8 instead.

Won't there be a problem if the data is stored in UTF-16? (As far as I
know, Oracle's NVARCHAR uses 16 bits per symbol.)

> Also Ruby 1.9 is going to break/deprecate stuff - I wouldn't develop against it
> migrating from 1.8 -> 1.9 is going to be tricky

So why should anyone develop a new project against 1.8 if it is going to
be deprecated?

> If you are moving to Ruby 1.9, do it with a complete working
> application.

But isn't it going to be tricky, as you've said?

I don't have to move anything for now, as I have not a single line of
Ruby code today (only an idea in my head), and no Ruby experience (I am
a C++, C#, Java and T-SQL developer). I've chosen Ruby as it seems
quite good and is free.

Have I understood you correctly - you think I should make it Ruby 1.8
and then do a tricky move when it comes?

> Or better still, develop against Rails versions, not Ruby versions.

This advice can prove useful. I'll think about it.


--
Posted via http://www.ruby-....

Richard Conroy

6/1/2007 2:24:00 PM


On 6/1/07, Ivan Mashchenko <ivan.mashchenko@gmail.com> wrote:
> Richard Conroy wrote:
>
> > It depends on your definition of 'convenient'.
>
> IMHO convenient is as in C#. There I don't have to bother about how
> strings are stored in memory; they just work and are international.

It's not *that* convenient. By default Ruby strings are 8-bit byte strings.
You can make them Unicode strings very easily (via $KCODE, IIRC), and they
will behave as Unicode in a way that you don't have to think about. You don't
have to use a different string type.

The problem occurs when you use code that you didn't write that expects
strings to be single-byte. So every time you evaluate a Ruby library, Rails
plugin or gem, you have to do more homework than you would in the
Unicode-centric Java or C#.

> > Well yes, but I would use UTF-8 instead.
>
> Won't there be a problem if the data is stored in UTF-16? (As far as I
> know, Oracle's NVARCHAR uses 16 bits per symbol.)

Every database worth using lets you specify the encoding of your string
and character types. Check your manuals or the Oracle forums. Anything
that is any way associated with web development supports UTF-8.
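
Even if a column does hold UTF-16, that need not be a problem for a UTF-8 front end, because both encodings cover the same code points and convert to each other without loss. A minimal sketch, assuming a current Ruby (1.9 or later); no Oracle or driver API is shown, and the variable `stored` is only a stand-in for whatever a UTF-16 column might contain:

```ruby
# "Grüße, Привет": Latin and Cyrillic text, written with escapes.
page_text = "Gr\u00FC\u00DFe, \u041F\u0440\u0438\u0432\u0435\u0442"

stored   = page_text.encode("UTF-16BE")   # stand-in for a UTF-16 column value
restored = stored.encode("UTF-8")         # what the web layer would send out

puts restored == page_text   # true: the round trip loses nothing
puts page_text.bytesize      # size in UTF-8
puts stored.bytesize         # size in UTF-16
```

In practice the database driver usually does this conversion, so the application can stay in UTF-8 throughout.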

>
> > Also Ruby 1.9 is going to break/deprecate stuff - I wouldn't develop against it
> > migrating from 1.8 -> 1.9 is going to be tricky
>
> So why should anyone develop a new project against 1.8 if it is going to
> be deprecated?

Okay, you misunderstood me. There is a feature roadmap towards Ruby 2.0,
where major changes are coming in; the two biggest that I recall are Unicode
support and native/pre-emptive threads. The only reasonable way to implement
them is by altering the behaviour of core classes and the standard library.

This will mean that Ruby code of any sophistication written for Ruby 1.8,
including many libraries, is likely to break.

Ruby 1.8 is not going away. Ruby is an open language with a public source
repository, unlike, say, .NET, where Microsoft distributes the runtime in
binary-only form and can make older versions difficult to get. You have no
obligation to migrate to the most recent version, and there is no technical
reason that multiple runtimes (application specific) cannot co-exist on the
same machine.

Chasing the latest release is really something that you only do with commercial
languages. It's not something that is generally done with open languages.

>
> > If you are moving to Ruby 1.9, do it with a complete working
> > application.
>
> But isn't it going to be tricky, as you've said?

It would be one hell of a lot easier than developing against a moving
target, not knowing if the issues in your code are your issues or
due to the latest release candidate.

Bleeding edge software development is for people who can spare a
lot of blood loss.

> I don't have to move anything for now, as I have not a single line of
> Ruby code today (only an idea in my head), and no Ruby experience (I am
> a C++, C#, Java and T-SQL developer). I've chosen Ruby as it seems
> quite good and is free.

Yeah, it's a great language. Make a point of checking out the JRuby project.
It's an exceptionally well developed Ruby runtime; it is considerably more
than an interpreter or language bridge - the JRuby guys have basically
doubled the size of the Java platform (or Ruby platform depending on POV).
Ruby is strong where Java is weak, and vice versa.

> Have I understood you correctly - you think I should make it Ruby 1.8
> and then do a tricky move when it comes?

Use Rails, where the most compelling features in Ruby 1.9/2.0 are already
present: Unicode, native concurrency (via processes) and good performance
(via all those <foo> caching mechanisms). When the Rails guys go to Ruby 1.9,
you can too.

> > Or better still, develop against Rails versions, not Ruby versions.
>
> This advice can prove useful. I'll think about it.

regards,
Richard.

John Joyce

6/1/2007 10:59:00 PM



On Jun 1, 2007, at 9:23 AM, Richard Conroy wrote:

> [...]
Objective-C (through the Cocoa framework) also handles Unicode
superbly. The problem is that it is not cross-platform; it is
strictly OS X stuff. You could use those libraries (NSString, etc.)
from Ruby through RubyCocoa, but of course that is far from
convenient or optimal for most purposes.

Ideally, if major OS vendors got behind Ruby full force and put their
Unicode know-how into the codebase, things would be smoother. They're
the ones who really have already figured out pretty good ways to
handle that stuff, and all the major scripting languages could
benefit from it.