comp.lang.ruby

unicode in ruby

Richard Gyger

3/8/2006 11:46:00 AM

i'm using IO.foreach to parse the lines in a file. now i'm trying to get
it to work with unicode encoded files. does ruby support unicode? how do
i compare a variable with a unicode constant string?

the script goes something like:

IO.foreach("myfile.txt") { |line|
if line.downcase[0,2] == "id"


34 Answers

Michal Suchanek

3/8/2006 12:45:00 PM

On 3/8/06, Richard Gyger <richard@bytethink.com> wrote:
> i'm using IO.foreach to parse the lines in a file. now i'm trying to get
> it to work with unicode encoded files. does ruby support unicode? how do
> i compare a variable with a unicode constant string?
>
> the script goes something like:
>
> IO.foreach("myfile.txt") { |line|
> if line.downcase[0,2] == "id"

To get unicode downcase you probably want icu4r. To handle the cases
you are interested in you could write your own. However, the []
operator of ruby strings returns bytes, not characters.

hth

Michal

--
Support the freedom of music!
Maybe it's a weird genre .. but weird is *not* illegal.
Maybe next time they will send a special forces commando
to your picnic .. because they think you are weird.
www.music-versus-guns.org http://en.police...
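To make the bytes-versus-characters point concrete, here is a minimal sketch. It assumes Ruby 2.4 or later, which postdates this thread: there, [] indexes characters and downcase handles non-ASCII letters, while byteslice reproduces the byte-based slicing described above. The sample word is made up for illustration.

```ruby
# Sketch assuming Ruby 2.4 or later; `word` is a made-up sample value.
word = "\u00c9T\u00c9"               # "ÉTÉ", written with escapes so the source stays ASCII

chars_prefix = word[0, 2]            # first two characters
bytes_prefix = word.byteslice(0, 2)  # first two bytes, the unit the answer above describes
lowered      = word.downcase         # Unicode-aware case mapping

puts chars_prefix
puts bytes_prefix
puts lowered
```

The two prefixes differ because the accented letter takes two bytes in UTF-8, so a byte-based slice of length 2 covers only one character.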

pere.noel

3/8/2006 12:50:00 PM

Michal Suchanek <hramrach@centrum.cz> wrote:

>
> On 3/8/06, Richard Gyger <richard@bytethink.com> wrote:

> i'm using IO.foreach [.. no \n ]


you don't make use of "\n" at uni-berlin.de when wrapping ?

could be more readable ;-)
--
une bévue

Richard Gyger

3/8/2006 6:13:00 PM

so, you guys are telling me a language developed since the year 2000
doesn't support unicode strings natively? in my opinion, that's a pretty
glaring problem.

Une bévue wrote:

> Michal Suchanek <hramrach@centrum.cz> wrote:
>
>> On 3/8/06, Richard Gyger <richard@bytethink.com> wrote:
>
>> i'm using IO.foreach [.. no \n ]
>
> you don't make use of "\n" at uni-berlin.de when wrapping ?
>
> could be more readable ;-)


Logan Capaldo

3/8/2006 6:21:00 PM


On Mar 8, 2006, at 1:13 PM, Richard Gyger wrote:

> so, you guys are telling me a language developed since the year
> 2000 doesn't support unicode strings natively? in my opinion,
> that's a pretty glaring problem.
>

Ruby doesn't really support any strings natively. It just happens to
have a bytevector class that acts a lot like a string ;) Having said
that, have you tried:
$KCODE="u" # Assumes the source file is encoded as UTF8; affects
           # literal strings, regexps, etc.

If your source file is UTF16 or some other non-UTF8 encoding you'll
have to use iconv to get into UTF8 to compare with the literals in
your source.
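The conversion step can be sketched as follows. This is an illustration under assumptions, not the iconv call itself: it uses String#encode, available in Ruby 1.9 and later (iconv left the standard library in Ruby 2.0), and the sample bytes stand in for one line read from a hypothetical UTF-16LE file.

```ruby
# Sketch assuming Ruby 1.9 or later. `raw` stands in for one line of a
# hypothetical UTF-16LE file, held as untagged binary bytes.
raw = "ID: 42\n".encode("UTF-16LE").b

# Tell Ruby what the bytes really are, then transcode to UTF-8 so the
# line can be compared with UTF-8 literals in the source.
line = raw.force_encoding("UTF-16LE").encode("UTF-8")

is_id_line = line.downcase[0, 2] == "id"
puts is_id_line
```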



Michal Suchanek

3/8/2006 6:24:00 PM

On 3/8/06, Richard Gyger <richard@bytethink.com> wrote:
> so, you guys are telling me a language developed since the year 2000
> doesn't support unicode strings natively? in my opinion, that's a pretty
> glaring problem.

For me it is a problem as well. But getting unicode right is hard.
Look at the size of the icu library and the size of ruby itself.

Anyway, unicode regexps are planned for ruby 2.0 iirc.

Thanks

Michal

Michal Suchanek

3/8/2006 6:31:00 PM

On 3/8/06, Logan Capaldo <logancapaldo@gmail.com> wrote:
>
> On Mar 8, 2006, at 1:13 PM, Richard Gyger wrote:
>
> > so, you guys are telling me a language developed since the year
> > 2000 doesn't support unicode strings natively? in my opinion,
> > that's a pretty glaring problem.
> >
>
> Ruby doesn't really support any strings natively. It just happens to
> have a bytevector class that acts a lot like a string ;) Having said
> that, have you tried:
> $KCODE="u" # Assumes the source file is encoded as UTF8, effects
> literal strings, regexps, etc.
>
> If your source file is UTF16 or some other non-UTF8 encoding you'll
> have to use iconv to get into UTF8 to compare with the literals in
> your source.

err, no that is not what people want when they speak about downcase in unicode.

Sure, you can write a string encoded in utf-8 in your source, and
verify it is byte-identical to another string. That is about all you
get this way.

I suspect regexps won't work right with multibyte characters, for
downcase or case-insensitive regexps you would even need to know the
language.

Thanks

Michal
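Both points can be illustrated with a short sketch: byte-identical comparison misses strings that look the same, and case mapping can depend on the language. It assumes Ruby 2.4 or later, where unicode_normalize and the :turkic option of downcase exist; neither was available when this thread was written.

```ruby
# Sketch assuming Ruby 2.4 or later.
composed   = "\u00e9"    # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Byte-for-byte comparison treats the two spellings as different strings.
same_bytes = composed == decomposed

# Normalizing both sides to NFC first makes them compare equal.
same_normalized =
  composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)

# Lowercasing "I" depends on the language: Turkic languages map it to a
# dotless i (U+0131) rather than "i".
default_lower = "I".downcase
turkic_lower  = "I".downcase(:turkic)

puts same_bytes
puts same_normalized
puts default_lower
puts turkic_lower
```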

Daniel Harple

3/8/2006 6:54:00 PM


On Mar 8, 2006, at 7:24 PM, Michal Suchanek wrote:

> Anyway, unicode regexps are planned for ruby 2.0 iirc.

Unicode strings are also planned for Ruby 2 (possibly implemented
already?).

-- Daniel


Eric Jacoboni

3/8/2006 7:00:00 PM

Logan Capaldo <logancapaldo@gmail.com> writes:

> Ruby doesn't really support any strings natively. It just happens to
> have a bytevector class that acts a lot like a string ;)

... that acts a lot like a string /of ASCII chars/, actually. Rather
anachronistic, imho.

I can't consider that "il était une fois".length == 18 is the way it
should be with a string in a modern language.

Of course, tweaking with -K and jcode and/or other third-party
modules and/or various hacks allows some improvements (we have a
jlength method that seems to work), but that's nothing to write home
about either (case methods support only ASCII chars, etc.)

Waiting for a plain support in Rite (much more important to me than
the "end" issues...).

--
Eric Jacoboni, born 1445284322 seconds ago
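The length complaint can be checked with a small sketch. It assumes Ruby 2.4 or later, where length counts characters, bytesize counts bytes, and the case methods handle non-ASCII letters; under the Ruby of this thread, length reported the byte count.

```ruby
# Sketch assuming Ruby 2.4 or later.
phrase = "il \u00e9tait une fois"   # "il était une fois", escaped to keep the source ASCII

byte_count = phrase.bytesize   # the accented letter takes two bytes in UTF-8
char_count = phrase.length     # counts characters, not bytes
upper      = phrase.upcase     # case mapping covers the accented letter too

puts byte_count
puts char_count
puts upper
```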

Brad Tilley

3/8/2006 7:18:00 PM

Eric Jacoboni wrote:

> Waiting for a plain support in Rite (much more important to me than
> the "end" issues...)

Speaking of Rite... is there a timeline on its release yet? One year?
Two years? More?

Daniel Harple

3/8/2006 7:26:00 PM

On Mar 8, 2006, at 8:18 PM, rtilley wrote:

> Speaking of Rite... is there a timeline on its release yet? One
> year? Two years? More?

http://www.atdot...
http://redhanded.hobix.com/cult/yarvMerge...

-- Daniel