comp.lang.ruby

[ANN] ICU4R 0.1.0 - initial release

Lugovoi Nikolai

1/19/2006 8:14:00 AM

== ICU4R v.0.1.0 - initial release ==

= Abstract

ICU4R is an attempt to provide better Unicode support for Ruby, based
on the ICU library.

Project Site: http://rubyforge.org/proje...

Download: http://rubyforge.org/frs/download.php/8116/icu4r-0....

RDoc: http://icu4r.ruby...

= Install Notes

To build ICU4R you'll need GCC and the ICU v3.4 libraries, which can be
downloaded from
http://ibm.com/software/globalization/icu/dow...

Build and install:
ruby extconf.rb && make && make check && make install

= Features

ICU4R is a Ruby C-extension binding for the ICU library.
It does NOT mirror the full ICU object hierarchy, but is rather a set of simple
interfaces for some practically useful functionality, and provides:

- UString : String-like class with internal UTF16 storage;
- UCA rules for UString comparisons (<=>, casecmp);
- Unicode regular expressions;
- encoding (codepage) conversion;
- Unicode normalization;
- access to resource bundles, including ICU locale data;
- transliteration, also rule-based;

A bunch of locale-sensitive functions:
- upcase/downcase;
- string collation;
- string search;
- iterators over text line/word/char/sentence breaks;
- message formatting (number/currency/string/time);
- date and number parsing.
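
As a rough illustration of what "Unicode normalization" in the list above means, here is a minimal sketch. It does not use ICU4R at all: it assumes a much later Ruby (2.2 or newer) whose built-in String#unicode_normalize is available, and the variable names are only for illustration.

```ruby
# Two ways to spell the same visible character "e with acute accent":
composed   = "\u00e9"    # one code point, U+00E9
decomposed = "e\u0301"   # "e" followed by combining acute accent U+0301

# Compared code point by code point, the two strings differ.
same_before = (composed == decomposed)

# After normalizing both to the same form, they compare equal.
nfc = decomposed.unicode_normalize(:nfc)   # compose
nfd = composed.unicode_normalize(:nfd)     # decompose

puts "equal before normalization: #{same_before}"
puts "equal after NFC:            #{nfc == composed}"
puts "equal after NFD:            #{nfd == decomposed}"
```

Comparison, search and collation generally want their input in one normalization form first, which is why the feature sits next to them in the list.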

== DISCLAIMER ==

The code is still slow and inefficient, and may have many security holes and
memory leaks, bugs, inconsistent documentation and an incomplete test suite.
Use it at your own risk.

Criticism, bug reports and feature requests are welcome :)

WBR, Nikolai Lugovoi <meadow.nnick@gmail.com>


6 Answers

Alex Fenton

1/19/2006 4:34:00 PM

Lugovoi Nikolai wrote:
> ==ICU4R v.0.1.0 - initial release ==
>
> ICU4R is an attempt to provide better Unicode support for Ruby, based
> on ICU library.

Thanks, this is really interesting - not heard of the ICU library before.

There have been a few threads on Ruby + Unicode recently. Though the answer 'it's not broken' is true in that Ruby won't mess with your low-level UTF-8/16 bytes, the absence of support for the semantics of glyphs is a big hindrance when writing multilingual text handling apps. What's missing is things like character classes such as [:alpha:] and methods such as String#upcase that actually work on non-ASCII text. Looks like ICU4R could address this.
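
To make the gap concrete, here is a small sketch of the difference between ASCII-only and Unicode-aware behaviour. It is not ICU4R code: it assumes a much later Ruby (2.4 or newer), where `upcase(:ascii)` can stand in for the byte-oriented behaviour described above and the plain `upcase` does full Unicode case mapping. The sample word is just an illustration.

```ruby
word = "r\u00e9sum\u00e9"   # "résumé", with two non-ASCII letters

# ASCII-only case mapping: only a-z are touched, the accented letters stay lowercase.
ascii_only = word.upcase(:ascii)

# Unicode-aware case mapping: every letter is uppercased.
full = word.upcase

# An ASCII range does not treat "é" as a letter; the POSIX bracket class does.
ascii_class   = /\A[a-zA-Z]+\z/.match?(word)
unicode_class = /\A[[:alpha:]]+\z/.match?(word)

puts ascii_only
puts full
puts "matches [a-zA-Z]+:    #{ascii_class}"
puts "matches [[:alpha:]]+: #{unicode_class}"
```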

But... I couldn't try it, as the build failed on OS X 10.3. I installed ICU to /usr/local without a hitch, and ran extconf.rb without problems. But make died with:

SCIPIUS:~/installers/ruby/icu4r alex$ make
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
ustring.c: In function `icu_ustr_new_set':
ustring.c:169: warning: assignment discards qualifiers from pointer target type
ustring.c: In function `icu_reg_get_replacement':
ustring.c:1854: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c:1864: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c: In function `icu_ustr_substr':
ustring.c:2296: warning: unused variable `n'
g++ -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c fmt.cpp
cc -dynamic -bundle -undefined suppress -flat_namespace -licuuc -licui18n -licudata -L"/usr/local/lib" -o ustring.bundle ustring.o fmt.o -ldl -lobjc
ld: multiple definitions of symbol _rb_cUString
ustring.o definition of _rb_cUString in section (__DATA,__common)
fmt.o definition of _rb_cUString in section (__DATA,__common)
make: *** [ustring.bundle] Error 1

SCIPIUS:~/installers/ruby/icu4r alex$ gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)

HTH
alex

Gyoung-Yoon Noh

1/20/2006 12:45:00 AM


Great work. I'll check it out next week.

--
http://nohmad.su...


Lugovoi Nikolai

1/21/2006 9:54:00 AM

Alex, thank you for pointing out this bug.
I had no compile problems with GCC 3.4.2, GCC 4.0 and MSVC++ 7.1, so I
didn't catch it; it looks like GCC 3.3 has different default linking
options.

Could you try the 0.1.1 release?
http://rubyforge.org/frs/download.php/8168/icu4r-0....

(Sorry for the late response.)


Alex Fenton

1/23/2006 7:19:00 PM

Lugovoi Nikolai wrote:

> Could you try 0.1.1 release ?
> http://rubyforge.org/frs/download.php/8168/icu4r-0....

Thanks for this. It compiles fine on OS X 10.3 (see below), but segfaults when I run the Ruby test, with:

dyld: ruby Undefined symbols:
___gxx_personality_v0
Trace/BPT trap

Let's take it off-list unless this rings any bells for anyone

alex

SCIPIUS:~/icu4r alex$ make clean; make; ruby test/test_ustring.rb
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
g++ -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c fmt.cpp
cc -dynamic -bundle -undefined suppress -flat_namespace -licuuc -licui18n -licudata -L"/usr/local/lib" -o ustring.bundle ustring.o fmt.o -ldl -lobjc



Michal Suchanek

1/24/2006 5:09:00 PM

On 1/24/06, Alex Fenton <alex@deleteme.pressure.to> wrote:
> Lugovoi Nikolai wrote:
>
> > Could you try 0.1.1 release ?
> > http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar....
>
> thanks for this, it compiles fine on OS X 10.3 (see below), but segfaults when I run the ruby test with
>
> dyld: ruby Undefined symbols:
> ___gxx_personality_v0
> Trace/BPT trap
>

This usually happens when C++ code compiled by different versions of gcc is linked together.
Check that all the stuff and the libraries it links with are compiled
with the same gcc.

Thanks

Michal

Michal Suchanek

1/24/2006 6:07:00 PM

On 1/19/06, Lugovoi Nikolai <meadow.nnick@gmail.com> wrote:
> ==ICU4R v.0.1.0 - initial release ==
>
> = Abstract
>
> ICU4R is an attempt to provide better Unicode support for Ruby, based
> on ICU library.

What are we missing in Ruby now?

> = Features
>
> ICU4R is Ruby C-extension binding for ICU library.
> It is NOT mirroring full ICU object hierarchy, but is rather set of simple
> interfaces for some practically useful functionality, and provides:
>
> - UString : String-like class with internal UTF16 storage;

What is so cool about UTF16? You can still get multiword characters,
but the encoding is no longer byte-order independent.

> - UCA rules for UString comparisons (<=>, casecmp);
> - Unicode regular expressions;

I guess we do not have this in Ruby yet, but I never tried :)

> - encoding(codepage) conversion;

I thought this was there somewhere - some iconv thingy or something.

> - Unicode normalization;
> - access to resource bundles, including ICU locale data;
> - transliteration, also rule-based;

Wow, does this mean I could read Russian in Latin characters? That way
I could probably understand about half of it :)

>
> Bunch of locale-sensitive functions:
> - upcase/downcase;
> - string collation;
> - string search;
> - iterators over text line/word/char/sentence breaks;
> - message formatting (number/currency/string/time);
> - date and number parsing.

I suspect there are people who use this - I only recently switched to the
en_US.UTF-8 locale from C so that I can read some funny characters and
still enjoy interfaces not clobbered by translation :)

It looks like some features can be useful. I should try it when I get
to something that needs some of those funny characters.

Thanks

Michal

--
Support the freedom of music!
Maybe it's a weird genre .. but weird is *not* illegal.
Maybe next time they will send a special forces commando
to your picnic .. because they think you are weird.
www.music-versus-guns.org http://en.police...
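
The two UTF-16 points raised in the reply above (characters that need more than one 16-bit word, and dependence on byte order) can be checked with a short sketch. It says nothing about how UString stores text internally; it only uses the String#encode method of a later Ruby (1.9 or newer), and the sample characters are chosen purely for illustration.

```ruby
bmp    = "A"           # U+0041, inside the Basic Multilingual Plane
astral = "\u{1D11E}"   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# Byte order matters: the same character gives different byte sequences.
be_bytes = bmp.encode("UTF-16BE").bytes
le_bytes = bmp.encode("UTF-16LE").bytes

# A character outside the BMP takes two 16-bit code units (a surrogate pair).
astral_be  = astral.encode("UTF-16BE")
code_units = astral_be.unpack("n*")   # read as big-endian 16-bit integers

puts "UTF-16BE bytes for 'A': #{be_bytes.inspect}"
puts "UTF-16LE bytes for 'A': #{le_bytes.inspect}"
puts "code units for U+1D11E: #{code_units.map { |u| format('%04X', u) }.join(' ')}"
```

So a UTF-16 string is still variable-width, and code that indexes by 16-bit unit has to account for surrogate pairs.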