comp.lang.ruby

Re: Iconv and incompatible encodings

Daniel DeLorme

7/10/2006 1:29:00 PM

Paul Battley wrote:
> Yes, there is. Add //IGNORE to the destination encoding to ignore
> unavailable characters, or //TRANSLIT to transliterate them into
> combinations of ASCII characters (e.g. `e for è).
>
> E.g.:
>
> #!/usr/bin/env ruby
> $KCODE = 'u'
> require 'iconv'
>
> s = 'caffè'
>
> ic_ignore = Iconv.new('US-ASCII//IGNORE', 'UTF-8')
> puts ic_ignore.iconv(s) # => caff
>
> ic_translit = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
> puts ic_translit.iconv(s) # => caff`e
>
> //TRANSLIT will raise an exception on characters it can't
> transliterate, however; this can be solved by using
> '//IGNORE//TRANSLIT' together (in that order).


Can anyone else get this to work? Instead of "caff`e" I just get "caff?"

Daniel
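
For readers on a current Ruby: Iconv is no longer bundled with the standard library (it was removed in Ruby 2.0), so the quoted script cannot be run as-is there. The following is a minimal sketch, assuming String#encode as a stand-in, of roughly what the two suffixes do. The helper names and the tiny TRANSLIT table are hypothetical; a real //TRANSLIT uses the system's own tables, which is exactly where the results in this thread diverge.

```ruby
# Sketch only: String#encode is used as a stand-in for Iconv. The helper
# names and the TRANSLIT table are hypothetical, not part of any library.

# Rough analogue of 'US-ASCII//IGNORE': silently drop characters that
# US-ASCII cannot represent.
def ascii_ignore(str)
  str.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
end

# Hand-written substitutions standing in for the system's //TRANSLIT table.
TRANSLIT = { "\u00E8" => "`e", "\u00E9" => "'e" }.freeze

# Rough analogue of '//TRANSLIT': substitute where the table has an entry,
# fall back to "?" otherwise.
def ascii_translit(str)
  str.encode("US-ASCII", fallback: ->(ch) { TRANSLIT.fetch(ch, "?") })
end

s = "caff\u00E8"          # "caffè"
puts ascii_ignore(s)      # => caff
puts ascii_translit(s)    # => caff`e
```

The "?" fallback mirrors the "caff?" symptom reported above: it is what you see when no transliteration is found for a character.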

12 Answers

Paul Battley

7/10/2006 1:54:00 PM

On 10/07/06, Daniel DeLorme <dan-ml@dan42.com> wrote:
> Can anyone else get this to work? Instead of "caff`e" I just get "caff?"

What's your platform?

Paul.

Daniel DeLorme

7/10/2006 2:43:00 PM

Paul Battley wrote:
> On 10/07/06, Daniel DeLorme <dan-ml@dan42.com> wrote:
>> Can anyone else get this to work? Instead of "caff`e" I just get "caff?"
>
> What's your platform?

ubuntu breezy with ruby 1.8.4
iconv 2.3.5

Davi Barbosa

10/13/2008 5:48:00 PM

I also have a problem with iconv. I'm on Linux (configured with UTF-8,
as usual), and in irb I get:
irb(main):016:0> Iconv.conv("US-ASCII//TRANSLIT","UTF-8",'éèêë')
=> "eeee"

But when I try the same in ruby or mod_ruby I get '????', for example:
$ ruby -e "require 'iconv'; puts Iconv.conv('US-ASCII//TRANSLIT','UTF-8','éèêë')"
????
I already checked with str.each_byte {|x| puts x} and the strings are
exactly the same. Does anyone have any idea why I get two different
answers from Iconv?

My system:
$ irb --version
irb 0.9.5(05/04/13)
$ ruby --version
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]

I have ENV['LANG']==en_US.UTF-8 in both cases.
--
Posted via http://www.ruby-....

James Gray

10/13/2008 5:55:00 PM

On Oct 13, 2008, at 12:48 PM, Davi Barbosa wrote:

> I have also a problem with iconv. I'm under linux (configured with
> utf-8
> as usual) and under irb I get:
> irb(main):016:0> Iconv.conv("US-ASCII//TRANSLIT","UTF-8",'éèêë')
> => "eeee"
>
> But when I try the same in ruby or mod_ruby I get '????', for example:
> $ ruby -e "require 'iconv'; puts
> Iconv.conv('US-ASCII//TRANSLIT','UTF-8','éèêë')"
> ????
> I already checked with str.each_byte {|x| puts x} and the strings are
> exactly the same. Does anyone have any idea why I get two different
> answers from Iconv?
>
> My system:
> $ irb --version
> irb 0.9.5(05/04/13)
> $ ruby --version
> ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
>
> I have ENV['LANG']==en_US.UTF-8 in both cases.

Try adding the -KU switch to Ruby, to put it in UTF-8 mode.

James Edward Gray II

Davi Barbosa

10/13/2008 6:45:00 PM

James Gray wrote:
> Try adding the -KU switch to Ruby, to put it in UTF-8 mode.

Thank you for your really fast answer. I tried this:
$ ruby -KU -e "require 'iconv'; puts Iconv.conv('US-ASCII//TRANSLIT','UTF-8','éèêë')"
????

$ ruby -e "\$KCODE='u'; require 'iconv'; puts Iconv.conv('US-ASCII//TRANSLIT','UTF-8','éèêë')"
????

and, for information, I also have:
$ echo 'éèêë' | iconv -t ASCII//TRANSLIT -f UTF-8
eeee

$ iconv --version
iconv (GNU libc) 2.7

irb(main):002:0> 'é'.each_byte {|x| puts x}
195
169
=> "\303\251"

$ ruby -e "'é'.each_byte {|x| puts x}"
195
169

and finally, the weirdest part: irb doesn't work if I pipe the input in:
$ echo "require 'iconv'; puts Iconv.conv('US-ASCII//TRANSLIT','UTF-8','é'); 'é'.each_byte{|x| puts x}" | irb
require 'iconv'; puts Iconv.conv('US-ASCII//TRANSLIT','UTF-8','é'); 'é'.each_byte{|x| puts x}
?
195
169
"\303\251"

Davi Barbosa

11/3/2008 7:04:00 PM

Hello,
I was going crazy with this problem. I searched a lot and found other
people with the same problem: Iconv works in irb but not in a ruby
script.
The solution was to take another route. For example, Daniel Lucraft
(http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...)
made 3 suggestions. The first one is to use the Ruby-GNOME2 library:

require 'gtk2'
ascii = GLib.convert(string, "ASCII//translit", "UTF-8")

Not only did this work for me, but Iconv itself also started to work as
expected! For instance:
require 'iconv'
require 'gtk2'
puts Iconv.conv("ASCII//translit","UTF-8","áàâä")
gives 'aaaa'.

The second solution:
ascii = %x{echo "#{str}" | iconv -f "ISO-8859-1" -t "US-ASCII//TRANSLIT"}
also worked here.

The problem is that I'm not writing a ruby script; I'm building a web page
with mod_ruby. So %x{} gives an 'Insecure operation' error and "require
'gtk2'" gives:
/var/www/dev/q/test.rbx:12: Cannot open display:
/usr/lib/ruby/1.8/gtk2.rb:12
/lib.rb:31:in `require'

His last suggestion is to write your own wrapper, which I have not
tried. In the end, I used this hack:
Unicode.normalize_KD(string).gsub(/[^\x00-\x7F]/n,'')
as described here: http://www.ruby-...t..., and it seems
to work fine for removing accents (but I'm not sure whether the result is an
ASCII string)
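
The normalize-then-strip hack above can be tried without any extra library on Ruby 2.2 and later. This is a minimal sketch, assuming String#unicode_normalize as a stand-in for Unicode.normalize_KD from the unicode gem; strip_accents is a hypothetical helper name. It also shows one way to check the open question of whether the result is really ASCII.

```ruby
# Sketch only: String#unicode_normalize (standard library, Ruby 2.2+) stands
# in for Unicode.normalize_KD. strip_accents is a hypothetical helper name.
def strip_accents(str)
  # NFKD splits an accented letter such as U+00E8 into a base letter plus a
  # combining mark; the gsub then removes every non-ASCII character.
  str.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, "")
end

result = strip_accents("\u00E1\u00E0\u00E2\u00E4")  # "áàâä"
puts result              # => aaaa
puts result.ascii_only?  # => true
```

The result contains only ASCII bytes, although the string keeps its UTF-8 encoding label. Note that characters with no decomposition (for example "ß" or "ø") are simply dropped by this approach rather than transliterated.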

Nobuyoshi Nakada

11/4/2008 9:48:00 AM

Hi,

At Tue, 4 Nov 2008 04:04:23 +0900,
Davi Barbosa wrote in [ruby-talk:319309]:
> require 'gtk2'
> ascii = GLib.convert(string, "ASCII//translit", "UTF-8")
>
> This not only worked for me, as the Iconv started to work as expected!
> For instance:
> require 'iconv'
> require 'gtk2'
> puts Iconv.conv("ASCII//translit","UTF-8","áàâä")
> gives 'aaaa'.

GNU libiconv seems to need the locale set.
The issue would be fixed by the following patch.

Index: configure.in
===================================================================
--- configure.in (revision 20103)
+++ configure.in (working copy)
@@ -559,5 +559,5 @@ AC_CHECK_HEADERS(stdlib.h string.h unist
  syscall.h pwd.h grp.h a.out.h utime.h memory.h direct.h sys/resource.h sys/mkdev.h sys/utime.h netinet/in_systm.h float.h ieeefp.h pthread.h
- ucontext.h intrinsics.h)
+ ucontext.h intrinsics.h locale.h)
 
 dnl Check additional types.
Index: main.c
===================================================================
--- main.c (revision 20103)
+++ main.c (working copy)
@@ -12,4 +12,7 @@
 
 #include "ruby.h"
+#ifdef HAVE_LOCALE_H
+#include <locale.h>
+#endif
 
 #ifdef __human68k__
@@ -35,4 +38,7 @@ main(argc, argv)
     char **argv;
 {
+#ifdef HAVE_LOCALE_H
+    setlocale(LC_CTYPE, "");
+#endif
 #ifdef _WIN32
     NtInitialize(&argc, &argv);

--
Nobu Nakada

Adam Strzelecki

6/3/2009 3:28:00 PM

This patch is not going to be accepted, which is fair:
http://redmine.ruby-lang.org/issues...

An alternative, lightweight solution is to write your own extension, as below:

- locale.c -------------------------------------------

#include <locale.h>
#include <ruby.h>

#ifndef RSTRING_PTR
#define RSTRING_PTR(str) RSTRING(str)->ptr
#endif

VALUE Locale = Qnil;

VALUE method_setlocale(VALUE self, VALUE category, VALUE locale);

void Init_locale() {
    Locale = rb_define_module("Locale");
    rb_define_module_function(Locale, "setlocale", method_setlocale, 2);
    rb_define_const(Locale, "LC_CTYPE", INT2NUM(0));
    rb_define_const(Locale, "LC_NUMERIC", INT2NUM(1));
    rb_define_const(Locale, "LC_TIME", INT2NUM(2));
    rb_define_const(Locale, "LC_COLLATE", INT2NUM(3));
    rb_define_const(Locale, "LC_MONETARY", INT2NUM(4));
    rb_define_const(Locale, "LC_MESSAGES", INT2NUM(5));
    rb_define_const(Locale, "LC_ALL", INT2NUM(6));
}

VALUE method_setlocale(VALUE self, VALUE category, VALUE locale) {
    int c = NUM2INT(category);
    char *r;
    if (locale == Qnil) {
        r = setlocale(c, NULL);
    } else {
        Check_Type(locale, T_STRING);
        r = setlocale(c, RSTRING_PTR(locale));
    }
    return r == NULL ? Qnil : rb_str_new2(r);
}

- extconf.rb -------------------------------------------

require 'mkmf'
extension_name = 'locale'
dir_config(extension_name)
create_makefile(extension_name)


... and use:
require 'locale'
Locale::setlocale Locale::LC_CTYPE, ''

This is what I do in one of my projects, and it works fine with Iconv.
Note that I use LC_CTYPE rather than LC_ALL so that number and date
formatting are not affected.

Regards,
--
Adam Strzelecki | nanoant.com

Nobuyoshi Nakada

6/6/2009 2:23:00 PM

Hi,

At Thu, 4 Jun 2009 00:28:25 +0900,
Adam Strzelecki wrote in [ruby-talk:338275]:
> Alternative and light solution is to create own extension as below:

First of all, the option after // is a GNU iconv-specific extension.

Second, extconf.rb must check for the necessary header, and
whether each category is defined.

# extconf.rb
require 'mkmf'
extension_name = 'locale'
header = "locale.h"
dir_config(extension_name)
if have_header(header)
  lc = %w[CTYPE NUMERIC TIME COLLATE MONETARY MESSAGES ALL]
  lc = lc.delete_if {|n| !have_macro("LC_#{n}", header)}.
    collect {|n| "def(#{n.downcase}, LC_#{n})"}.
    join(' ')
  $defs << "-Dforeach_categories(def)=\"#{lc}\""
  create_header
  create_makefile(extension_name)
end
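
What the extconf.rb above hands to the C compiler may be easier to see in isolation. The following is a minimal sketch, assuming a hard-coded list in place of mkmf's have_macro probe; category_defs and AVAILABLE are hypothetical names used only for this illustration. It builds the same kind of foreach_categories definition, which the C source can then expand once per locale category that the platform actually defines.

```ruby
# Sketch only: AVAILABLE replaces the have_macro("LC_...", "locale.h") probe
# that mkmf would run against the real headers. Both names are hypothetical.
AVAILABLE = %w[CTYPE NUMERIC ALL].freeze

# Keep the categories that "exist" and turn each into one def(...) call.
def category_defs(names, available)
  names.select { |n| available.include?(n) }
       .map { |n| "def(#{n.downcase}, LC_#{n})" }
       .join(" ")
end

all = %w[CTYPE NUMERIC TIME COLLATE MONETARY MESSAGES ALL]
puts "-Dforeach_categories(def)=\"#{category_defs(all, AVAILABLE)}\""
# => -Dforeach_categories(def)="def(ctype, LC_CTYPE) def(numeric, LC_NUMERIC) def(all, LC_ALL)"
```

Because the argument named def is itself a macro parameter, the C file can pass a different macro each time to generate the accessor functions and then the method registrations from the same list.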

Next, StringValueCStr() is much better than Check_Type().

> ... and use:
> require 'locale'
> Locale::setlocale Locale::LC_CTYPE, ''

And it feels too redundant. I guess Locale.ctype = '' would be
easy.

/* locale.c */
#include <locale.h>
#include "ruby.h"

static VALUE
rb_setlocale(int category, VALUE locale)
{
    char *r = setlocale(category, StringValueCStr(locale));
    return r ? rb_str_new2(r) : Qnil;
}

static VALUE
rb_getlocale(int category)
{
    char *r = setlocale(category, NULL);
    return r ? rb_str_new2(r) : Qnil;
}

#define funcs(n, c) \
static VALUE rb_getlocale_##n(VALUE self) {return rb_getlocale(c);} \
static VALUE rb_setlocale_##n(VALUE self, VALUE val) {return rb_setlocale(c, val);} \
/* end of funcs */

foreach_categories(funcs)

void
Init_locale(void)
{
    VALUE locale = rb_define_module("Locale");
#define methods(n, c) \
    rb_define_singleton_method(locale, #n, rb_getlocale_##n, 0); \
    rb_define_singleton_method(locale, #n"=", rb_setlocale_##n, 1); \
    /* end of methods */

    foreach_categories(methods);
}

--
Nobu Nakada

Adam Strzelecki

6/6/2009 3:25:00 PM

Nobuyoshi,

Thanks, your solution really is more the Ruby way. I just wonder why
"setlocale" isn't part of the Ruby standard library. Since Ruby wraps
most of the standard (POSIX) functions (especially those available on
Windows too), this one should also be considered.

> First of all, option after // is GNU iconv local extension.
Sure, I know that, but that doesn't mean it is EVIL, does it? It is still
very useful for creating permalinks and stripping accented characters
simply, without any third-party libraries, but it is unusable until
we call POSIX setlocale, which isn't present in the Ruby API.

> Second, extconf.rb must check for the necessary header, and
> whether each categories are defined.

Still, it should be present on every system (AFAIK it is), since, quoting
the man page: "The setlocale() function conforms to ISO/IEC 9899:1999 (``ISO
C99'').". Does anyone check whether <stdio.h> exists?

> And it feels too redundant. I guess Locale.ctype = '' would be easy.

Sure, yours is better. Mine didn't consider the fact that some of the
constants may have different values on different systems.

If it were to be included in the standard library, I'd keep the
Locale::setlocale method as well, since you may combine types there and
also check the returned value: nil means the call failed and a String
means it succeeded, and the documentation doesn't explicitly say that
the returned string is exactly the one that was passed. So with a simple
Locale::ctype= we may miss some important feedback.

Cheers,
Adam.