
comp.lang.ruby

Encoding issue for special characters on Windows

Nicolas Gaiffe

1/9/2009 9:11:00 AM

Hi,

I am facing an issue with special characters handling inside a Ruby
script running on Windows and am sure some of you could help me on
this.

This script copies files such as "<English_name>.txt" to
"<Other_language_name>.txt". But once translated, the new filename may
have special characters. 'ä' for instance.

Running
puts 'ä'
in a Ruby script gives
'õ'
as an output, whereas the same code in irb gives
'ä'

There must be an encoding issue at some point in my script but I
didn't manage to fix it (tried different values of '#encoding:'
without success). Any clue?

Many thanks in advance
Best regards

Nicolas
3 Answers

pjb

1/9/2009 10:09:00 AM


Nicolas Gaiffe <nicolas.gaiffe@gmail.com> writes:

> Hi,
>
> I am facing an issue with special characters handling inside a Ruby
> script running on Windows and am sure some of you could help me on
> this.
>
> This script copies files such as "<English_name>.txt" to
> "<Other_language_name>.txt". But once translated, the new filename may
> have special characters. 'ä' for instance.
>
> Running
> puts 'ä'
> in a Ruby script gives
> 'õ'
> as an output, whereas the same code in irb gives
> 'ä'
>
> There must be an encoding issue at some point in my script but I
> didn't manage to fix it (tried different values of '#encoding:'
> without success). Any clue ?

I use emacs. In emacs, you'd just put:

#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"

to have the script encoded in UTF-8 and therefore outputting a UTF-8 byte stream.
Then, of course, you need a UTF-8 terminal:



[pjb@simias :0.0 tmp]$ chmod 755 test.rb
[pjb@simias :0.0 tmp]$ export LC_CTYPE=en_US.UTF-8
[pjb@simias :0.0 tmp]$ ./test.rb
ä
[pjb@simias :0.0 tmp]$ cat test.rb
#!/usr/bin/ruby
# -*- coding:utf-8 -*-
puts "ä"
[pjb@simias :0.0 tmp]$

Notice that in irb, with a UTF-8 terminal, "ä".length == 2, since ä takes two bytes in UTF-8.
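
To make the byte count concrete, here is a minimal sketch, assuming Ruby 1.9 or later, where String#length counts characters and String#bytesize counts bytes (in the Ruby 1.8 of this thread, length counted bytes). The helper name byte_values is hypothetical, and the character is written as an escape so the example does not depend on how the editor saves the file.

```ruby
# Hypothetical helper: the bytes a string occupies in a given encoding.
def byte_values(text, encoding_name)
  text.encode(encoding_name).bytes.to_a
end

a_umlaut = "\u00E4" # the character ä, as a UTF-8 string

puts "characters: #{a_umlaut.length}"
puts "bytes:      #{a_umlaut.bytesize}"
puts "UTF-8:      #{byte_values(a_umlaut, 'UTF-8').inspect}"
puts "ISO-8859-1: #{byte_values(a_umlaut, 'ISO-8859-1').inspect}"
```

The same character is two bytes (195, 164) in UTF-8 but a single byte (228) in ISO-8859-1, which is why the script's encoding and the terminal's encoding have to agree.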


Of course, you can choose to use iso-8859-1 or iso-8859-15 instead; just substitute it for utf-8.
--
__Pascal Bourguignon__

F. Senault

1/10/2009 3:25:00 PM


On 9 January 2009 at 10:10, Nicolas Gaiffe wrote:

> There must be an encoding issue at some point in my script but I
> didn't manage to fix it (tried different values of '#encoding:'
> without success). Any clue?

It depends. If you are trying to echo something to the console, you'll
have to use CP850.

The character ä is byte 228 in the ISO-8859-1 [1] encoding that your file
seems to use, and that byte corresponds to the õ character in CP850 [2].
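
The mapping can be checked with a short sketch, assuming Ruby 1.9 or later with its bundled CP850 (IBM850) transcoder. The helper name decode_byte is hypothetical; it reads one raw byte under a given single-byte encoding and returns the character as UTF-8.

```ruby
# Hypothetical helper: interpret one raw byte under the named
# single-byte encoding and return the character as a UTF-8 string.
def decode_byte(byte, encoding_name)
  [byte].pack("C").force_encoding(encoding_name).encode("UTF-8")
end

as_latin1 = decode_byte(228, "ISO-8859-1")
as_cp850  = decode_byte(228, "CP850")

# Print code points so the result is readable on any terminal.
puts "byte 228 as ISO-8859-1: U+%04X" % as_latin1.ord
puts "byte 228 as CP850:      U+%04X" % as_cp850.ord
```

U+00E4 is ä and U+00F5 is õ, so the same byte shows up as a different character depending on which code page the console applies.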

Now, if you're writing something on the screen as a check or for
debugging while manipulating files, don't convert the output written to
your resulting file to CP850! You'd better stay in ISO, or maybe even in
UTF-8, depending on what your real goal is (website, internal application,
database, etc).
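
One way to keep the two concerns apart is sketched below, assuming Ruby 1.9 or later and a console that uses CP850. The helper name for_console and the sample filename are hypothetical; the idea is to transcode only the copy that goes to the screen and leave the string used for file operations untouched.

```ruby
# Hypothetical helper: transcode text for display only. Characters that
# the console code page cannot represent are replaced with "?".
def for_console(text, console_encoding = "CP850")
  text.encode(console_encoding, undef: :replace, replace: "?")
end

new_name = "\u00E4.txt" # "ä.txt", kept in UTF-8 for everything except display

# Only the displayed copy is converted; new_name itself is unchanged.
puts for_console(new_name)
puts for_console(new_name).bytes.to_a.inspect
```

In CP850 the character ä is byte 132 (0x84), so the console receives that byte instead of 228 and renders the intended character.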

Fred
[1] : http://en.wikipedia.org/wiki/ISO/...
[2] : http://en.wikipedia.org/wiki/Cod...
--
I don't need no arms around me I don't need no drugs to calm me
I have seen the writing on the wall Don't think I need anything at all
No, don't think I'll need anything at all
(Pink Floyd, Another Brick in The Wall part 3)

Nicolas Gaiffe

1/13/2009 12:20:00 PM


On 10 Jan, 16:24, "F. Senault" <f...@lacave.net> wrote:
> It depends.  If you are trying to echo something to the console, you'll
> have to use CP850.
>
> The character for ä is 228 in the ISO8859-1 [1] encoding that your file
> seems to use, and that corresponds to the õ character in CP850 [2].
>
> Now, if you're writing something on the screen as a means of control or
> debug while manipulating files, don't convert your output to CP850 in
> your resulting file !  You'd better stay in ISO

Hi and sorry for the delay,

You were right. The screen output was the only thing affected by the
issue. The result in the filesystem was all right. So everything is
working as expected, since I have no need to display the filenames once
in production.

Thanks to both of you