comp.lang.ruby

Unicode in irb on Windows (and script/console in InstantRails)

michael.raidel

11/7/2006 9:20:00 PM

Hi everyone!

I have a problem with Unicode in irb on Windows. I noticed it when
trying to save an attribute of an ActiveRecord model with an umlaut
(for example "ü") in script/console. If the database connection is
encoded in utf8, everything after the umlaut gets truncated; in the
default encoding I get funny characters back. It doesn't matter whether
$KCODE is set to UTF8 or NONE, the character count stays the same
(also in plain irb)!

Does anyone have a hint on how to solve this? Of course I could try
things such as Cygwin, but I am trying to find an elegant solution for
Windows users, which could eventually be merged into the next
InstantRails release, if Curt agrees.

Thanks a lot,

Michael
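
One possible explanation for the truncation, sketched in Ruby: if the console hands irb bytes in an OEM code page rather than UTF-8, "ü" arrives as a single byte that is not valid UTF-8, and a utf8 database connection may cut the string at that point. The sketch assumes code page 850 and a Ruby with String#encode (1.9 or later, which is newer than the Ruby used in this thread). console_to_utf8 is a hypothetical helper, not part of irb or Rails.

```ruby
# Hypothetical helper: reinterpret raw console bytes as the given
# code page (assumed here to be IBM850) and transcode them to UTF-8.
def console_to_utf8(raw, console_encoding = "IBM850")
  raw.dup.force_encoding(console_encoding).encode("UTF-8")
end

# "Müller" as a CP850 console would deliver it: "ü" is the single byte 0x81.
raw = "M\x81ller".b

# Read as UTF-8, the lone 0x81 byte is invalid.
puts raw.dup.force_encoding("UTF-8").valid_encoding?

# After transcoding, the string is valid UTF-8 and "ü" takes two bytes.
fixed = console_to_utf8(raw)
puts fixed.valid_encoding?
puts fixed.bytes.length
```

Converting input this way before it reaches the database is only one option; it works only if the assumed code page matches what the console really uses.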


Austin Ziegler

11/7/2006 11:08:00 PM

On 11/7/06, michael.raidel@gmail.com <michael.raidel@gmail.com> wrote:
> I have a problem with Unicode in irb on Windows. I recognized it when
> trying to save an attribute of an ActiveRecord-Model with an umlaut
> (for example "ü") in script/console. If the database connection is
> encoded in utf8, everything after the umlaut gets truncated, in the
> default encoding I get funny characters back. It doesn't matter if the
> $KCODE is set to UTF8 or NONE, the character number stays the same
> (also on plain irb)!

The windows console -- also used by cygwin -- doesn't recognise UTF-8.
(That is, it's not possible to properly display UTF-8 in cmd.exe, at
least so far as I can tell.)

-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
* austin@halostatue.ca * http://www.halo...feed/
* austin@zieglers.ca

Chilkat Software

11/7/2006 11:27:00 PM

A DOS console displays characters according to the OEM code page. Here is
an example showing how to properly display a string with 8-bit characters
(e.g. characters with diacritics or accent marks):

# file: oemCodePage.rb

require 'chilkat'

# (The CkString class is freeware)
myStr = Chilkat::CkString.new()

# A DOS console does NOT display this correctly:
print "é ô à ç\n"

# What we need is the OEM (DOS) code page...
# OEM code pages are listed here:
# http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicod...
myStr.appendAnsi("é ô à ç\n")

# Emit the string in the character encoding of your choice:
# ibm850 is the OEM code page for Latin1
print myStr.getEnc("ibm850")

# Chilkat supports these:
# us-ascii
# unicode
# unicodefffe
# iso-8859-1
# iso-8859-2
# iso-8859-3
# iso-8859-4
# iso-8859-5
# iso-8859-6
# iso-8859-7
# iso-8859-8
# iso-8859-9
# iso-8859-13
# iso-8859-15
# windows-874
# windows-1250
# windows-1251
# windows-1252
# windows-1253
# windows-1254
# windows-1255
# windows-1256
# windows-1257
# windows-1258
# utf-7
# utf-8
# utf-32
# utf-32be
# shift_jis
# gb2312
# ks_c_5601-1987
# big5
# iso-2022-jp
# iso-2022-kr
# euc-jp
# euc-kr
# macintosh
# x-mac-japanese
# x-mac-chinesetrad
# x-mac-korean
# x-mac-arabic
# x-mac-hebrew
# x-mac-greek
# x-mac-cyrillic
# x-mac-chinesesimp
# x-mac-romanian
# x-mac-ukrainian
# x-mac-thai
# x-mac-ce
# x-mac-icelandic
# x-mac-turkish
# x-mac-croatian
# asmo-708
# dos-720
# dos-862
# ibm037
# ibm437
# ibm500
# ibm737
# ibm775
# ibm850
# ibm852
# ibm855
# ibm857
# ibm00858
# ibm860
# ibm861
# ibm863
# ibm864
# ibm865
# cp866
# ibm869
# ibm870
# cp875
# koi8-r
# koi8-u
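
The same conversion can be sketched without Chilkat, assuming a Ruby with String#encode (1.9 or later; on the 1.8 of this thread, the Iconv library would be the rough equivalent). to_oem is a hypothetical helper name, and IBM850 is assumed to be the console's OEM code page.

```ruby
# Hypothetical helper: transcode UTF-8 text to an OEM code page for
# display in a DOS console. Characters the code page lacks become "?".
def to_oem(text, oem_encoding = "IBM850")
  text.encode(oem_encoding, invalid: :replace, undef: :replace, replace: "?")
end

# The sample string from the example above, written with escapes so the
# snippet does not depend on the source file's own encoding.
sample = "\u00E9 \u00F4 \u00E0 \u00E7"  # é ô à ç

# Show the resulting OEM bytes in hex.
puts to_oem(sample).bytes.map { |b| format("%02X", b) }.join(" ")
```

Printing the converted string itself (rather than its hex bytes) is what would make the characters appear correctly, provided the console really is on that code page.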




Austin Ziegler

11/8/2006 12:05:00 AM

On 11/7/06, Austin Ziegler <halostatue@gmail.com> wrote:
> On 11/7/06, michael.raidel@gmail.com <michael.raidel@gmail.com> wrote:
> > I have a problem with Unicode in irb on Windows. I recognized it when
> > trying to save an attribute of an ActiveRecord-Model with an umlaut
> > (for example "ü") in script/console. If the database connection is
> > encoded in utf8, everything after the umlaut gets truncated, in the
> > default encoding I get funny characters back. It doesn't matter if the
> > $KCODE is set to UTF8 or NONE, the character number stays the same
> > (also on plain irb)!
> The windows console -- also used by cygwin -- doesn't recognise UTF-8.
> (That is, it's not possible to properly display UTF-8 in cmd.exe, at
> least so far as I can tell.)

Ack my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:

chcp 65001

There are some caveats, of course:

http://blogs.msdn.com/michkap/archive/2006/03/06/5...

-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
* austin@halostatue.ca * http://www.halo...feed/
* austin@zieglers.ca
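
A script can check which code page the console is on before deciding how to encode its output. The sketch below parses the text that chcp prints (assumed to contain the code page number, as in "Active code page: 850"; localized Windows versions word it differently) and maps the number to a Ruby encoding. The lookup table and both method names are hypothetical and cover only a handful of code pages.

```ruby
# Hypothetical lookup table: Windows code page number => Ruby encoding.
CODE_PAGES = {
  437   => Encoding::IBM437,
  850   => Encoding::IBM850,
  1252  => Encoding::Windows_1252,
  65001 => Encoding::UTF_8
}.freeze

# Extract the first run of digits from chcp's output, or nil if none.
def parse_chcp(output)
  number = output[/\d+/]
  number && number.to_i
end

# Map chcp's output to a Ruby encoding, or nil for unknown code pages.
def console_encoding(chcp_output)
  CODE_PAGES[parse_chcp(chcp_output)]
end

# On Windows the input would come from the command itself:
#   console_encoding(`chcp`)
puts console_encoding("Active code page: 65001").inspect
```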

David Vallner

11/8/2006 2:24:00 AM

Austin Ziegler wrote:
>
> Ack my bad. I had forgotten: you can specify the UTF-8 codepage
> (CP_UTF8) with:
>
> chcp 65001
>
> There are some caveats, of course:
>
> http://blogs.msdn.com/michkap/archive/2006/03/06/5...
>

Also the good old combo of "mode con codepage select=65001".

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicod...
lists pretty much all the numbers you can use. (The pain of navigating
to that on the MSDN website.)

Amusingly enough, none of those are even present anymore on WinXP Pro
x64. For yet more hilarity, the console is by default set to the DOS OEM
codepage of the given locale, instead of the newer ANSI ones that are
ISO extensions, which causes great fun when trying to use software
that's ever so smart and autodetects my locale as my preferred language
(Postgres, assorted GNU stuff being too clever by half) instead of using
the OS language version.

And "there are some caveats" is an understatement: the UTF-8 support in
the console is a sham. I couldn't get a trivial C program, using
arbitrary combinations of tchar.h, wchar.h, -DUNICODE, cmd.exe, the
Windows console, a Cygwin and an MSYS rxvt, to do something as daunting
as read random characters that aren't shared between the Latin1 and Latin2
codepages, store them as multibyte internally, and then write them out
to a text file and to the console successfully without one step
breaking. The fact that the whole of CMD broke down in tears from changing
that setting is also worth noting. IIRC, it had problems doing output
redirection to a file and whatnot (I can't play around with this without
setting up a virtual machine with a 32-bit XP). Basically, the Path Less
Annoying is to only use the console for working in your "native"
codepage, and use a non-console tool for everything else.

end # of rant

David Vallner

michael.raidel

11/8/2006 9:15:00 AM

> Ack my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:
>
> chcp 65001

Thank you Austin for the nice hint!

The problem is that as soon as I switch the codepage, irb (and also
script/console) stops working (it doesn't even start anymore; it just
quits immediately without an error message).

Michael

Austin Ziegler

11/9/2006 4:24:00 AM

On 11/8/06, michael.raidel@gmail.com <michael.raidel@gmail.com> wrote:
> > Ack my bad. I had forgotten: you can specify the UTF-8 codepage (CP_UTF8) with:
> >
> > chcp 65001
>
> Thank you Austin for the nice hint!
>
> The problem is, that as soon as I switch the codepage, irb (and also
> script/console) stops working (it doesn't even start anymore, it just
> quits immediately without an error-message).

That's one of the caveats mentioned: batch files no longer work.
I don't know why. However, if you have Ruby installed in C:\Ruby, you can do:

copy C:\Ruby\bin\irb C:\Ruby\bin\irb.rb
irb.rb

Or:

ruby C:\Ruby\bin\irb

And you'll get a working irb.

-austin
--
Austin Ziegler * halostatue@gmail.com * http://www.halo...
* austin@halostatue.ca * http://www.halo...feed/
* austin@zieglers.ca
