comp.lang.ruby

Output UTF-16LE BOM to file - 1.9

Chris Morris

4/9/2009 8:31:00 PM

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]

With this code:

File.open('zz.txt', 'w:UTF-16LE') do |f|
  f.print "Hello Uni-world"
end

...I get no BOM

guts = File.read('zz.txt')
puts guts.bytes.to_a.inspect

#=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,...

...and my brain can't concoct a way to insert it myself, though I know
it must be simple...


--
Chris
http:/...

1 Answer

James Gray

4/12/2009 4:39:00 PM

On Apr 9, 2009, at 3:31 PM, Chris Morris wrote:

> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]
>
> With this code:
>
> File.open('zz.txt', 'w:UTF-16LE') do |f|
>   f.print "Hello Uni-world"
> end
>
> ...I get no BOM
>
> guts = File.read('zz.txt')
> puts guts.bytes.to_a.inspect
>
> #=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,...
>
> ...and my brain can't concoct a way to insert it myself, though I know
> it must be simple...

Yeah, it's easy stuff.

A Unicode BOM is just the character U+FEFF encoded as the very first
character of the document. You can insert that character yourself with
Ruby 1.9's Unicode escape, and it will be transcoded into the proper
byte order based on the external_encoding() of the file you are writing to:

$ cat utf16_bom.rb
# encoding: UTF-8
File.open("utf16_bom.txt", "w:UTF-16LE") do |f|
  f.puts "\uFEFFThis is UTF-16LE with a BOM."
end
$ ruby -v utf16_bom.rb
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
$ ruby -e 'p File.binread(ARGV.shift)[0..9]' utf16_bom.txt
"\xFF\xFET\x00h\x00i\x00s\x00"

Hope that helps.

James Edward Gray II
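
For anyone who wants to check the result byte by byte, here is a minimal sketch of the same idea wrapped in a helper. The method name write_utf16le_with_bom, the BOM constant, and the temp-file name are made up for illustration and are not part of the original posts. It assumes the source file is UTF-8, so the "\uFEFF" literal is transcoded to the bytes FF FE when written through the UTF-16LE external encoding.

```ruby
# encoding: UTF-8
require "tmpdir"

# U+FEFF, the byte order mark, as an ordinary UTF-8 string.
# (Hypothetical constant name, used only in this sketch.)
BOM = "\uFEFF"

# Hypothetical helper: writes the BOM first, then the text, and lets the
# UTF-16LE external encoding transcode both on the way out.
def write_utf16le_with_bom(path, text)
  File.open(path, "w:UTF-16LE") do |f|
    f.print BOM
    f.print text
  end
end

path = File.join(Dir.tmpdir, "utf16_bom_demo.txt")
write_utf16le_with_bom(path, "Hi")

# Read the raw bytes back without any decoding, then clean up.
bytes = File.binread(path).bytes.to_a
File.delete(path)

p bytes  # => [255, 254, 72, 0, 105, 0]
```

The first two bytes, 255 and 254 (0xFF 0xFE), are the little-endian encoding of U+FEFF. Reading with File.binread instead of File.read avoids any newline or encoding conversion, so the output is what is actually on disk.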