comp.lang.ruby

read write integer in binary into a file

Vianney Lecroart

10/25/2007 2:36:00 PM

Hello,

I have some big files with lots of "unsigned int" (4-byte) numbers, and I
want to read and write these files.

Currently, I found this to write:

myfile << [mynum].pack("i")

and to read:

mynum = myfile.read(4).unpack("i").first

I wonder if there's something faster/simpler that avoids converting the
number into an array and then into a string just to serialize it.

Thank you.
--
Posted via http://www.ruby-....

9 Answers

Park Heesob

10/25/2007 3:04:00 PM


Hi,

How about Marshal?

myfile << Marshal.dump(mynum)

and

mynum = Marshal.load(myfile.read)

Regards,

Park Heesob

Vianney Lecroart

10/25/2007 3:08:00 PM


> How about Marshal?

Files are filled by an external C application that does something like:
fwrite(&myint, 4, 1, fp);

So I have to use the same file format.
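A minimal sketch, assuming the C program writes each value as a 32-bit uint32_t in the machine's native byte order (for example fwrite(&myint, 4, 1, fp) on the same machine). The helper names write_uint32s and read_uint32s are hypothetical. It uses the 'L' directive (32-bit unsigned, native endian) rather than 'i' (signed native int), because 'i' reads values of 2**31 and above back as negative numbers; 'V' would pin the order to little-endian if the file has to move between machines. String#unpack1 (Ruby 2.4 and later) replaces unpack(...).first.

```ruby
require 'tempfile'

# Hypothetical helpers: 'L' packs a 32-bit unsigned integer in native byte
# order, the same layout fwrite() produces for a uint32_t on this machine.
def write_uint32s(io, values)
  io.write(values.pack('L*'))
end

def read_uint32s(io)
  io.read.unpack('L*')
end

values = [0, 56_515, 720_850, 2**32 - 1]

Tempfile.create('ints') do |f|
  f.binmode                  # no newline translation, binary data
  write_uint32s(f, values)
  f.rewind
  p read_uint32s(f)          # => [0, 56515, 720850, 4294967295]
end

# The signed directive 'i' misreads the largest unsigned value:
p [2**32 - 1].pack('L').unpack1('i')   # => -1
```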

Michael Linfield

10/25/2007 3:17:00 PM


Vianney Lecroart wrote:
>> How about Marshal?
>
> Files are filled by an external C application that does something like:
> fwrite(&myint, 4, 1, fp);
>
> So I have to use the same file format.

What file format? I don't see any problem with using Marshal; it doesn't
need a file format specified, it's simply a Marshal dump.

Vianney Lecroart

10/25/2007 3:20:00 PM


It seems that marshaling a number doesn't produce 4 bytes:

irb(main):036:0> mynum
=> 56515
irb(main):037:0> [mynum].pack("i")
=> "\303\334\000\000"
irb(main):038:0> Marshal.dump(mynum)
=> "\004\bi\002\303\334"
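A quick check of why Marshal can't stand in for the C layout: Marshal.dump writes a 2-byte version header ("\004\b"), a type tag ('i'), and a variable-length integer, so its size depends on the value, while pack('L') always yields 4 bytes. The sizes in the comments assume Marshal format 4.8, the same format shown in the irb session above.

```ruby
# Compare the encoded size of a few values: fixed-width pack vs. Marshal.
[0, 56_515, 720_850].each do |num|
  packed = [num].pack('L')       # always 4 bytes
  dumped = Marshal.dump(num)     # header + 'i' tag + variable-length int
  puts "#{num}: pack=#{packed.bytesize} marshal=#{dumped.bytesize}"
end
# 0: pack=4 marshal=4
# 56515: pack=4 marshal=6
# 720850: pack=4 marshal=7
```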

yermej

10/25/2007 3:55:00 PM



Do you have to deal with each number individually? Maybe you could
build up an array of numbers and then pack them all at once:

arr = []
while work_to_do                  # placeholder for your own loop condition
  mynum = generate_next_number    # placeholder for your own logic
  arr << mynum
end
myfile.write arr.pack('i*')

That way you aren't creating a new array for each number.

Similarly, for reading the file:

data = myfile.read
num_array = data.unpack('i*')

The '*' in (un)pack means "process the rest of the data the same
way".
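For really big files, reading the whole file at once may use a lot of memory. A sketch of a middle ground, assuming native-endian unsigned 32-bit records ('L'): read a fixed number of records per call and unpack each chunk with a single 'L*'. The method name each_uint32 and the default chunk size are hypothetical choices.

```ruby
require 'tmpdir'

RECORD_SIZE = 4   # bytes per unsigned 32-bit integer

# Hypothetical helper: yields every 32-bit unsigned int in the file,
# reading records_per_chunk records at a time and unpacking each chunk
# with one 'L*' call. Trailing bytes that don't form a full record are ignored.
def each_uint32(path, records_per_chunk = 65_536)
  File.open(path, 'rb') do |f|
    while (chunk = f.read(RECORD_SIZE * records_per_chunk))
      chunk.unpack('L*').each { |n| yield n }
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'ints.bin')
  File.open(path, 'wb') { |f| f.write((1..10).to_a.pack('L*')) }
  total = 0
  each_uint32(path, 3) { |n| total += n }   # reads 3 + 3 + 3 + 1 records
  p total                                   # => 55
end
```

The chunk size trades memory for the number of read calls; each chunk still gets the "one pack/unpack per batch" benefit described above.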

Adam Preble

10/25/2007 4:09:00 PM


I wrote a function to do this which seems slightly faster, but could
perhaps stand some optimization:

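def pack_int32(n)
  str = "\0" * 4     # 4-byte buffer; a 1-byte string would raise IndexError
  str[3] = n >> 24   # Ruby 1.8: assigning an Integer stores its low byte
  str[2] = n >> 16
  str[1] = n >> 8
  str[0] = n         # least significant byte first (little-endian)
  str
end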
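pack_int32 depends on Ruby 1.8 behavior: from Ruby 1.9 on, str[i] = integer raises a TypeError. A sketch of the same byte-by-byte technique for current Ruby, using String#setbyte; the name pack_uint32_le is hypothetical. Like pack_int32, it writes the least significant byte first, so it should match [n].pack('V'). It illustrates the technique rather than claiming to beat pack on a modern interpreter.

```ruby
# Hypothetical Ruby 1.9+ counterpart of pack_int32: String#setbyte replaces
# the 1.8-only str[i] = integer assignment. Least significant byte first.
def pack_uint32_le(n)
  str = "\0".b * 4                    # 4-byte binary (ASCII-8BIT) buffer
  str.setbyte(0, n & 0xff)
  str.setbyte(1, (n >> 8) & 0xff)
  str.setbyte(2, (n >> 16) & 0xff)
  str.setbyte(3, (n >> 24) & 0xff)
  str
end

p pack_uint32_le(720_850).bytes                    # => [210, 255, 10, 0]
p pack_uint32_le(720_850) == [720_850].pack('V')   # => true
```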

Here are the benchmark results vs. the other methods mentioned:

                  user     system      total        real
[].pack(i):   6.234000   0.235000   6.469000 (  6.500000)
pack_int32:   5.719000   0.015000   5.734000 (  5.734000)
Marshal.dump: 6.594000   0.219000   6.813000 (  6.813000)

I included Marshal.dump for completeness, but agree that it doesn't
appear to be meant for this sort of thing. Here's the source to run
the benchmark:

require 'benchmark'
number = 2_000_000
n = 1_000_000
Benchmark.bm(12) do |x|
  x.report('[].pack(i):')   { n.times { [number].pack('i') } }
  x.report('pack_int32:')   { n.times { pack_int32(number) } }
  x.report('Marshal.dump:') { n.times { Marshal.dump(number) } }
end

Adam



Phrogz

10/25/2007 4:32:00 PM


On Oct 25, 10:09 am, Adam Preble <pre...@gmail.com> wrote:
> I wrote a function to do this which seems slightly faster, but could
> perhaps stand some optimization:

Using only the number 2_000_000 seems to skew the results. I see your
results with your test, but if I change it slightly to use a variety
of integers, I get more balanced results:

require 'benchmark'
MAX = 2**30
n = 1_000_000
nums = (0..n).map { (rand * MAX).to_i }

Benchmark.bmbm do |x|
  x.report('pack(i):') { nums.each { |num| [num].pack('i') } }
  x.report('pack32:')  { nums.each { |num| pack_int32(num) } }
  x.report('Dump:')    { nums.each { |num| Marshal.dump(num) } }
end

Rehearsal --------------------------------------------
pack(i):   5.813000   0.109000   5.922000 (  5.984000)
pack32:    5.234000   0.000000   5.234000 (  5.281000)
Dump:      5.906000   0.125000   6.031000 (  6.063000)
---------------------------------- total: 17.187000sec

               user     system      total        real
pack(i):   5.687000   0.125000   5.812000 (  5.875000)
pack32:    5.141000   0.016000   5.157000 (  5.188000)
Dump:      6.000000   0.078000   6.078000 (  6.141000)
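All of the benchmarks above still pack one number per call. A sketch that also times yermej's batching suggestion, packing the whole array with a single 'L*' call; the array size and value range are arbitrary assumptions, and timings depend on the machine and Ruby version, so it is worth running locally rather than relying on the numbers in this thread.

```ruby
require 'benchmark'

# Arbitrary test data: 200,000 random values below 2**30.
nums = Array.new(200_000) { rand(2**30) }

one_at_a_time = nil
batched = nil

Benchmark.bmbm do |x|
  # One small array and one pack call per number, then concatenate.
  x.report('pack each:') { one_at_a_time = nums.map { |n| [n].pack('L') }.join }
  # A single pack call over the whole array.
  x.report('pack all:')  { batched = nums.pack('L*') }
end

# Both approaches must produce the same bytes.
p one_at_a_time.b == batched   # => true
```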

Wu Junchen

12/13/2007 1:27:00 PM




irb(main):001:0> f=open('test','w')
=> #<File:test>
irb(main):002:0> f<<[65535].pack('i')
=> #<File:test>
irb(main):003:0> f.tell
=> 4
irb(main):004:0> f<<[720850].pack('i')
=> #<File:test>
irb(main):005:0> f.tell
=> 9
The integer 720850 takes 5 bytes in my file, but it should take only 4!
How can I fix this? Thanks!

Tim Hunter

12/13/2007 1:42:00 PM


Wu Junchen wrote:
> the integer 720850 takes 5 bytes in my file, but it should take 4 bytes
> only! How can I fix this? Thanks!

>irb
irb(main):001:0> x = [720850].pack('i')
=> "\322\377\n\000"
irb(main):002:0> x.length
=> 4

So the integer 720850 is clearly packed into 4 bytes, as requested. Why
does it occupy 5 bytes in the file? See the "\n" at index 2? That means
the third byte is a newline character (0x0A), and on Windows, when a file
is opened in text mode, Ruby writes each newline as CR LF: 2 bytes!
Since you've got binary data in your file, you don't want a text-mode
file; open it with the "b" flag in addition to "w":

f = open("test", "wb")
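A small round-trip sketch of the fix, assuming the records are native-endian unsigned 32-bit ints ('L'): with "wb" the file holds exactly 4 bytes per number, even though 720850 packs to a string containing a "\n" byte. The helper names save_uint32s and load_uint32s are hypothetical; File.binread opens the file in binary mode for reading as well.

```ruby
require 'tmpdir'

# Hypothetical helpers: "wb" = write + binary, so no CR LF translation.
def save_uint32s(path, nums)
  File.open(path, 'wb') { |f| f.write(nums.pack('L*')) }
end

def load_uint32s(path)
  File.binread(path).unpack('L*')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test')
  save_uint32s(path, [65_535, 720_850])   # 720850 contains a 0x0A byte
  p File.size(path)                       # => 8
  p load_uint32s(path)                    # => [65535, 720850]
end
```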
