comp.lang.ruby

How can I parse binary files?

Fabio Vitale

7/17/2006 10:55:00 AM

I need to parse a binary file with the following structure.
How can I accomplish this in Ruby?

Header (36 bytes):
- Version (4 byte unsigned integer) currently 1
- UIDValidity (4 byte unsigned integer)
- UIDNext (4 byte unsigned integer)
- Last Write Counter (4 byte unsigned integer)
- the rest unused

Message data (36 bytes per message):
- Filename (23 bytes including terminating NUL character)
- Flags (1 byte bitmask)
- UID (4 byte unsigned integer)
- Message size (4 byte unsigned integer)
- Date (4 byte time_t value)

Flags mask is 1:Recent, 2:Draft, 4:Deleted, 8:Flagged, 16:Answered,
32:Seen.

--
Posted via http://www.ruby-....

23 Answers

Farrel Lifson

7/17/2006 11:06:00 AM

On 17/07/06, Fabio Vitale <fabio@sferaconsulting.it> wrote:
> I've the need to parse a binary file with the following structure:
> How can I accomplish this in Ruby?

String#unpack.
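
For the 36-byte header described in the question, a minimal sketch with String#unpack might look like the following. It assumes the file was written little-endian (hence the `V` directive; `N` would be big-endian and `L` native order). The method name `parse_mrk_header` and the constant `HEADER_SIZE` are made up for illustration, and the sample bytes are synthetic so the sketch runs without a real file.

```ruby
# Size of the header block, taken from the format description above.
HEADER_SIZE = 36

# Hypothetical helper: decodes the four 32-bit unsigned header fields.
# 'V4' reads four little-endian 32-bit unsigned integers (the byte order
# is an assumption); the remaining 20 bytes are unused and ignored.
def parse_mrk_header(bytes)
  if bytes.nil? || bytes.bytesize < HEADER_SIZE
    raise ArgumentError, "header too short"
  end
  version, uid_validity, uid_next, last_write = bytes.unpack('V4')
  { version: version, uid_validity: uid_validity,
    uid_next: uid_next, last_write_counter: last_write }
end

# Synthetic header: four fields followed by 20 unused bytes.
sample_header = [1, 1106138982, 5825, 9872].pack('V4') + "\0" * 20
p parse_mrk_header(sample_header)
```

With a real file, the same helper would be fed the result of `f.read(36)` from a file opened in "rb" mode.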

Daniel Martin

7/17/2006 1:13:00 PM

Fabio Vitale <fabio@sferaconsulting.it> writes:

> I've the need to parse a binary file with the following structure:
> How can I accomplish this in Ruby?

In addition to parsing this yourself using ruby's String#unpack
method, you should also look at the BitStruct extension available at
http://redshift.sourceforge.net/b...

(And found via http://raa.ruby... by doing a search on
"binary")

Am I the only one who thinks that ruby-forum.com should include in a
prominent place pointers to standard ruby documentation, and to the
Ruby Application Archive? I don't object to people posting to the
list via the web form at ruby-forum.com, but I think that a prominent
display of common sources of information would help everyone.

Fabio Vitale

7/17/2006 3:46:00 PM

Daniel Martin wrote:
> Fabio Vitale <fabio@sferaconsulting.it> writes:
>
>> I've the need to parse a binary file with the following structure:
>> How can I accomplish this in Ruby?
>
> In addition to parsing this yourself using ruby's String#unpack
> method, you should also look at the BitStruct extension available at
> http://redshift.sourceforge.net/b...

I've found bit-struct very interesting, but I cannot figure out how to
load a binary file into a newly created bit-struct.
Any help is appreciated.

Say I've an imap.mrk binary file,
I've defined class MRK as follows:

require 'bit-struct'

class MRK < BitStruct
  unsigned :version, 4, "Version"
  unsigned :uid_Validity, 4, "UIDValidity"
  unsigned :uid_next, 4, "UIDNext"
  unsigned :last_write_counter, 4, "LastWriteCounter"
  rest :unused, "Unused"
end

mrk = MRK.new

Now, how do I populate the mrk instance just created from the imap.mrk
binary file?


Thank you


Ara.T.Howard

7/17/2006 4:07:00 PM

Simon Kröger

7/17/2006 8:38:00 PM

ara.t.howard@noaa.gov wrote:
> On Tue, 18 Jul 2006, Fabio Vitale wrote:
>
>> Daniel Martin wrote:
>>> Fabio Vitale <fabio@sferaconsulting.it> writes:
>>>
>>>> I've the need to parse a binary file with the following structure:
>>>> How can I accomplish this in Ruby?
>>>
>> require 'bit-struct'
>>
>> class MRK < BitStruct
>> unsigned :version, 4, "Version"
>> unsigned :uid_Validity, 4, "UIDValidity"
>> unsigned :uid_next, 4, "UIDNext"
>> unsigned :last_write_counter, 4, "LastWriteCounter"
>> rest :unused, "Unused"
>> end
>>
>> mrk = MRK.new
>>
>> And now: how to populate the mrk instance just created from the imap.mrk
>> binary file?
>
> without even looking at the docs i'd guess you could do
>
> data = IO.read 'your.data'
>
> mrk = MRK.new data
>
>
> and, indeed, this seems to work:
>
> [snip]

This looks like a nice way.
I just wanted to show that in such a simple case unpack isn't that ugly either.

File.open('file.bin', 'rb') do |f|
  version, uidValid, uidNext, lwCounter = f.read(36).unpack('IIII')
  name, flags, uid, size, date = f.read(36).unpack('Z23CIII')

  # do something
end

This is of course untested because i don't have such a file, but i hope
the idea is clear.

cheers

Simon
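
The snippet above reads the header and a single record. Extending the same idea to every record in the file could look roughly like this sketch. It swaps the native-order `I` for the little-endian `V` directive, which is an assumption about the file; the helper name `read_mrk_records` and both constants are invented here, and the data is synthetic so the example runs on its own.

```ruby
require 'stringio'

HEADER_BYTES = 36  # header size from the format description
RECORD_BYTES = 36  # size of one message record

# Hypothetical helper: returns one array per message record.
# 'Z23' = 23-byte NUL-terminated filename, 'C' = flags byte,
# 'V3'  = uid, size, date as little-endian 32-bit unsigned integers.
def read_mrk_records(io)
  io.read(HEADER_BYTES) # skip the header
  records = []
  # Stop at end of file or on a truncated trailing record.
  while (chunk = io.read(RECORD_BYTES)) && chunk.bytesize == RECORD_BYTES
    records << chunk.unpack('Z23CV3')
  end
  records
end

# Synthetic file: an empty header followed by one record.
sample_record = ['md50000004286.msg', 48, 4150, 20732, 1135012715].pack('a23CV3')
p read_mrk_records(StringIO.new(("\0" * HEADER_BYTES) + sample_record))
```

For a real file, pass the object yielded by `File.open(path, 'rb')` instead of the StringIO.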

Daniel Martin

7/17/2006 8:49:00 PM

Fabio Vitale <fabio@sferaconsulting.it> writes:

> And now: how to populate the mrk instance just created from the imap.mrk
> binary file?

First off, the other message's advice about your field sizes should be
taken (you want to use "32", not "4"). Also, you almost certainly
want to add :endian => :native to your structure. Finally, you'll
want to adjust the bit_length method of your MRKHeader class since it
won't construct the appropriate length just from the field info.

class MRKHeader < BitStruct
  unsigned :version, 32, "Version", :endian => :native
  unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
  unsigned :uid_next, 32, "UIDNext", :endian => :native
  unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
  rest :unused, "Unused"
  def MRKHeader.bit_length
    super
    36*8
  end
end

Okay, now let's assume that you also define the per-message structure
using BitStruct as MRKMessage. (For the message code, you don't need
to redefine bit_length since it can be computed straight from the
fields. Do however use the endianness option on all the integers)

Then:

File.open("imap.mrk") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string)
  end
}

Daniel Martin

7/17/2006 9:06:00 PM

Daniel Martin <martind@martinhouse.internal> writes:

> Then:
>
> File.open("imap.mrk") {|f|
> head_string = f.read(MRKHeader.round_byte_length)
> raise "No header!" unless head_string
> mrk_header = MRKHeader.new(head_string)
> puts mrk_header.inspect
> while msg_string = f.read(MRKMessage.round_byte_length) do
> puts MRKMessage.new(msg_string)
> end
> }

I forgot to open the file in binary mode, and forgot an inspect call.
I should have said:

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

Fabio Vitale

7/18/2006 6:20:00 AM

Daniel Martin wrote:
> Daniel Martin <martind@martinhouse.internal> writes:
>

require 'bit-struct'

class MRKHeader < BitStruct
  unsigned :version, 32, "Version", :endian => :native
  unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
  unsigned :uid_next, 32, "UIDNext", :endian => :native
  unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
  rest :unused, "Unused"
  def MRKHeader.bit_length
    super
    36*8
  end
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

Now an error is raised:
>ruby b.rb
#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872, unused="">
b.rb:19: uninitialized constant MRKMessage (NameError)
from b.rb:14
>Exit code: 1

The remaining problem is that I still need to process the Message data structure:
how can I accomplish this?
Thank you all very much for the help!


Fabio Vitale

7/18/2006 6:39:00 AM

Fabio Vitale wrote:

This is the structure of class MRKMessage:

Message data (36 bytes per message):
- Filename (23 bytes including terminating NUL character)
- Flags (1 byte bitmask)
- UID (4 byte unsigned integer)
- Message size (4 byte unsigned integer)
- Date (4 byte time_t value)

Flags mask is 1:Recent, 2:Draft, 4:Deleted, 8:Flagged, 16:Answered,
32:Seen.

Now 3 major questions:

Q 1: what type must I declare for Filename in the class MRKMessage?

Q 2: what type must I declare for Flags in the class MRKMessage?

Q 3: what type must I declare for Date in the class MRKMessage?

...and 2 minor ones :-))

Q 4: How to decode Flags?

Q 5: How to decode Date?

BIG BIG THANKS TO ALL!

------------
require 'bit-struct'
class MRKHeader < BitStruct
  unsigned :version, 32, "Version", :endian => :native
  unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
  unsigned :uid_next, 32, "UIDNext", :endian => :native
  unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
  rest :unused, "Unused"
  def MRKHeader.bit_length
    super
    36*8
  end
end

class MRKMessage < BitStruct
  char :filename, 184, "FileName", :endian => :native
  unsigned :flags, 8, "Flags", :endian => :native
  unsigned :uid, 32, "UID", :endian => :native
  unsigned :msg_size, 32, "MsgSize", :endian => :native
  unsigned :date, 32, "Date", :endian => :native
  def MRKMessage.bit_length
    super
    36*8
  end
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

This now generates:

>ruby b.rb
#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872, unused="">
#<MRKMessage
filename="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\nmd5",
flags=48, uid=808464432, msg_size=942814256, date=1936535094>
#<MRKMessage
filename="g\000\000\000\000\000\00006\020\000\000\374P\000\000k\353\246Cmd5",
flags=48, uid=808464432, msg_size=858993712, date=1936535091>
#<MRKMessage filename="g\000\000\000\000\000\000
e\020\000\000\334\226\003\000X\373\253Cmd5", flags=48, uid=808464432,
msg_size=858993712, date=1936535092>


Daniel Martin

7/18/2006 3:11:00 PM

Fabio Vitale <fabio@sferaconsulting.it> writes:

> Now 3 major questions:
>
> Q 1: what type must I declare for Filename in the class MRKMessage?

Okay, first off I apologize, but I led you astray. Apparently it's
not enough to override bit_length in your subclass. When you read the
file, you're not getting the stuff lined up properly. Therefore I've
decided to make up for it by finishing the rest of your code for you.
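
The misalignment can be read off the output quoted above: the first filename consists of 18 NULs and "\r\n", followed by "md5", and the header's unused field came out empty. That pattern is what one would expect if only the 16 bytes of declared fields were consumed as the header instead of all 36, so that every record is shifted by 20 bytes. The sketch below reproduces the effect in plain Ruby on synthetic data; it does not use BitStruct, and the helper name `first_filename` and the constant `MRK_DATA` are made up for illustration.

```ruby
require 'stringio'

# Synthetic file: 36-byte header (16 bytes of fields, 20 unused bytes
# ending in "\r\n") followed by one 36-byte message record.
header = [1, 1106138982, 5825, 9872].pack('V4') + "\0" * 18 + "\r\n"
record = ['md50000004286.msg', 48, 4150, 20732, 1135012715].pack('a23CV3')
MRK_DATA = header + record

# Hypothetical helper: skips header_len bytes, then returns the raw
# 23-byte filename field of the next 36-byte chunk.
def first_filename(data, header_len)
  io = StringIO.new(data)
  io.read(header_len)
  io.read(36).unpack('a23').first
end

p first_filename(MRK_DATA, 16)  # tail of the header, then "md5"
p first_filename(MRK_DATA, 36)  # the real filename, NUL-padded
```

Skipping the full 36 bytes puts the record boundary back where the format description says it should be.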

Note that now I override round_byte_length instead, and we get:

require 'bit-struct'
class MRKHeader < BitStruct
  unsigned :version, 32, "Version", :endian => :native
  unsigned :uid_Validity, 32, "UIDValidity", :endian => :native
  unsigned :uid_next, 32, "UIDNext", :endian => :native
  unsigned :last_write_counter, 32, "LastWriteCounter", :endian => :native
  rest :unused, "Unused"
  # Override so that it gets padded properly
  def MRKHeader.round_byte_length
    super
    36
  end
end

# Ideally, I'd construct some sort of "flags" bit-struct field
# Or define a boolean field type and make this a series of boolean
# fields.

# However, for now we can deal with a series of 0s and 1s

class MRKMessageFlags < BitStruct
  unsigned :flagUnused, 2, "Unused"
  unsigned :flagSeen, 1, "Seen"
  unsigned :flagAnswered, 1, "Answered"
  unsigned :flagFlagged, 1, "Flagged"
  unsigned :flagDeleted, 1, "Deleted"
  unsigned :flagDraft, 1, "Draft"
  unsigned :flagRecent, 1, "Recent"
end

class MRKMessage < BitStruct
  # Note "text" for nul-terminated strings
  text :filename, 23*8, "FileName", :endian => :native
  nest :flags, MRKMessageFlags, "Flags"
  unsigned :uid, 32, "UID", :endian => :native
  unsigned :msg_size, 32, "MsgSize", :endian => :native
  unsigned :date, 32, "Date", :endian => :native

  # Now we futz with the way that date is set and gotten.
  # we rename the existing date field to __date, and
  # then we supply our own meaning for "date" that does
  # translation into and out of seconds-since-1970

  # Again, the ideal solution would be to define a new bit-struct
  # field type that did this stuff itself.

  alias_method :__date=, :date=
  alias_method :__date, :date
  def date=(time)
    self.__date= time.to_i
  end
  def date
    Time.at(self.__date)
  end
  # we don't need to override the length computation here
end

File.open("imap.mrk", "rb") {|f|
  head_string = f.read(MRKHeader.round_byte_length)
  raise "No header!" unless head_string
  mrk_header = MRKHeader.new(head_string)
  puts mrk_header.inspect
  while msg_string = f.read(MRKMessage.round_byte_length) do
    puts MRKMessage.new(msg_string).inspect
  end
}

__END__

This produces (on the first bit from your file):

#<MRKHeader version=1, uid_Validity=1106138982, uid_next=5825,
last_write_counter=9872,
unused="\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\r\n">
#<MRKMessage filename="md50000004286.msg", flags=#<MRKMessageFlags
flagUnused=0, flagSeen=1, flagAnswered=1, flagFlagged=0,
flagDeleted=0, flagDraft=0, flagRecent=0>, uid=4150, msg_size=20732,
date=Mon Dec 19 12:18:35 Eastern Standard Time 2005>

This is more what you expected, right?
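
For questions 4 and 5, the flags byte and the date can also be decoded without any library. Here is a small sketch, assuming the bit values listed in the original question and a time_t counted in seconds since 1970. The names `MRK_FLAGS`, `decode_flags` and `decode_date` are made up for this example, and the sample values are the ones that appear in the output above.

```ruby
# Bit values as listed in the format description.
MRK_FLAGS = {
  recent: 1, draft: 2, deleted: 4,
  flagged: 8, answered: 16, seen: 32
}.freeze

# Returns the names of all flags whose bit is set in the given byte.
def decode_flags(byte)
  MRK_FLAGS.select { |_name, bit| (byte & bit) != 0 }.keys
end

# time_t is seconds since 1970-01-01 UTC, so Time.at converts it directly;
# .utc keeps the result independent of the local time zone.
def decode_date(seconds)
  Time.at(seconds).utc
end

p decode_flags(48)             # 48 = 16 + 32 => [:answered, :seen]
puts decode_date(1135012715)   # 17:18:35 UTC, i.e. 12:18:35 Eastern
```

A hash of bit values keeps the mask table in one place, so adding a flag later means adding one entry rather than another field.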