Asp Forum - Reading a binary file

Angel

4/28/2011 9:39:00 PM

Hi folks,

I'm writing a program that can manipulate files in the format as
described on this site:
http://www.ugcs.caltech.edu/~jedwin/baldu...

Basically, the file contains two four-byte strings, then four bytes
that form a 32-bit integer, and so on.

Currently I read the file with the fread() call and structures declared
like this:

struct item_v1_header
{
    char     signature[4];
    char     version[4];
    uint32_t generic_name_strref;
    <...>
} __attribute__((__packed__));

This works just fine, but I was wondering if there is a more
elegant/portable way to do it.

Your thoughts?

And yes, I know there are already tools out there that can manipulate
Infinity Engine stuff, I'm just doing this for entertainment and
education. :-)

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

12 Answers

Chad

4/28/2011 9:46:00 PM

On Apr 28, 2:38 pm, Angel <angel+n...@spamcop.net> wrote:
> Hi folks,
>
> I'm writing a program that can manipulate files in the format as
> described on this site:http://www.ugcs.caltech.edu/~jedwin/baldu...
>
> Basically, the file contains four bytes that form a string, then two
> four bytes that form a 32-bit integer, and so on.
>
> Currently I read the file with the fread() call and structures declared
> like this:
>
> struct item_v1_header
> {
> char signature[4];
> char version[4];
> uint32_t generic_name_strref;
> <...>
>
> } __attribute__((__packed__));
>
> This works just fine, but I was wondering if there is a more
> elegant/portable way to do it.
>
> Your thoughts?
>
> And yes, I know there are already tools out there that can manipulate
> Infinity Engine stuff, I'm just doing this for entertainment and
> education. :-)
>

Maybe I'm not getting this, but won't this break if there is some
(additional) padding in the structure?

Chad

Angel

4/28/2011 9:52:00 PM

On 2011-04-28, Chad <cdalten@gmail.com> wrote:
> On Apr 28, 2:38 pm, Angel <angel+n...@spamcop.net> wrote:
>> Hi folks,
>>
>> I'm writing a program that can manipulate files in the format as
>> described on this site:http://www.ugcs.caltech.edu/~jedwin/baldu...
>>
>> Basically, the file contains four bytes that form a string, then two
>> four bytes that form a 32-bit integer, and so on.
>>
>> Currently I read the file with the fread() call and structures declared
>> like this:
>>
>> struct item_v1_header
>> {
>>     char     signature[4];
>>     char     version[4];
>>     uint32_t generic_name_strref;
>>     <...>
>>
>> } __attribute__((__packed__));
>>
>> This works just fine, but I was wondering if there is a more
>> elegant/portable way to do it.
>>
>> Your thoughts?
>>
>> And yes, I know there are already tools out there that can manipulate
>> Infinity Engine stuff, I'm just doing this for entertainment and
>> education. :-)
>>
>
> Maybe I'm not getting this, but won't this break if there is some
> (additional) padding in the structure?

That's exactly why the "__attribute__((__packed__))" is there at the end
of the struct declaration. My first tries indeed broke on padding. :-)
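
To illustrate the padding point, here is a minimal sketch, assuming GCC or Clang (the `__attribute__((__packed__))` syntax is a compiler extension, not standard C). The struct and function names are made up for this illustration and are not part of the file format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, only to show what packing changes. */
struct padded_example {
    char     tag;    /* 1 byte */
    uint32_t value;  /* the compiler may insert padding before this member */
};

struct packed_example {
    char     tag;
    uint32_t value;  /* no padding: this member starts at offset 1 */
} __attribute__((__packed__));

/* Report the layout the compiler actually chose. */
size_t padded_size(void)         { return sizeof(struct padded_example); }
size_t packed_size(void)         { return sizeof(struct packed_example); }
size_t packed_value_offset(void) { return offsetof(struct packed_example, value); }
```

With the attribute, the packed struct occupies exactly 5 bytes. Without it the size is up to the compiler (8 is common), which is why an fread() into an unpacked struct can misalign every field after the first gap.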

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

Ben Bacarisse

4/29/2011 1:07:00 AM

Angel <angel+news@spamcop.net> writes:

> I'm writing a program that can manipulate files in the format as
> described on this site:
> http://www.ugcs.caltech.edu/~jedwin/baldu...
>
> Basically, the file contains four bytes that form a string, then two
> four bytes that form a 32-bit integer, and so on.
>
> Currently I read the file with the fread() call and structures declared
> like this:
>
> struct item_v1_header
> {
> char signature[4];
> char version[4];
> uint32_t generic_name_strref;
> <...>
> } __attribute__((__packed__));
>
>
> This works just fine, but I was wondering if there is a more
> elegant/portable way to do it.
>
> Your thoughts?

The thing that would bother me is that this code will only work when it
runs on a machine that uses the same representation of 32-bit integers
as the file format does.

If that's fine, go for it. If not, you will want to read the integers
as unsigned char arrays so you can construct the integers "by value"
rather than "by representation".
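
A minimal sketch of what building an integer "by value" might look like, assuming the file stores the least significant byte first (little-endian), as stated elsewhere in this thread. The function name `decode_le32` is hypothetical.

```c
#include <stdint.h>

/* Build a 32-bit value from four bytes stored least-significant first.
   The result is the same on any host, whatever its own byte order. */
uint32_t decode_le32(const unsigned char b[4])
{
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Because the shifts operate on values rather than on memory layout, no byte-swapping function and no knowledge of the host's endianness is needed.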

<snip>
--
Ben.

Angel

4/29/2011 6:04:00 AM

On 2011-04-29, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Angel <angel+news@spamcop.net> writes:
>
>> I'm writing a program that can manipulate files in the format as
>> described on this site:
>> http://www.ugcs.caltech.edu/~jedwin/baldu...
>>
>> Basically, the file contains four bytes that form a string, then two
>> four bytes that form a 32-bit integer, and so on.
>>
>> Currently I read the file with the fread() call and structures declared
>> like this:
>>
>> struct item_v1_header
>> {
>> char signature[4];
>> char version[4];
>> uint32_t generic_name_strref;
>> <...>
>> } __attribute__((__packed__));
>>
>>
>> This works just fine, but I was wondering if there is a more
>> elegant/portable way to do it.
>>
>> Your thoughts?
>
> The thing that would bother me is that this code will only work when it
> runs on a machine that uses the same representation of 32-bit integers
> as the file format does.
>
> If that's fine, go for it. If not, you will want to read the integers
> as unsigned char arrays so you can construct the integers "by value"
> rather than "by representation".

Endian-ness problems have indeed crossed my mind, but since the software
that uses this file format (BioWare's Infinity Engine) only runs on
Intel anyway, I didn't consider it such a big issue.

I might use functions like le32toh() to fix this issue for completeness'
sake, but since they are not standard, I have been hesitant to do so.

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

China Blue Veins

4/29/2011 6:36:00 AM

In article <slrnirkl6u.7at.angel+news@pearlgates.net>,
Angel <angel+news@spamcop.net> wrote:

> Endian-ness problems have indeed crossed my mind, but since the software
> that uses this file format (BioWare's Infinity Engine) only runs on
> Intel anyway, I didn't consider it such a big issue.
>
> I might use functions like le32toh() to fix this issue for completeness'
> sake, but since they are not standard, I have been hesitant to do so.

The functions htons, htonl, ntohs, and ntohl are widely available and
understood. I use them as a processor neutral format: use htonX on any CPU and
you can use ntohX on any other CPU. What actually is network order doesn't
matter as long as you can convert to/from and always get the correct value.

--
Damn the living - It's a lovely life. I'm whoever you want me to be.
Silver silverware - Where is the love? At least I can stay in character.
Oval swimming pool - Where is the love? Annoying Usenet one post at a time.
Damn the living - It's a lovely life. Why does Harmony have blue veins?

Angel

4/29/2011 6:40:00 AM

On 2011-04-29, China Blue Veins <chine.bleu@yahoo.com> wrote:
> In article <slrnirkl6u.7at.angel+news@pearlgates.net>,
> Angel <angel+news@spamcop.net> wrote:
>
>> Endian-ness problems have indeed crossed my mind, but since the software
>> that uses this file format (BioWare's Infinity Engine) only runs on
>> Intel anyway, I didn't consider it such a big issue.
>>
>> I might use functions like le32toh() to fix this issue for completeness'
>> sake, but since they are not standard, I have been hesitant to do so.
>
> The functions htons, htonl, ntohs, and ntohl are widely available and
> understood. I use them as a processor neutral format: use htonX on any CPU
> and you can use ntohX on any other CPU. What actually is network order
> doesn't matter as long as you can convert to/from and always get the correct
> value.

Yes, but network byte order is big-endian, and the data in the file I'm
reading has nothing to do with the network and is little-endian, having
been created on Intel.
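
To make the mismatch concrete, here is a small sketch that exposes the bytes htonl() produces. It assumes a POSIX system, since `<arpa/inet.h>` is not part of standard C, and the helper name `network_order_bytes` is hypothetical.

```c
#include <arpa/inet.h>   /* htonl(): POSIX, not standard C */
#include <stdint.h>
#include <string.h>

/* Convert `value` to network byte order and copy out the resulting bytes. */
void network_order_bytes(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value);
    memcpy(out, &net, sizeof net);
}
```

For 0x12345678 the bytes come out as 12 34 56 78 on any host, whereas a little-endian file stores the same value as 78 56 34 12. Applying ntohl() to a field read raw from such a file therefore gives the wrong value on little-endian and big-endian hosts alike.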

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

Malcolm McLean

4/29/2011 9:19:00 AM

On Apr 29, 12:38 am, Angel <angel+n...@spamcop.net> wrote:
> Hi folks,
>
>
> Currently I read the file with the fread() call and structures declared
> like this:
>
> struct item_v1_header
> {
> char signature[4];
> char version[4];
> uint32_t generic_name_strref;
> <...>
>
> } __attribute__((__packed__));
>
> [ to fread() ]
>
No, this is a bad habit.

Write functions to read a 16 and 32-bit big and little endian integer
from file, then use them to read each member separately.

The software might just run on Intel now, but you want to be able to
move routines as easily as possible to new platforms. Why create a
platform dependency?
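
A rough sketch of such reader functions, in standard C only. The names and the return convention (1 on success, 0 on a short read or I/O error) are assumptions made for this example, not anything prescribed in the thread.

```c
#include <stdint.h>
#include <stdio.h>

/* Each function returns 1 on success, 0 on a short read or I/O error. */

int read_u16_le(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b) return 0;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}

int read_u16_be(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b) return 0;
    *out = (uint16_t)((b[0] << 8) | b[1]);
    return 1;
}

int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b) return 0;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 1;
}

int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b) return 0;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 1;
}
```

Each struct member is then filled by one call, for example `read_u32_le(fp, &hdr.generic_name_strref)`, and neither struct padding nor host byte order affects the result.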

Eric Sosman

4/29/2011 11:55:00 AM

On 4/29/2011 2:04 AM, Angel wrote:
> [...]
> Endian-ness problems have indeed crossed my mind, but since the software
> that uses this file format (BioWare's Infinity Engine) only runs on
> Intel anyway, I didn't consider it such a big issue.
>
> I might use functions like le32toh() to fix this issue for completeness'
> sake, but since they are not standard, I have been hesitant to do so.

Your choice, of course, but it's an odd juxtaposition: To avoid
using non-standard functions, you rely on non-standard representation.
What's that line about "straining at gnats?"

--
Eric Sosman
esosman@ieee-dot-org.invalid

Angel

4/29/2011 1:47:00 PM

On 2011-04-29, Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:
> On Apr 29, 12:38 am, Angel <angel+n...@spamcop.net> wrote:
>> Hi folks,
>>
>>
>> Currently I read the file with the fread() call and structures declared
>> like this:
>>
>> struct item_v1_header
>> {
>>     char     signature[4];
>>     char     version[4];
>>     uint32_t generic_name_strref;
>>     <...>
>>
>> } __attribute__((__packed__));
>>
>> [ to fread() ]
>>
> No this is a bad habit.

Well, actually I started this just as an excuse to meddle with the
fread() function. Since the file consists of a header followed by one or
more data blocks all with the same layout, it seemed the most logical
choice. And I was actually considering perhaps using mmap() instead of
fread().

> Write functions to read a 16 and 32-bit big and little endian integer
> from file, then use them to read each member separately.

Each structure has quite a few such members, resulting in a huge number
of separate reads. Though I agree that is the most portable way to do
it. But also the most boring, and I am doing this mainly for fun. (And
to learn something, hence my post here.)
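
One possible middle ground between a single fread() into a packed struct and one read per member: read the whole block into a byte buffer with one fread(), then decode each field from its offset. The sketch below covers only the three members shown in this thread (12 bytes), since the rest of the real header is elided here. All names in it (`item_v1_prefix`, `ITEM_V1_PREFIX_SIZE`, `get_le32`, `read_item_v1_prefix`) are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: only the first three members are handled; the real
   header continues past these 12 bytes. */
#define ITEM_V1_PREFIX_SIZE 12

struct item_v1_prefix {
    char     signature[4];
    char     version[4];
    uint32_t generic_name_strref;
};

/* Decode four little-endian bytes into a host value. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One fread() for the whole prefix, then decode field by field.
   Returns 1 on success, 0 on a short read or I/O error. */
int read_item_v1_prefix(FILE *fp, struct item_v1_prefix *out)
{
    unsigned char buf[ITEM_V1_PREFIX_SIZE];
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf) return 0;
    memcpy(out->signature, buf, 4);
    memcpy(out->version, buf + 4, 4);
    out->generic_name_strref = get_le32(buf + 8);
    return 1;
}
```

The in-memory struct no longer has to match the file layout, so the packed attribute is unnecessary, and the number of fread() calls stays at one per block.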

> The software might just run on Intel now, but you want to be able to
> move routines as easily as possible to new platforms. Why create a
> platform dependency?

The software that uses these files is over 10 years old and proprietary,
I don't think it'll be ported to any other platform anytime soon. :-)
(Yes, there is a Linux implementation in GemRB, but as far as I know
that one only works on Intel as well.)

So platform independence was not the first thing on my mind when I
started on this, but with the great hints I've been given here it is
something I will attempt to achieve.

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

Angel

4/29/2011 1:51:00 PM

On 2011-04-29, Eric Sosman <esosman@ieee-dot-org.invalid> wrote:
> On 4/29/2011 2:04 AM, Angel wrote:
>> [...]
>> Endian-ness problems have indeed crossed my mind, but since the software
>> that uses this file format (BioWare's Infinity Engine) only runs on
>> Intel anyway, I didn't consider it such a big issue.
>>
>> I might use functions like le32toh() to fix this issue for completeness'
>> sake, but since they are not standard, I have been hesitant to do so.
>
> Your choice, of course, but it's an odd juxtaposition: To avoid
> using non-standard functions, you rely on non-standard representation.
> What's that line about "straining at gnats?"

The representation was already set; I did not invent the file format. And
keep in mind, I'm merely doing this for fun and education. I think I'm
allowed a bit of freedom that I would not have in a professional
environment. :-)

--
The perfected state of a spam server is a smoking crater.
- The Crater Corollary to Rule #4

comp.lang.c
