Asp Forum - Structure with unsigned chars and internal alignment

pozz

8/30/2011 10:06:00 PM

I have a struct composed by two arrays of unsigned char.

struct myStruct {
unsigned char field1[2];
unsigned char field2[30];
};

Is myStruct *always* 32 bytes long? Is field2 *always* starting after
two bytes the pointer to myStruct (i.e., no padding is allowed between
field1 and field2)?

In this case, in my application I'd like to read the size of myStruct
(between 3 and 32) from a file. field1 will be always 2-bytes long,
field2 will be the size of myStruct minus 2 bytes of field1 (in the case
no padding is present between field1 and field2).

I could allocate field2 array dynamically, but I haven't malloc/free on
my embedded platform. So I decided to statically allocate the biggest
size (32 bytes, 2 bytes for field1 and 30 bytes for field2).

If the real size of myStruct, read from the configuration file, is
struct_size, how can I deduce the size of field2? Actually I use the
following formula:

field2_size = struct_size - 2

but I don't like it. It would be wrong if I'll decide to change the
size of field1 member, and it would be wrong if padding is present
between the two fields. Maybe the following is better?

field2_size = struct_size - offsetof(struct myStruct, field2)

Do you have other suggestions?

18 Answers

Scott Fluhrer

8/30/2011 10:47:00 PM

"pozz" <pozzugno@gmail.com> wrote in message
news:j3jmrn$cae$1@nnrp.ngi.it...
>I have a struct composed by two arrays of unsigned char.
>
> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
> };
>
> Is myStruct *always* 32 bytes long? Is field2 *always* starting after two
> bytes the pointer to myStruct (i.e., no padding is allowed between field1
> and field2)?

Not necessarily; the Standard allows the compiler to add padding between
field1 and field2, and also after field2.

On the other hand, I've never seen a compiler that would insert padding
between field1 and field2 in this case. It's unlikely that a compiler would
insert padding after field2 (because 32 is such a round number); however, if
you later change field2 to be (say) 31 bytes (because you've got a new
record type that's a bit bigger), I wouldn't be shocked if there do exist
compilers that would pad it out to (say) a multiple of 4 bytes.
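Since nothing here is guaranteed, one option is to ask the compiler what it actually did. The following is only a sketch; the helper name my_struct_is_unpadded is made up for illustration and is not part of the original post.

```c
#include <stddef.h>  /* offsetof */

struct myStruct {
    unsigned char field1[2];
    unsigned char field2[30];
};

/* Hypothetical helper: returns 1 if this particular compiler laid the
   struct out with no padding between or after the members, 0 otherwise. */
int my_struct_is_unpadded(void)
{
    return offsetof(struct myStruct, field2) == 2
        && sizeof(struct myStruct) == 32;
}
```

The result only describes the compiler and options used for that build; it says nothing about other platforms.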

>
> In this case, in my application I'd like to read the size of myStruct
> (between 3 and 32) from a file. field1 will be always 2-bytes long,
> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
> no padding is present between field1 and field2).
>
> I could allocate field2 array dynamically, but I haven't malloc/free on my
> embedded platform. So I decided to statically allocate the biggest size
> (32 bytes, 2 bytes for field1 and 30 bytes for field2).
>
> If the real size of myStruct, read from the configuration file, is
> struct_size, how can I deduce the size of field2? Actually I use the
> following formula:
>
> field2_size = struct_size - 2
>
> but I don't like it. It would be wrong if I'll decide to change the size
> of field1 member,

Modification of the data format would imply a change in the program. On the
other hand, if you expect that such a change is even slightly possible, I'd
suggest that you replace the 2 with a meaningful constant name (say,
SIZE_FIELD1)
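A minimal sketch of that idea, assuming the 3 to 32 range given in the question. SIZE_FIELD1 is the name suggested above; BLOCK_MINSIZE, BLOCK_MAXSIZE and field2_size_from_block are hypothetical names chosen for illustration.

```c
#include <stddef.h>  /* size_t */

#define SIZE_FIELD1    2u   /* bytes of field1 in the file format */
#define BLOCK_MINSIZE  3u   /* assumed from "between 3 and 32" */
#define BLOCK_MAXSIZE  32u

/* Returns the number of field2 bytes in a block of block_size bytes,
   or 0 if block_size is outside the range the format allows. */
size_t field2_size_from_block(size_t block_size)
{
    if (block_size < BLOCK_MINSIZE || block_size > BLOCK_MAXSIZE)
        return 0;
    return block_size - SIZE_FIELD1;
}
```

The constant describes the file format, not the in-memory struct, so it stays correct whatever padding the compiler chooses.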

> and it would be wrong if padding is present between the two fields. Maybe
> the following is better?
>
> field2_size = struct_size - offsetof(struct myStruct, field2)
>
> Do you have other suggestions?

If you're looking for something that makes less assumption on the compiler,
I suppose you could do:

typedef unsigned char myArray[ 32 ];
#define field2_offset 2
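One way the flat-array idea might be fleshed out is with two small accessors. This is a sketch only; block_field1 and block_field2 are hypothetical helper names, while myArray and field2_offset come from the snippet above.

```c
typedef unsigned char myArray[32];
#define field2_offset 2

/* Hypothetical accessors: field1 is the first two bytes of the block,
   field2 is everything from field2_offset onwards. */
unsigned char *block_field1(unsigned char *block)
{
    return block;
}

unsigned char *block_field2(unsigned char *block)
{
    return block + field2_offset;
}
```

An array of unsigned char has no padding between its elements, so the offsets here are exact on every conforming compiler.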

Eric Sosman

8/31/2011 1:40:00 AM

On 8/30/2011 6:05 PM, pozz wrote:
> I have a struct composed by two arrays of unsigned char.
>
> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
> };
>
> Is myStruct *always* 32 bytes long? Is field2 *always* starting after
> two bytes the pointer to myStruct (i.e., no padding is allowed between
> field1 and field2)?

No and no. Many compilers will lay out the struct as you hope,
but none are under any obligation to do so. If you want thirty-two
consecutive bytes, consider `unsigned char field_both[32]'.

> In this case, in my application I'd like to read the size of myStruct
> (between 3 and 32) from a file. field1 will be always 2-bytes long,
> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
> no padding is present between field1 and field2).

You probably do *not* want to read "the size of myStruct" from
the file; you want to read "the size of some blob of data." The two
environments (on-disk form and in-memory form) are not necessarily
identical, even if they're strongly related by intention.

I said "probably," because perhaps your file actually does hold
"the size of myStruct." This could be the case if an actual `struct
myStruct' was written to the file originally, complete with whatever
padding it might have included. If you never, never need to move the
data to another system (not even for post-mortem analysis), you can
probably get away with this.

> I could allocate field2 array dynamically, but I haven't malloc/free on
> my embedded platform. So I decided to statically allocate the biggest
> size (32 bytes, 2 bytes for field1 and 30 bytes for field2).

It's not clear that the presence or absence of malloc() has
anything to do with the presence or absence of padding.

> If the real size of myStruct, read from the configuration file, is
> struct_size, how can I deduce the size of field2? Actually I use the
> following formula:
>
> field2_size = struct_size - 2

I guess `struct_size' is something you compute from the two-byte
array? Well, it matters not: The validity of the formula depends not
on C but on the program that wrote the file in the first place. What
formula did *that* program use?

(A literal reading of your question leads to the answer "The size
of field2 is thirty, always." This may sound nit-picky, but I have a
hunch that if you think about it hard enough you'll arrive at the
question you *should* be asking instead.)

> but I don't like it. It would be wrong if I'll decide to change the size
> of field1 member, and it would be wrong if padding is present between
> the two fields. Maybe the following is better?
>
> field2_size = struct_size - offsetof(struct myStruct, field2)

Same problem: It could be right or wrong (or "not even wrong"),
because what you need to care about is who wrote the file and how.

--
Eric Sosman
esosman@ieee-dot-org.invalid

pozz

8/31/2011 6:54:00 AM

Il 31/08/2011 00:46, Scott Fluhrer ha scritto:
>> I have a struct composed by two arrays of unsigned char.
>>
>> struct myStruct {
>> unsigned char field1[2];
>> unsigned char field2[30];
>> };
>>
>> Is myStruct *always* 32 bytes long? Is field2 *always* starting after two
>> bytes the pointer to myStruct (i.e., no padding is allowed between field1
>> and field2)?
>
> Not necessarily; the Standard allows the compiler to add padding between
> field1 and field2, and also after field2.

I thought unsigned char was always aligned, so it wasn't any need to add
padding between fields of unsigned char or array of unsigned chars.

> On the other hand, I've never seen a compiler that would insert padding
> between field1 and field2 in this case. It's unlikely that a compiler would
> insert padding after field2 (because 32 is such a round number); however, if
> you later change field2 to be (say) 31 bytes (because you've got a new
> record type that's a bit bigger), I wouldn't be shocked if there do exist
> compilers that would pad it out to (say) a multiple of 4 bytes.

I think the padding after field2 is used to align array of myStruct.

>> and it would be wrong if padding is present between the two fields. Maybe
>> the following is better?
>>
>> field2_size = struct_size - offsetof(struct myStruct, field2)
>>
>> Do you have other suggestions?
>
> If you're looking for something that makes less assumption on the compiler,
> I suppose you could do:
>
> typedef unsigned char myArray[ 32 ];
> #define field2_offset 2

Maybe this is the best solution.

pozz

8/31/2011 7:07:00 AM

Il 31/08/2011 03:39, Eric Sosman ha scritto:
>> In this case, in my application I'd like to read the size of myStruct
>> (between 3 and 32) from a file. field1 will be always 2-bytes long,
>> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
>> no padding is present between field1 and field2).
>
> You probably do *not* want to read "the size of myStruct" from
> the file; you want to read "the size of some blob of data." The two
> environments (on-disk form and in-memory form) are not necessarily
> identical, even if they're strongly related by intention.

This is my case.

> I said "probably," because perhaps your file actually does hold
> "the size of myStruct." This could be the case if an actual `struct
> myStruct' was written to the file originally, complete with whatever
> padding it might have included. If you never, never need to move the
> data to another system (not even for post-mortem analysis), you can
> probably get away with this.

I understand your point and I expaling what I'm trying to do.

I have to read a file, created by another application on another
platform, that is composed by blocks of data (what you named "blob of
data"). The size of these blocks (between 3 and 32) is written in the
same file at the beginning.
A single block is composed by 2 bytes and (block_size - 2) bytes.

Because I don't know the size of blocks I'll read and I can't malloc the
right size at run-time, I was trying to define the maximum size of
block, splitting it in the two fields:

struct myStruct {
unsigned char field1[2];
unsigned char field2[30];
};

I thought I could have read the block and copy it directly to myStruct.
Anyway if padding could be present in myStruct, I can't use this approach.

Maybe the best approach is:

#define FIELD1_OFFSET 0
#define FIELD1_SIZE 2
#define FIELD2_OFFSET FIELD1_SIZE
#define BLOCK_MAXSIZE 32
void read_block(struct myStruct *s, size_t block_size) {
unsigned char block[BLOCK_MAXSIZE];
<read BLOCK_MAXSIZE bytes and copy it in block array>
memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
}

Here block_size is passed as an argument, because I don't know it in
advance.

Bartc

8/31/2011 10:50:00 AM

"pozz" <pozzugno@gmail.com> wrote in message
news:j3jmrn$cae$1@nnrp.ngi.it...

> but I don't like it. It would be wrong if I'll decide to change the size
> of field1 member, and it would be wrong if padding is present between the
> two fields. Maybe the following is better?
>
> field2_size = struct_size - offsetof(struct myStruct, field2)
>
> Do you have other suggestions?

There are any number of solutions. The following is based on contiguous
byte-arrays rather than structs. It assumes field1 is meant to represent a
number, and that its width is 2, and that field2 is a zero-terminated
string (although these were mainly for my testing, not shown here):

#include <stdio.h>
#include <stdlib.h>

#define MAXRECSIZE 32
typedef short int16;

/* The int16 casts assume a suitably aligned buffer and native byte order */
#define GETFIELD1(x) (*(int16*)(x)) /* Extract 1st 2 bytes as int */
#define GETFIELD2(x) ((char*)&(x)[2]) /* Char pointer to data */
#define SETFIELD1(x,v) (*(int16*)(x)=(v)) /* Write 1st 2 bytes */

/* Test helpers, not shown here; these signatures are assumed */
int readstructsize(void);
void readrecord(char *record, int size);

int main(void) {

    int field2_size;
    int struct_size;
    char record[MAXRECSIZE+1]={0}; /* Structs are read into this */

    struct_size = readstructsize(); /* Obtain overall struct size */
    field2_size = struct_size-2; /* Assume field1 is 2 bytes */

    readrecord(record,struct_size); /* Read one record */
    record[field2_size+2]=0; /* Terminate if data is a string */

    printf("Field1 is <%d>\n",GETFIELD1(record));
    printf("Data is <%s>\n",GETFIELD2(record));
    return 0;
}

--
bartc

James Kuyper

8/31/2011 11:42:00 AM

On 08/31/2011 02:53 AM, pozz wrote:
> Il 31/08/2011 00:46, Scott Fluhrer ha scritto:
>>> I have a struct composed by two arrays of unsigned char.
>>>
>>> struct myStruct {
>>> unsigned char field1[2];
>>> unsigned char field2[30];
>>> };
>>>
>>> Is myStruct *always* 32 bytes long? Is field2 *always* starting after two
>>> bytes the pointer to myStruct (i.e., no padding is allowed between field1
>>> and field2)?
>>
>> Not necessarily; the Standard allows the compiler to add padding between
>> field1 and field2, and also after field2.
>
> I thought unsigned char was always aligned, so it wasn't any need to add
> padding between fields of unsigned char or array of unsigned chars.

Yes, unsigned char is always aligned. Yes, there's no need for such
padding. As a result, such padding is unlikely to occur. But it is
allowed by the standard, even though it is not needed.
--
James Kuyper

David Resnick

8/31/2011 1:32:00 PM

On Aug 30, 6:05 pm, pozz <pozzu...@gmail.com> wrote:
> I have a struct composed by two arrays of unsigned char.
>
> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
>
> };
>
> Is myStruct *always* 32 bytes long? Is field2 *always* starting after
> two bytes the pointer to myStruct (i.e., no padding is allowed between
> field1 and field2)?
>

As others have said, probably but not guaranteed.

If you are willing to leave portability behind, many (most?) compilers
have a specific way to indicate that you want a struct to be "packed",
without padding. That might suit your needs here too. An example of
that is gcc's __attribute__ ((packed)).
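A sketch of how that might look with gcc (clang accepts the same syntax). The PACKED macro, the struct name myPackedStruct and the helper packed_layout_ok are illustrative names only; other compilers have their own spelling, which is not shown here.

```c
#include <stddef.h>  /* offsetof */

#if defined(__GNUC__)
#define PACKED __attribute__((packed))
#else
#define PACKED  /* no portable equivalent; layout is then unspecified */
#endif

struct myPackedStruct {
    unsigned char field1[2];
    unsigned char field2[30];
} PACKED;

/* Returns 1 if the packed layout is as expected, 0 if it is not,
   and -1 when built with a compiler where PACKED expands to nothing. */
int packed_layout_ok(void)
{
#if defined(__GNUC__)
    return offsetof(struct myPackedStruct, field2) == 2
        && sizeof(struct myPackedStruct) == 32;
#else
    return -1;
#endif
}
```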

lawrence.jones

8/31/2011 4:57:00 PM

Scott Fluhrer <sfluhrer@ix.netcom.com> wrote:
>
> "pozz" <pozzugno@gmail.com> wrote in message
> news:j3jmrn$cae$1@nnrp.ngi.it...
> >I have a struct composed by two arrays of unsigned char.
> >
> > struct myStruct {
> > unsigned char field1[2];
> > unsigned char field2[30];
> > };
> >
> > Is myStruct *always* 32 bytes long? Is field2 *always* starting after two
> > bytes the pointer to myStruct (i.e., no padding is allowed between field1
> > and field2)?
>
> Not necessarily; the Standard allows the compiler to add padding between
> field1 and field2, and also after field2.
>
> On the other hand, I've never seen a compiler that would insert padding
> between field1 and field2 in this case.

I have. I don't remember exactly which one, but I distinctly remember
using a compiler that aligned arrays on at least a 4-byte boundary. I
presume it would also add trailing padding in this case to make the
struct 36 bytes long.
--
Larry Jones

Buddy, if you think I'm even going to BE here, you're crazy! -- Calvin

Eric Sosman

9/1/2011 1:26:00 AM

On 8/31/2011 3:07 AM, pozz wrote:
> Il 31/08/2011 03:39, Eric Sosman ha scritto:
>>> In this case, in my application I'd like to read the size of myStruct
>>> (between 3 and 32) from a file. field1 will be always 2-bytes long,
>>> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
>>> no padding is present between field1 and field2).
>>
>> You probably do *not* want to read "the size of myStruct" from
>> the file; you want to read "the size of some blob of data." The two
>> environments (on-disk form and in-memory form) are not necessarily
>> identical, even if they're strongly related by intention.
>
> This is my case.
>
>
>> I said "probably," because perhaps your file actually does hold
>> "the size of myStruct." This could be the case if an actual `struct
>> myStruct' was written to the file originally, complete with whatever
>> padding it might have included. If you never, never need to move the
>> data to another system (not even for post-mortem analysis), you can
>> probably get away with this.
>
> I understand your point and I expaling what I'm trying to do.

I'm not so sure you understand my points. For myself, I'm
*sure* I don't understand "expaling."

> I have to read a file, created by another application on another
> platform, that is composed by blocks of data (what you named "blob of
> data"). The size of these blocks (between 3 and 32) is written in the
> same file at the beginning.
> A single block is composed by 2 bytes and (block_size - 2) bytes.

You haven't said so, but I guess that the initial two bytes
somehow encode `block_size'. Whether the remaining bytes are all
"payload" or may themselves include padding may be known to you, but
remains a mystery to the rest of us.

> Because I don't know the size of blocks I'll read and I can't malloc the
> right size at run-time, I was trying to define the maximum size of
> block, splitting it in the two fields:

Again the hangup over the absence of malloc(). If this has
anything at all to do with the problem, it has to do with some aspect
of the problem that you have not yet revealed. Based on what you've
said and shown, the existence or non-existence of malloc() has zilch
to do with the matter.

> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
> };

... but here comes the "splitting it in the two fields" part,
which you're going about (as several people have told you) in an
unreliable way.

> I thought I could have read the block and copy it directly to myStruct.
> Anyway if padding could be present in myStruct, I can't use this approach.
>
> Maybe the best approach is:
>
> #define FIELD1_OFFSET 0
> #define FIELD1_SIZE 2
> #define FIELD2_OFFSET FIELD1_SIZE
> #define BLOCK_MAXSIZE 32
> void read_block(struct myStruct *s, size_t block_size) {
> unsigned char block[BLOCK_MAXSIZE];
> <read BLOCK_MAXSIZE bytes and copy it in block array>
> memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
> memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
> }
>
> Here block_size is passed as an argument, because I don't know it in
> advance.

That's odd. Where do you learn `block_size', *before* reading
the first two bytes of your blob? And if `block_size' turns out to
be less than thirty-two, how does your "read BLOCK_MAXSIZE bytes"
avoid running off the end and into whatever follows the blob?

Observation: It is premature to seek the "best" way to do
something when you have not yet come up with "any" way. That's
premature optimization personified.

--
Eric Sosman
esosman@ieee-dot-org.invalid

pozz

9/1/2011 6:00:00 AM

Il 01/09/2011 03:26, Eric Sosman ha scritto:
>>> I said "probably," because perhaps your file actually does hold
>>> "the size of myStruct." This could be the case if an actual `struct
>>> myStruct' was written to the file originally, complete with whatever
>>> padding it might have included. If you never, never need to move the
>>> data to another system (not even for post-mortem analysis), you can
>>> probably get away with this.
>>
>> I understand your point and I expaling what I'm trying to do.
>
> I'm not so sure you understand my points. For myself, I'm
> *sure* I don't understand "expaling."

I wanted to write "explaining"... :-)

I think I understood your point of view. Data in the file can be of two
types:
- blob of data of exactly N bytes sized
- myStruct previously written to the file by the same software
on the same platform (so with the same layout of padding and data)
I'm in the first case. I know the file contains a sequence of blobs of
the same size N. N is constant for a file, but may vary from file to
file. So the software should be ready to read blobs of 10 or 15 or 30
or 32 bytes.
How can the software know the size of the blobs in the file? There are two
bytes at the beginning of the file (just once, *not* for each
blob) that encode it as a 16-bit unsigned integer in big-endian order.
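Decoding such a header byte by byte avoids any dependence on the host's byte order or alignment. A minimal sketch, with read_be16 as a hypothetical helper name:

```c
/* Decode a 16-bit unsigned big-endian value: the first byte is the
   most significant one. */
unsigned int read_be16(const unsigned char bytes[2])
{
    return ((unsigned int)bytes[0] << 8) | (unsigned int)bytes[1];
}
```

For example, the header bytes 0x00 0x20 decode to N = 32.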

>> I have to read a file, created by another application on another
>> platform, that is composed by blocks of data (what you named "blob of
>> data"). The size of these blocks (between 3 and 32) is written in the
>> same file at the beginning.
>> A single block is composed by 2 bytes and (block_size - 2) bytes.
>
> You haven't said so, but I guess that the initial two bytes
> somehow encode `block_size'. Whether the remaining bytes are all
> "payload" or may themselves include padding may be known to you, but
> remains a mystery to the rest of us.

Just to better explain the content of the file:

- 2 bytes that code the size N of all the subsequent blocks
- N bytes for block 1
- N bytes for block 2
- ...up to the end of the file

A single N-byte block is composed of two fields:
- 2 bytes for field1
- N-2 bytes for field2

field1 and field2 are application data that aren't important now for our
discussion.

>> Because I don't know the size of blocks I'll read and I can't malloc the
>> right size at run-time, I was trying to define the maximum size of
>> block, splitting it in the two fields:
>
> Again the hangup over the absence of malloc(). If this has
> anything at all to do with the problem, it has to do with some aspect
> of the problem that you have not yet revealed. Based on what you've
> said and shown, the existence or non-existence of malloc() has zilch
> to do with the matter.

On a "full" operating system I could write:

open the file
read the first two bytes and calculate N
dynamically allocate an array of N bytes
read block 1 from file and copy it into array
other instructions based on array content
read block 2 from file and copy it into array
other instructions based on array content
...up to the end of file

Because I can't use malloc on my system, I was thinking of using an array
sized for the maximum value of N, which is 32.

static declaration of an array of 32 bytes
open the file
read the first two bytes and calculate N
read block 1 from file and copy it into array
other instructions based on array content
read block 2 from file and copy it into array
other instructions based on array content
...up to the end of file
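The second set of steps could be sketched in C roughly as follows, assuming the file is reachable through a standard FILE stream (which may not hold on every embedded platform). read_all_blocks and the handle callback are hypothetical names, and the 3 to 32 range check is taken from the description of the format.

```c
#include <stdio.h>

#define BLOCK_MINSIZE 3u
#define BLOCK_MAXSIZE 32u

static unsigned char block[BLOCK_MAXSIZE];  /* one static buffer, no malloc */

/* Reads the 2-byte big-endian header, then every complete N-byte block.
   Returns the number of blocks read, or -1 if the header is missing or
   N is out of range. handle is an optional per-block callback. */
long read_all_blocks(FILE *fp, void (*handle)(const unsigned char *, size_t))
{
    unsigned char hdr[2];
    size_t n;
    long count = 0;

    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return -1;
    n = ((size_t)hdr[0] << 8) | hdr[1];
    if (n < BLOCK_MINSIZE || n > BLOCK_MAXSIZE)
        return -1;

    while (fread(block, 1, n, fp) == n) {  /* a short final read ends the loop */
        if (handle != NULL)
            handle(block, n);
        count++;
    }
    return count;
}
```

Only N bytes are requested per read, so the buffer never receives bytes belonging to the next block.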

>> struct myStruct {
>> unsigned char field1[2];
>> unsigned char field2[30];
>> };
>
> ... but here comes the "splitting it in the two fields" part,
> which you're going about (as several people have told you) in an
> unreliable way.

Logically the block is split into two fields, so it is useful to have a
struct like this. But I understand I can't read bytes from the file and
directly copy them into the structure. I have to use a function that
reads exactly N bytes and copies the two fields into the structure.

>> I thought I could have read the block and copy it directly to myStruct.
>> Anyway if padding could be present in myStruct, I can't use this
>> approach.
>>
>> Maybe the best approach is:
>>
>> #define FIELD1_OFFSET 0
>> #define FIELD1_SIZE 2
>> #define FIELD2_OFFSET FIELD1_SIZE
>> #define BLOCK_MAXSIZE 32
>> void read_block(struct myStruct *s, size_t block_size) {
>> unsigned char block[BLOCK_MAXSIZE];
>> <read BLOCK_MAXSIZE bytes and copy it in block array>
>> memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
>> memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
>> }
>>
>> Here block_size is passed as an argument, because I don't know it in
>> advance.
>
> That's odd. Where do you learn `block_size', *before* reading
> the first two bytes of your blob?

Yes, now I think it's clearer. See above.

> And if `block_size' turns out to
> be less than thirty-two, how does your "read BLOCK_MAXSIZE bytes"
> avoid running off the end and into whatever follows the blob?

Indeed I was wrong. The code below should be correct.

#define FIELD1_OFFSET 0
#define FIELD1_SIZE 2
#define FIELD2_OFFSET FIELD1_SIZE
#define BLOCK_MAXSIZE 32
void read_block(struct myStruct *s, size_t block_size) {
unsigned char block[BLOCK_MAXSIZE];
<read block_size bytes and copy it in block array>
memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
}
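For completeness, here is a compilable sketch of the copy step, under the assumption that the block_size bytes have already been read into a raw buffer. The function name unpack_block, the range check and the zeroing of the unused tail are additions for illustration, not part of the code above.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy, memset */

#define FIELD1_OFFSET 0
#define FIELD1_SIZE   2
#define FIELD2_OFFSET FIELD1_SIZE
#define BLOCK_MAXSIZE 32

struct myStruct {
    unsigned char field1[FIELD1_SIZE];
    unsigned char field2[BLOCK_MAXSIZE - FIELD1_SIZE];
};

/* Copies one block of block_size raw bytes into *s, field by field.
   Returns 0 on success, -1 if block_size is out of range. */
int unpack_block(struct myStruct *s, const unsigned char *raw, size_t block_size)
{
    if (block_size <= FIELD1_SIZE || block_size > BLOCK_MAXSIZE)
        return -1;
    memset(s, 0, sizeof *s);  /* clear the unused tail of field2 */
    memcpy(s->field1, raw + FIELD1_OFFSET, FIELD1_SIZE);
    memcpy(s->field2, raw + FIELD2_OFFSET, block_size - FIELD1_SIZE);
    return 0;
}
```

Because each member is filled by its own memcpy, any padding the compiler inserts in struct myStruct does not matter.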

comp.lang.c
