Asp Forum - Serializing bit field structures

(2b|!2b)==?

10/21/2008 9:19:00 AM

I have a struct declared as follows:

struct RecordType1
{
unsigned int dt : 24; //3 bytes
unsigned int ts : 16; //2 bytes
unsigned int lsp : 24; //3 bytes (float value represented as int)
unsigned int lst : 16; //2 bytes
unsigned int lsv : 16; //2 bytes
unsigned int x1 : 24; //3 bytes (float value represented as int)
unsigned int x2 : 24; //3 bytes (float value represented as int)
unsigned int x3 : 24; //3 bytes (float value represented as int)
unsigned int x4 : 24; //3 bytes (float value represented as int)
unsigned int bv : 16; //2 bytes
unsigned int ak : 24; //3 bytes (float value represented as int)
unsigned int av : 16; //2 bytes
unsigned int cv : 24; //3 bytes
};

I need to serialize this struct by packing the bits into a contiguous
byte array, and then read it back from the byte array. I can't use
memcpy/sizeof because of boundary alignment ...

I'd appreciate it if anyone can show me how to do this. Ideally, I would
like to do this in a cross-platform (i.e. "ENDIAN-ness" agnostic) way.

13 Answers

Thomas J. Gritzan

10/21/2008 3:22:00 PM

(2b|!2b)==? schrieb:
> I have a struct declared as follows:
>
> struct RecordType1
> {
> unsigned int dt : 24; //3 bytes
> unsigned int ts : 16; //2 bytes
> unsigned int lsp : 24; //3 bytes (float value represented as
> int)
> unsigned int lst : 16; //2 bytes
> unsigned int lsv : 16; //2 bytes
> unsigned int x1 : 24; //3 bytes (float value represented as int)
> unsigned int x2 : 24; //3 bytes (float value represented as int)
> unsigned int x3 : 24; //3 bytes (float value represented as int)
> unsigned int x4 : 24; //3 bytes (float value represented as int)
> unsigned int bv : 16; //2 bytes
> unsigned int ak : 24; //3 bytes (float value represented as int)
> unsigned int av : 16; //2 bytes
> unsigned int cv : 24; //3 bytes
> };
>
> I need to serialize this struct by packing the bits into a contiguous
> byte array, and then read it back from the byte array. I cant use
> memcpy/sizeof because of boundary alignment ...

Huh?

> I'd appreciate if anyone can show me how to do this. Ieally, I would
> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.

In what endianness do you want to store it?

Let's assume 8-bit bytes, an unsigned int of at least 24 bits, and that
you want to output in network byte order (big endian).

Here's a quick'n'dirty solution just to show you the main idea:

// helper functions
#include <cassert>
#include <ostream>

// Write the low 8 bits of val as one raw byte.
void put8(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFF);
    out.put(static_cast<char>(val));
}
// Write a 16-bit value, most significant byte first.
void put16(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFFFF);
    put8(out, val >> 8);
    put8(out, val & 0xFF);
}
// Write a 24-bit value, most significant byte first.
void put24(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFFFFFF);
    put8(out, val >> 16);
    put16(out, val & 0xFFFF);
}

// could be an operator<<, too.
void serialize(std::ostream& out, const RecordType1& data)
{
    put24(out, data.dt);
    put16(out, data.ts);
    put24(out, data.lsp);
    // and so on...
}

To read them back, you would read two or three bytes, left shift the
high bytes and binary-OR them together.
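
A minimal sketch of the reading side described above, assuming 8-bit bytes and big-endian storage. The names get8, get16 and get24 are made up here to mirror the put functions; they are not from the post, and error handling is left to the caller (check the stream state after reading).

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, for trying the helpers on in-memory bytes
#include <string>

// Read one raw byte. On end of file the stream enters a failed state
// and 0 is returned, so callers should test the stream afterwards.
unsigned int get8(std::istream& in)
{
    char c = 0;
    in.get(c);
    return static_cast<unsigned char>(c);
}

// Read two bytes, high byte first, and OR them together.
unsigned int get16(std::istream& in)
{
    unsigned int high = get8(in);
    unsigned int low = get8(in);
    return (high << 8) | low;
}

// Read three bytes, high byte first.
unsigned int get24(std::istream& in)
{
    unsigned int high = get8(in);
    unsigned int rest = get16(in);
    return (high << 16) | rest;
}
```

Each byte is read in its own statement so that the order of the reads does not depend on the order in which the compiler evaluates operands.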

If you don't want to use ostream/istream, you would have to track the
current position in the array (in the put functions). An output iterator
might be an elegant solution.
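
One way the output iterator idea could look, as a sketch only: the put functions become templates that write through any output iterator and return the advanced iterator, so the caller never tracks a position by hand. The template signatures are an assumption, not code from this thread.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Write a 16-bit value, most significant byte first, through an output
// iterator; returns the iterator positioned after the written bytes.
template <typename OutIt>
OutIt put16(OutIt out, unsigned int val)
{
    assert(val <= 0xFFFF);
    *out++ = static_cast<unsigned char>(val >> 8);
    *out++ = static_cast<unsigned char>(val & 0xFF);
    return out;
}

// Write a 24-bit value, most significant byte first.
template <typename OutIt>
OutIt put24(OutIt out, unsigned int val)
{
    assert(val <= 0xFFFFFF);
    *out++ = static_cast<unsigned char>(val >> 16);
    return put16(out, val & 0xFFFF);
}
```

The same functions work with std::back_inserter on a std::vector<unsigned char> or with a plain unsigned char* into a preallocated array.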

--
Thomas

diamondback

10/21/2008 5:05:00 PM

On Oct 21, 2:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
> I have a struct declared as follows:
>
> struct RecordType1
> {
> unsigned int dt : 24; //3 bytes
> unsigned int ts : 16; //2 bytes
> unsigned int lsp : 24; //3 bytes (float value represented as int)
> unsigned int lst : 16; //2 bytes
> unsigned int lsv : 16; //2 bytes
> unsigned int x1 : 24; //3 bytes (float value represented as int)
> unsigned int x2 : 24; //3 bytes (float value represented as int)
> unsigned int x3 : 24; //3 bytes (float value represented as int)
> unsigned int x4 : 24; //3 bytes (float value represented as int)
> unsigned int bv : 16; //2 bytes
> unsigned int ak : 24; //3 bytes (float value represented as int)
> unsigned int av : 16; //2 bytes
> unsigned int cv : 24; //3 bytes
>
> };
>
> I need to serialize this struct by packing the bits into a contiguous
> byte array, and then read it back from the byte array. I cant use
> memcpy/sizeof because of boundary alignment ...
>
> I'd appreciate if anyone can show me how to do this. Ieally, I would
> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.

First of all, there is no way to get around the endian-ness issue. Any
client that reads this data needs to know what order the bytes are
arriving in. There is simply no way around it. The bytes arrive
serialized, "one-at-a-time" if you will.

But, I'll get to that in a moment. A quick and dirty way of dealing
with serialization is a trick with unions. So:

union RecSerializer
{
RecordType1 record;
unsigned char stream[sizeof(RecordType1)];
};

Now, record and stream both occupy the same memory, so the data can be
accessed via either member, depending on what you are doing. So, you
load the memory using the structure (record):

RecSerializer m_rs;
m_rs.record.dt = 1;
m_rs.record.ts = 2;
m_rs.record.lsp = 3;
....

Then you send it using the byte array (stream):

<networkConnection>.send( m_rs.stream, sizeof(RecordType1) );

Reading and de-serializing is simply a reverse of the sending process.

However, this does not take into account cross-platform endianness
issues. Like I said above, this is the language barrier that confronts
anyone who does cross-platform network communication. You must deal
with it. Sorry. Luckily, you have some choices on how to do this:

The easiest(?) way is to just insist that everyone play nice and use
the same endianness. If you can accomplish this, please run for
President. I will vote for you...twice. Otherwise, you need to agree
to disagree and standardize on something. Luckily, the Internet
protocols use big-endian byte order and the POSIX byte order
functions htons, htonl, ntohs, and ntohl can be used for marshalling
and demarshalling data. These are platform independent functions that
reorder the bytes in standard data to conform to the Internet byte
order and back. All clients on your network must agree to conform to
the standard, obviously. However, these functions work on standard 2
or 4 byte boundaries only. So, these will not work for you in your
current design. My initial reaction, not knowing the details of your
system, would be to question if you absolutely must use bit-fields in
the structure? Processing would be easier, and potentially faster, if
you stuck with standard byte boundaries. But, I will assume you have
considered this and I will proceed under the assumption that the odd
byte boundaries are required.

A clever method of dealing with byte order could be to take a cue from
Unicode encoded files and include a Byte Order Mark (BOM) as the first
two bytes of the message. The BOM would have a value that could not be
accidentally inverted. Something simple like 0xFFEE, for example,
would work fine. With the BOM in place, you simply serialize and send
the message, ignoring byte order. However, the receiving client would
de-serialize and read the first two bytes. If the bytes are in the
expected order (0xFFEE), the de-serialization can continue with no
further processing. But, if the BOM is read backwards (0xEEFF), the
client knows that the message was sent with a different endianness and
must be further processed to extract the data.
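
A rough sketch of the marker check on the receiving side, assuming the sender wrote the 16-bit value 0xFFEE straight from memory and that only two byte orders are in play. The enum and function names are hypothetical. Note that this only covers byte order; it says nothing about how the sender's compiler laid out the bit fields.

```cpp
#include <cassert>

enum SenderOrder { SenderBigEndian, SenderLittleEndian, SenderUnknown };

// b0 and b1 are the first two bytes of the message exactly as received.
// A sender that wrote the 16-bit marker 0xFFEE from memory emits FF EE
// on a big-endian machine and EE FF on a little-endian one.
SenderOrder classifyMarker(unsigned char b0, unsigned char b1)
{
    if (b0 == 0xFF && b1 == 0xEE) return SenderBigEndian;
    if (b0 == 0xEE && b1 == 0xFF) return SenderLittleEndian;
    return SenderUnknown;
}

// Decode a 16-bit field that follows the marker, using the detected order.
unsigned int decode16(const unsigned char* p, SenderOrder order)
{
    assert(order != SenderUnknown);
    return order == SenderBigEndian
        ? (static_cast<unsigned int>(p[0]) << 8) | p[1]
        : (static_cast<unsigned int>(p[1]) << 8) | p[0];
}
```

Because the bytes are examined one at a time, the receiver's own byte order never enters into it.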

So, your options are:
1) Get everyone to agree on endianness (and bring world peace)
2) Change your data definition to facilitate the use of POSIX byte
order conversion.
3) Use a "BOM" (or some other order marker) in your data definition.

I hope that helps. If not, I hope someone else has a better idea.

James Kanze

10/22/2008 8:03:00 AM

On Oct 21, 11:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
> I have a struct declared as follows:

> struct RecordType1
> {
> unsigned int dt : 24; //3 bytes
> unsigned int ts : 16; //2 bytes
> unsigned int lsp : 24; //3 bytes (float value represented as int)
> unsigned int lst : 16; //2 bytes
> unsigned int lsv : 16; //2 bytes
> unsigned int x1 : 24; //3 bytes (float value represented as int)
> unsigned int x2 : 24; //3 bytes (float value represented as int)
> unsigned int x3 : 24; //3 bytes (float value represented as int)
> unsigned int x4 : 24; //3 bytes (float value represented as int)
> unsigned int bv : 16; //2 bytes
> unsigned int ak : 24; //3 bytes (float value represented as int)
> unsigned int av : 16; //2 bytes
> unsigned int cv : 24; //3 bytes
> };

Note that on a 32 bit machine, the only effect your bit fields
are likely to have here is to slow things down, since generally,
the compiler won't allocate a bit field in a way that would
cross a 32 bit boundary. lst and lsv will be put into a single
word, but that's about it, and you could get that by declaring
them as unsigned short. If you're really concerned about memory
use, you probably need to declare each field to be an array of
unsigned char of the correct size, and use memcpy for copying in
and out. Otherwise, just drop the bit fields---they don't buy
you anything. (Here---if you had 8 or ten in a row, of just a few
bits, they could make a difference. But there's rarely any
sense in having bit fields larger than 8 bits.)
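
To see the effect on a given compiler, one could compare sizeof with the sum of the declared widths. This is only a sketch: the actual sizeof is implementation-defined, and the constant and function names are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

struct RecordType1
{
    unsigned int dt : 24;
    unsigned int ts : 16;
    unsigned int lsp : 24;
    unsigned int lst : 16;
    unsigned int lsv : 16;
    unsigned int x1 : 24;
    unsigned int x2 : 24;
    unsigned int x3 : 24;
    unsigned int x4 : 24;
    unsigned int bv : 16;
    unsigned int ak : 24;
    unsigned int av : 16;
    unsigned int cv : 24;
};

// Sum of the declared widths: eight 24-bit fields and five 16-bit fields.
const std::size_t kPackedBits = 8 * 24 + 5 * 16;        // 272 bits
const std::size_t kPackedBytes = (kPackedBits + 7) / 8;  // 34 bytes

// Bytes the in-memory layout spends on padding (assumes 8-bit bytes).
std::size_t paddingBytes()
{
    return sizeof(RecordType1) - kPackedBytes;
}
```

If paddingBytes() is well above zero on your compiler, the bit fields are not buying the tight layout the comments in the struct suggest.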

> I need to serialize this struct by packing the bits into a
> contiguous byte array, and then read it back from the byte
> array. I cant use memcpy/sizeof because of boundary alignment
> ...

> I'd appreciate if anyone can show me how to do this. Ieally, I
> would like to this in a cross platform (i.e. "ENDIAN-ness"
> agnostic) way.

The first thing you'll have to do is define the format you want
for the serialized data. Once you've done that, you need to
process each field separately. If we assume that you have 16
bit unsigned integral values, and 24 bit custom floating point,
kept internally in an unsigned int, and that you decide to use
the standard network byte order (for unsigned values, byte order
is the only concern, and you've already specified a floating
point format which maps it to an unsigned value), then something
like:

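void
putIntValue( std::ostream& dest, unsigned value )
{
    // put() writes raw bytes; << would format the numbers as decimal text
    dest.put( static_cast< char >( (value >> 8) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value     ) & 0xFF ) ) ;
}

void
putFloatValue( std::ostream& dest, unsigned value )
{
    dest.put( static_cast< char >( (value >> 16) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value >>  8) & 0xFF ) ) ;
    dest.put( static_cast< char >( (value      ) & 0xFF ) ) ;
}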

would do the trick; even cleaner would be to define your own
stream types for this (inheriting from std::ios, but not from
std::istream or std::ostream), with << and >> operators for the
basic types you're concerned with (with the conversions between
float and your representation also taking place in the << and >>
operator).
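
As a very rough sketch of what such a stream type might look like: the class name obinstream and the wrapper types U16 and U24 are invented for this illustration, and only output of unsigned values is shown. It assumes 8-bit bytes and network byte order.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical wrapper types, so that << can tell field widths apart.
struct U16 { unsigned int v; };
struct U24 { unsigned int v; };

// Binary output stream: derives from std::ios for the state flags and
// the streambuf pointer, but not from std::ostream, so none of the
// text-formatting << operators are available by accident.
class obinstream : public std::ios
{
public:
    explicit obinstream(std::streambuf* sb) : std::ios(sb) {}

    obinstream& operator<<(U16 x)
    {
        putByte(x.v >> 8);
        putByte(x.v);
        return *this;
    }

    obinstream& operator<<(U24 x)
    {
        putByte(x.v >> 16);
        putByte(x.v >> 8);
        putByte(x.v);
        return *this;
    }

private:
    // Write the low 8 bits of b; set badbit if the streambuf refuses,
    // and write nothing more once the stream has failed.
    void putByte(unsigned int b)
    {
        if (good()
            && rdbuf()->sputc(static_cast<char>(b & 0xFF))
                   == std::char_traits<char>::eof())
            setstate(std::ios::badbit);
    }
};
```

It can be attached to any streambuf, for example a std::stringbuf for in-memory output or the rdbuf() of a std::ofstream opened in binary mode.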

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

10/22/2008 8:33:00 AM

On Oct 21, 7:04 pm, diamondback <christopher....@gmail.com> wrote:
> On Oct 21, 2:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
> > I have a struct declared as follows:

> > struct RecordType1
> > {
> > unsigned int dt : 24; //3 bytes
> > unsigned int ts : 16; //2 bytes
> > unsigned int lsp : 24; //3 bytes (float value represented as int)
> > unsigned int lst : 16; //2 bytes
> > unsigned int lsv : 16; //2 bytes
> > unsigned int x1 : 24; //3 bytes (float value represented as int)
> > unsigned int x2 : 24; //3 bytes (float value represented as int)
> > unsigned int x3 : 24; //3 bytes (float value represented as int)
> > unsigned int x4 : 24; //3 bytes (float value represented as int)
> > unsigned int bv : 16; //2 bytes
> > unsigned int ak : 24; //3 bytes (float value represented as int)
> > unsigned int av : 16; //2 bytes
> > unsigned int cv : 24; //3 bytes
> > };

> > I need to serialize this struct by packing the bits into a
> > contiguous byte array, and then read it back from the byte
> > array. I cant use memcpy/sizeof because of boundary
> > alignment ...

> > I'd appreciate if anyone can show me how to do this. Ieally,
> > I would like to this in a cross platform (i.e. "ENDIAN-ness"
> > agnostic) way.

> First of all, there is no way to get around the endian-ness
> issue. Any client that reads this data needs to know what
> order the bytes are arriving in. There is simply no way around
> it. The bytes arrive serialized, "one-at-a-time" if you will.

More generally, he really has to define a serialization format,
period. Of course, for unsigned, endianness is about the only
issue. And he's done part of the work already, since he's
defined how to represent floats, except for the endianness.

> But, I'll get to that in a moment. A quick and dirty way of dealing
> with serialization is a trick with unions. So:

> union RecSerializer
> {
> RecordType1 record;
> unsigned char stream[sizeof(RecordType1)];
> };

> Now, record and stream both occupy the same memory, so the
> data can be accessed via either member, depending on what you
> are doing.

Read access can only access the last member written; otherwise,
you have undefined behavior. Formally, a compiler is allowed to
arrange for some sort of secondary store to remember the last
field written, and check it when reading. I think that there
was once a compiler which did this, but it's certainly not
frequent. And of course, reading a record when you stored
random data through stream could result in a core dump or the
equivalent on some architectures (Unisys MCP, for example).

> So, you load the memory using the structure (record):

> RecSerializer m_rs;
> m_rs.record.dt = 1;
> m_rs.record.ts = 2;
> m_rs.record.lsp = 3;
> ...

> Then you send it using the byte array (stream):

> <networkConnection>.send( m_rs.stream, sizeof(RecordType1) );

> Reading and de-serializing is simply a reverse of the sending
> process.

All of which is undefined behavior, and can in practice generate
a core dump on some less common architectures.

> However, this does not take into account cross platform
> endianess issues. Like I said above, this is the language
> barrier that confronts anyone who does cross-platform network
> communication. You must deal with it. Sorry. Luckily, you have
> some choices on how to do this:

> The easiest(?) way is to just insist that everyone play nice
> and use the same endianness. If you can accomplish this,
> please run for President. I will vote for you...twice.
> Otherwise, you need to agree to disagree and standardize on
> something. Luckily, the Internet protocols use big-endian byte
> order and the POSIX byte order functions htons, htonl, ntohs,
> and ntohl can be used for marshalling and demarshalling data.
> These are platform independent functions[...]

They're not portable, and they aren't really meaningful for some
(many) platforms, since they consider that there can only be two
possible byte orders (there are 24 possible orderings for 4
bytes, and I've seen at least three in actual practice), and
they ignore all other representation issues (and possibly
alignment issues).

Repeat after me: endianness is just the tip of the iceberg. The
htonxxx and ntohxxx functions are just hacks, designed as a
quick work-around in order to communicate between two fixed
architectures, and are not generally useful (except perhaps when
addressing the system API---a system dependent context).

Given his description of the floating point format in another
thread, I would imagine something like:

oxxxstream&
oxxxstream::operator<<(
    float value )
{
    assert( value >= 0.0 && value < 8 ) ;
    int exp ;
    int mant
        = static_cast< int >( frexp( value, &exp ) * (1 << 21) ) ;
    std::streambuf* sb = rdbuf() ;
    sb->sputc( (exp << 5) | (mant >> 16) ) ;
    sb->sputc( (mant >> 8) & 0xFF ) ;
    sb->sputc( mant & 0xFF ) ;
    return *this ;
}

(This code lacks any error handling; you need to verify the
return value of sb->sputc, and set badbit in the stream if it is
EOF. And not do any further output if the stream has failed. I
generally use a special class for this, which maintains a
reference to the stream and the pointer to the streambuf, and
has a single put function:

void
GuardedOutput::put( unsigned char ch )
{
if ( myStream && myStreambuf->sputc( ch ) == EOF ) {
myStream.setstate( std::ios::badbit ) ;
}
}

Also, I normally avoid bitwise operators on signed types. In
this case, however, the types are partially conditioned by the
signature of frexp, and the precondition check guarantees that
I'll never get a negative value with the operations I do, so the
signed int behaves exactly like an unsigned int.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nick Keighley

10/22/2008 10:13:00 AM

On 21 Oct, 10:18, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:

> I have a struct declared as follows:
>
> struct RecordType1
> {
> unsigned int dt : 24; //3 bytes
> unsigned int ts : 16; //2 bytes
> unsigned int lsp : 24; //3 bytes (float value represented as

<snip>

(all examples given are 16 or 24 bits wide)

> };
>
> I need to serialize this struct by packing the bits into a contiguous
> byte array, and then read it back from the byte array. I cant use
> memcpy/sizeof because of boundary alignment ...
>
> I'd appreciate if anyone can show me how to do this. Ieally, I would
> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.

1. you can't write this in an endian-agnostic manner
"but that's the worst thing that could possibly happen!".
As others have said, decide on an endianness and write platform
specific code to read/write the data in its correct endianness.

2. bitfields are even less portable than the above implies.
"but it's worse than that!"

I'd check the standard but I believe almost nothing
can be assumed about bitfield alignment or padding.
I'm not sure even order is guaranteed.

K&R section 6.9 (and I doubt C++ has changed anything) has this to
say:

"Almost everything about [bit] fields is implementation-dependent.
Whether a field may overlap a word boundary is [ID]. [...] Fields are
assigned left to right on some machines and right to left on others.
This means that although fields are useful for maintaining
internally-defined data structures, the question as to which end
comes first has to be carefully considered when picking apart
externally-defined data; [...]"

(typos and layout mangling in the above are my fault)

--
Nick Keighley

People who love sausages, respect the law,
and work with IT standards
shouldn't watch any of them being made.

(2b|!2b)==?

10/22/2008 10:23:00 AM

James Kanze wrote:
> On Oct 21, 11:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
>> I have a struct declared as follows:
>
>> struct RecordType1
>> {
>> unsigned int dt : 24; //3 bytes
>> unsigned int ts : 16; //2 bytes
>> unsigned int lsp : 24; //3 bytes (float value represented as int)
>> unsigned int lst : 16; //2 bytes
>> unsigned int lsv : 16; //2 bytes
>> unsigned int x1 : 24; //3 bytes (float value represented as int)
>> unsigned int x2 : 24; //3 bytes (float value represented as int)
>> unsigned int x3 : 24; //3 bytes (float value represented as int)
>> unsigned int x4 : 24; //3 bytes (float value represented as int)
>> unsigned int bv : 16; //2 bytes
>> unsigned int ak : 24; //3 bytes (float value represented as int)
>> unsigned int av : 16; //2 bytes
>> unsigned int cv : 24; //3 bytes
>> };
>
> Note that on a 32 bit machine, the only effect your bit fields
> are likely to have here is to slow things down, since generally,
> the compiler won't allocate a bit field in a way that would
> cross a 32 bit boundary. lst and lsv will be put into a single
> word, but that's about it, and you could get that by declaring
> them as unsigned short. If you're really concerned about memory
> use, you probably need to declare each field to be an array of
> unsigned char of the correct size, and use memcpy for copying in
> and out. Otherwise, just drop the bit fields---they don't buy
> you anything. (Here---if you had 8 or ten in a row, of just a few
> bits, they could make a difference. But there's rarely any
> sense in having bit fields larger than 8 bits.)

Ah, but it's not memory use that I'm concerned with. It's disk space. The
structs are formats for a database I am writing. I am receiving an
additional 150Mb of data each day into the database, and using bit
fields to pack the data offers approximately a 35-40% reduction in the
storage space required.

>
>> I need to serialize this struct by packing the bits into a
>> contiguous byte array, and then read it back from the byte
>> array. I cant use memcpy/sizeof because of boundary alignment
>> ...
>
>> I'd appreciate if anyone can show me how to do this. Ieally, I
>> would like to this in a cross platform (i.e. "ENDIAN-ness"
>> agnostic) way.
>
> The first thing you'll have to do is define the format you want
> for the serialized data. Once you've done that, you need to
> process each field separately. If we assume that you have 16
> bit unsigned integral values, and 24 bit custom floating point,
> kept internally in an unsigned int, and that you decide to use
> the standard network byte order (for unsigned values, byte order
> is the only concern, and you've alread specified a floating
> point format which maps it to an unsigned value), then something
> like:
>
> void
> putIntValue( std::ostream& dest, unsigned value )
> {
> dest << ((value >> 8) & 0xFF)
> << ((value ) & 0xFF) ;
> }
>
> void
> putFloatValue( std::ostream& dest, unsigned value )
> {
> dest << ((value >> 16) & 0xFF)
> << ((value >> 8) & 0xFF)
> << ((value ) & 0xFF) ;
> }
>
Good idea to use network byte order (thanks). The functions above will
do the trick and are a good starting point...

> would do the trick; even cleaner would be to define your own
> stream types for this (inheriting from std::ios, but not from
> std::istream or std::ostream), with << and >> operators for the
> basic types your concerned with (with the conversions between
> float and your representation also taking place in the << and >>
> operator).
>

Now this would really be cool - alas, my C++ knowledge comes up short (I
have avoided streams as much as possible in the past because I never really
understood them). Can you recommend a good book (or maybe provide
boilerplate code I could expand on)?

What I really want to do is this:

1). Write a serialize() function that will return a char* (a char array
or byte string), which contains the bits sequentially packed into a byte
string.
2). Write a deserialize() function that will accept a byte string (char*)
of previously serialized bytes, and read bits sequentially (in reverse
order) and use that to populate the record.

Since the size of the structure is fixed (I know how many bytes it would
take to hold the bits in the struct), once I have serialized the bit
structure to a char array, I can use memcpy etc. to copy the memory block
around.
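
A sketch of that serialize()/deserialize() pair for RecordType1, assuming 8-bit bytes, network byte order and a caller-supplied buffer of 34 bytes (the sum of the field widths). The helper names store and load and the constant kRecordBytes are hypothetical. Fields are read back in the same order they were written.

```cpp
#include <cassert>
#include <cstddef>

struct RecordType1
{
    unsigned int dt : 24;  unsigned int ts : 16;  unsigned int lsp : 24;
    unsigned int lst : 16; unsigned int lsv : 16; unsigned int x1 : 24;
    unsigned int x2 : 24;  unsigned int x3 : 24;  unsigned int x4 : 24;
    unsigned int bv : 16;  unsigned int ak : 24;  unsigned int av : 16;
    unsigned int cv : 24;
};

const std::size_t kRecordBytes = 34; // 8 fields * 3 bytes + 5 fields * 2 bytes

// Store the low `bytes` bytes of val, most significant byte first.
// Returns the next write position.
unsigned char* store(unsigned char* p, unsigned int val, int bytes)
{
    for (int shift = 8 * (bytes - 1); shift >= 0; shift -= 8)
        *p++ = static_cast<unsigned char>((val >> shift) & 0xFF);
    return p;
}

// Load `bytes` bytes, most significant first, and advance p past them.
// The value is returned because a bit field cannot be bound to a reference.
unsigned int load(const unsigned char*& p, int bytes)
{
    unsigned int val = 0;
    for (int i = 0; i < bytes; ++i)
        val = (val << 8) | *p++;
    return val;
}

void serialize(const RecordType1& r, unsigned char* out)
{
    unsigned char* p = out;
    p = store(p, r.dt, 3);  p = store(p, r.ts, 2);  p = store(p, r.lsp, 3);
    p = store(p, r.lst, 2); p = store(p, r.lsv, 2); p = store(p, r.x1, 3);
    p = store(p, r.x2, 3);  p = store(p, r.x3, 3);  p = store(p, r.x4, 3);
    p = store(p, r.bv, 2);  p = store(p, r.ak, 3);  p = store(p, r.av, 2);
    p = store(p, r.cv, 3);
    assert(p == out + kRecordBytes);
}

void deserialize(const unsigned char* in, RecordType1& r)
{
    const unsigned char* p = in;
    r.dt = load(p, 3);  r.ts = load(p, 2);  r.lsp = load(p, 3);
    r.lst = load(p, 2); r.lsv = load(p, 2); r.x1 = load(p, 3);
    r.x2 = load(p, 3);  r.x3 = load(p, 3);  r.x4 = load(p, 3);
    r.bv = load(p, 2);  r.ak = load(p, 3);  r.av = load(p, 2);
    r.cv = load(p, 3);
    assert(p == in + kRecordBytes);
}
```

The 34-byte buffer is plain data, so it can be copied with memcpy or written to disk as-is, regardless of how the compiler lays out the struct in memory.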

> --
> James Kanze (GABI Software) email:james.kanze@gmail.com
> Conseils en informatique orientée objet/
> Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

(2b|!2b)==?

10/22/2008 10:34:00 AM

diamondback wrote:
> On Oct 21, 2:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
>> I have a struct declared as follows:
>>
>> struct RecordType1
>> {
>> unsigned int dt : 24; //3 bytes
>> unsigned int ts : 16; //2 bytes
>> unsigned int lsp : 24; //3 bytes (float value represented as int)
>> unsigned int lst : 16; //2 bytes
>> unsigned int lsv : 16; //2 bytes
>> unsigned int x1 : 24; //3 bytes (float value represented as int)
>> unsigned int x2 : 24; //3 bytes (float value represented as int)
>> unsigned int x3 : 24; //3 bytes (float value represented as int)
>> unsigned int x4 : 24; //3 bytes (float value represented as int)
>> unsigned int bv : 16; //2 bytes
>> unsigned int ak : 24; //3 bytes (float value represented as int)
>> unsigned int av : 16; //2 bytes
>> unsigned int cv : 24; //3 bytes
>>
>> };
>>
>> I need to serialize this struct by packing the bits into a contiguous
>> byte array, and then read it back from the byte array. I cant use
>> memcpy/sizeof because of boundary alignment ...
>>
>> I'd appreciate if anyone can show me how to do this. Ieally, I would
>> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.
>
>
> First of all, there is no way to get around the endian-ness issue. Any
> client that reads this data needs to know what order the bytes are
> arriving in. There is simply no way around it. The bytes arrive
> serialized, "one-at-a-time" if you will.
>
> But, I'll get to that in a moment. A quick and dirty way of dealing
> with serialization is a trick with unions. So:
>
> union RecSerializer
> {
> RecordType1 record;
> unsigned char stream[sizeof(RecordType1)];
> };
>

This would have been an elegant solution, but unfortunately, it defeats
(to some extent) the purpose of the exercise - which is to reduce the
footprint of the data when stored in a database. For example, a 1 bit
field (declared as unsigned int bitflag:1) would still occupy one byte.
What I want to do is to 'stack' the bits in the structure into a
contiguous byte array, so that the space (in bytes) occupied by the
serialized bit field structure is the "same" (i.e. up to the nearest
byte) as the byte array. Maybe I should have titled this post
"Serializing/deserializing a bit field struct to/from a byte array",
because that is, in essence, what I am really trying to do.

> Now, record and stream both occupy the same memory, so the data can be
> accessed via either member, depending on what you are doing. So, you
> load the memory using the structure (record):
>
> RecSerializer m_rs;
> m_rs.record.dt = 1;
> m_rs.record.ts = 2;
> m_rs.record.lsp = 3;
> ...
>
> Then you send it using the byte array (stream):
>
> <networkConnection>.send( m_rs.stream, sizeof(RecordType1) );
>
> Reading and de-serializing is simply a reverse of the sending process.
>
> However, this does not take into account cross platform endianess
> issues. Like I said above, this is the language barrier that confronts
> anyone who does cross-platform network communication. You must deal
> with it. Sorry. Luckily, you have some choices on how to do this:
>
For the sake of simplicity, I will relax the requirements of being
"endian agnostic". I will simply use network byte ordering. That will
cover the vast majority of platforms I anticipate running this on anyway.

> The easiest(?) way is to just insist that everyone play nice and use
> the same endianness. If you can accomplish this, please run for
> President. I will vote for you...twice. Otherwise, you need to agree
> to disagree and standardize on something. Luckily, the Internet
> protocols use big-endian byte order and the POSIX byte order
> functions htons, htonl, ntohs, and ntohl can be used for marshalling
> and demarshalling data. These are platform independent functions that
> reorder the bytes in standard data to conform to the Internet byte
> order and back. All clients on your network must agree to conform to
> the standard, obviously. However, these functions work on standard 2
> or 4 byte boundaries only. So, these will not work for you in your
> current design. My initial reaction, not knowing the details of your
> system, would be to question if you absolutely must use bit-fields in
> the structure? Processing would be easier, and potentially faster, if
> you stuck with standard byte boundaries. But, I will assume you have
> considered this and I will proceed under the assumption that the odd
> byte boundaries are required.
>
> A clever method of dealing with byte order could be to take a cue from
> Unicode encoded files and include a Byte Order Mark (BOM) as the first
> two bytes of the message. The BOM would have a value that could not be
> accidentally inverted. Something simple like 0xFFEE, for example,
> would work fine. With the BOM in place, you simply serialize and send
> the message, ignoring byte order. However, the receiving client would
> de-serialize and read the first two bytes. If the bytes are in the
> expected order (0xFFEE), the de-serialization can continue with no
> further processing. But, if the BOM is read backwards (0xEEFF), the
> client knows that the message was sent with a different endianness and
> must be further processed to extract the data.
>
> So, your options are:
> 1) Get everyone to agree on endianness (and bring world peace)
> 2) Change your data definition to facilitate the use of POSIX byte
> order conversion.
> 3) Use a "BOM" (or some other order marker) in your data definition.
>
> I hope that helps. If not, I hope someone else has a better idea.

(2b|!2b)==?

10/22/2008 10:56:00 AM

Thomas J. Gritzan wrote:
> (2b|!2b)==? schrieb:
>> I have a struct declared as follows:
>>
>> struct RecordType1
>> {
>> unsigned int dt : 24; //3 bytes
>> unsigned int ts : 16; //2 bytes
>> unsigned int lsp : 24; //3 bytes (float value represented as
>> int)
>> unsigned int lst : 16; //2 bytes
>> unsigned int lsv : 16; //2 bytes
>> unsigned int x1 : 24; //3 bytes (float value represented as int)
>> unsigned int x2 : 24; //3 bytes (float value represented as int)
>> unsigned int x3 : 24; //3 bytes (float value represented as int)
>> unsigned int x4 : 24; //3 bytes (float value represented as int)
>> unsigned int bv : 16; //2 bytes
>> unsigned int ak : 24; //3 bytes (float value represented as int)
>> unsigned int av : 16; //2 bytes
>> unsigned int cv : 24; //3 bytes
>> };
>>
>> I need to serialize this struct by packing the bits into a contiguous
>> byte array, and then read it back from the byte array. I cant use
>> memcpy/sizeof because of boundary alignment ...
>
> Huh?
>
>> I'd appreciate if anyone can show me how to do this. Ieally, I would
>> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.
>
> In what endianness do you want to store it?
>
> Let's assume 8 bit bytes, unsigned int at least sizeof(3), and you want
> to output in network byte order (big endian).
>
> Here's a quick'n'dirty solution just to show you the main idea:
>
> // helper functions
> void put8(std::ostream& out, unsigned int val)
> {
> assert(val <= 0xFF);
> out.put(val);
> }
> void put16(std::ostream& out, unsigned int val)
> {
> assert(val <= 0xFFFF);
> put8(out, val >> 8);
> put8(out, val & 0xFF);
> }
> void put24(std::ostream& out, unsigned int val)
> {
> assert(val <= 0xFFFFFF);
> put8(out, val >> 16);
> put16(out, val & 0xFFFF);
> }
>
> // could be an operator<<, too.
> void serialize(std::ostream& out, const RecordType1& data)
> {
> put24(out, data.dt);
> put16(out, data.ts);
> put24(out, data.lsp);
> // and so on...
> }
>

This is (almost) *EXACTLY* what I want to do. Thank you, thank you,
thank you. I have relaxed my endianness requirements - and will now be
using network byte order - since this covers all the machines I envisage
running this on.

I do have a more complicated struct which packs several fields in 2
bytes (please see fields flag, xmo and dxp below):

struct RecordType5 : public DbRecord
{
unsigned int dt : 24;
unsigned int ts : 16;
unsigned int stl : 24;
unsigned int lsp : 24;
unsigned int lst : 16;
unsigned int lsv : 16;
unsigned int bd : 24;
unsigned int bv : 16;
unsigned int ak : 24;
unsigned int av : 16;
unsigned int cv : 24;
unsigned int lvl : 24;
unsigned int strk : 24;
unsigned int flag: 1;
unsigned int xmo : 4;
unsigned int dxp : 11;
unsigned int its : 24;
unsigned int tb : 24;
unsigned int ta : 24;
unsigned int dl : 24;
unsigned int gm : 24;
unsigned int vg : 24;
unsigned int ro : 24;
unsigned int iv : 24;
};

I suppose I will need additional helper functions put1(), put4() and
put11(). Since these functions "straddle" 2 bytes, I am not sure how to
implement them, but I'd like to use similar putXbits() helper functions
as they are very elegant, simple and "do what it says on the tin" .
Could you please show how put1(), put4() and put11() could be written?
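
One possible shape for this, offered as a sketch rather than a definitive answer: instead of separate put1(), put4() and put11() functions, a single writer keeps a running bit position and appends any width, most significant bit first, so fields can straddle byte boundaries freely. The class names BitWriter and BitReader are made up here, and the code assumes 8-bit bytes and an unsigned int of at least 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BitWriter
{
public:
    BitWriter() : bitCount_(0) {}

    // Append the low `width` bits of val, most significant bit first.
    void put(unsigned int val, unsigned int width)
    {
        assert(width >= 1 && width <= 24);
        assert((val >> width) == 0);          // value must fit in the field
        for (unsigned int i = width; i-- > 0; ) {
            if (bitCount_ % 8 == 0)
                bytes_.push_back(0);          // start a fresh byte
            if ((val >> i) & 1u)
                bytes_.back() |= static_cast<unsigned char>(0x80u >> (bitCount_ % 8));
            ++bitCount_;
        }
    }

    // Packed bytes so far; unused low bits of the last byte are zero.
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t bitCount_;
};

class BitReader
{
public:
    explicit BitReader(const std::vector<unsigned char>& bytes)
        : bytes_(bytes), bitPos_(0) {}

    // Read `width` bits, most significant bit first.
    unsigned int get(unsigned int width)
    {
        unsigned int val = 0;
        for (unsigned int i = 0; i < width; ++i) {
            assert(bitPos_ / 8 < bytes_.size());
            unsigned int bit = (bytes_[bitPos_ / 8] >> (7 - bitPos_ % 8)) & 1u;
            val = (val << 1) | bit;
            ++bitPos_;
        }
        return val;
    }

private:
    const std::vector<unsigned char>& bytes_;
    std::size_t bitPos_;
};
```

With this, the three small fields would be written as put(flag, 1), put(xmo, 4), put(dxp, 11) and occupy exactly two bytes; the 16-bit and 24-bit fields go through the same put() with their own widths.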

> To read them back, you would read two or three bytes, left shift the
> high bytes and binary-OR them together.
>

> If you don't want to use ostream/istream, you would have to track the
> current position in the array (in the put functions). An output iterator
> might be an elegant solution.
>

Yes. I want to "stack the bits" (i.e. serialize the bit field struct) to
a char* (char array or byte string). Once I have the bits stacked/packed
into a byte array (since I know the number of bytes that have been used
up by the bits), I know the size of the memory block. Armed with
a memory block (char array) and its size, I can use memcpy,
memmove etc. to my heart's content. Once I can do that, I can do the rest.

(I must admit that I don't know too much about C++ streams.) Maybe there
is a way to direct bytes from an ostream to a char array? - or maybe it's
better to serialize directly to a char array?
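
On the ostream question: std::ostringstream is an ostream that collects its output in memory, and its str() member returns the collected bytes. A minimal sketch, reusing put functions in the style quoted above; the function toBytes and its two-field layout are only an illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

void put8(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFF);
    out.put(static_cast<char>(val));
}

void put16(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFFFF);
    put8(out, val >> 8);
    put8(out, val & 0xFF);
}

void put24(std::ostream& out, unsigned int val)
{
    assert(val <= 0xFFFFFF);
    put8(out, val >> 16);
    put16(out, val & 0xFFFF);
}

// Serialize a 24-bit and a 16-bit value into an in-memory stream,
// then copy the collected bytes out into a byte vector.
std::vector<unsigned char> toBytes(unsigned int dt, unsigned int ts)
{
    std::ostringstream out(std::ios::out | std::ios::binary);
    put24(out, dt);
    put16(out, ts);
    const std::string s = out.str();
    return std::vector<unsigned char>(s.begin(), s.end());
}
```

The vector's storage is contiguous, so &bytes[0] and bytes.size() give the memory block and length that memcpy expects. Writing directly into a char array avoids the extra copy, at the cost of tracking the position yourself.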

Thomas J. Gritzan

10/22/2008 4:08:00 PM

Nick Keighley schrieb:
> On 21 Oct, 10:18, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
>
>> I have a struct declared as follows:
>>
>> struct RecordType1
>> {
>> unsigned int dt : 24; //3 bytes
>> unsigned int ts : 16; //2 bytes
>> unsigned int lsp : 24; //3 bytes (float value represented as
>
> <snip>
>
> (all examples given are 16 or 24 bits wide)
>
>> };
>>
>> I need to serialize this struct by packing the bits into a contiguous
>> byte array, and then read it back from the byte array. I cant use
>> memcpy/sizeof because of boundary alignment ...
>>
>> I'd appreciate if anyone can show me how to do this. Ieally, I would
>> like to this in a cross platform (i.e. "ENDIAN-ness" agnostic) way.
>
> 1. you can't write this in an endian agnostic manner
> "but that's the worse thing that could possible happen!".
> As other have said, decide on an endianness and write platform
> specific code to read/write the data in its correct endianess.

What makes you think so?

You have to decide on an endianness your data is stored in, of course,
but you don't have to care for your platform's byte order at all.

unsigned int val = /* some 2-byte value */;

unsigned char high = (val >> 8);
unsigned char low = val & 0xFF;

The more significant bits are in high, the other in low. You can store
them in big-endian (high, low) or little-endian (low, high) order,
depending on what data format you decided on, but your platform only has
to have 8-bit-bytes.
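
Spelled out as functions, the point might look like the sketch below (function names invented for the illustration). The shifts operate on the value, not on its in-memory bytes, so the same source produces the same output on big-endian and little-endian hosts.

```cpp
#include <cassert>

// Store a 16-bit value high byte first (big-endian file format).
void storeBig16(unsigned char* p, unsigned int val)
{
    p[0] = static_cast<unsigned char>((val >> 8) & 0xFF);
    p[1] = static_cast<unsigned char>(val & 0xFF);
}

// Store a 16-bit value low byte first (little-endian file format).
void storeLittle16(unsigned char* p, unsigned int val)
{
    p[0] = static_cast<unsigned char>(val & 0xFF);
    p[1] = static_cast<unsigned char>((val >> 8) & 0xFF);
}

unsigned int loadBig16(const unsigned char* p)
{
    return (static_cast<unsigned int>(p[0]) << 8) | p[1];
}

unsigned int loadLittle16(const unsigned char* p)
{
    return (static_cast<unsigned int>(p[1]) << 8) | p[0];
}
```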

> 2. bitfields are even less portable than the above implies.
> "but it's worse than that!"
[...]

That is another reason why you shouldn't store a memory dump of the
struct, but rather format the values in some specific format.

The in-memory representation of data structures is platform specific,
but you don't care, because the compiler handles them. You only have to
agree on an on-disk representation, so that another platform's programs
can use the data, and your programs are resistant to compiler changes.

--
Thomas

.rhavin grobert

10/22/2008 4:32:00 PM

On 22 Okt., 10:03, James Kanze <james.ka...@gmail.com> wrote:
> [...] since generally the compiler won't allocate a bit field
> in a way that would cross a 32 bit boundary. [...] But there's
> rarely any sense in having bit fields larger than 8 bits.)

Sure? Consider:

typedef unsigned __int64 QUAD; // Microsoft-specific 64-bit type

#pragma pack (push, 1)
struct foo {
    union {
        QUAD qData;
        struct {
            QUAD nFirstNibble : 4;
            QUAD nSecondNibble : 4;
            QUAD nThirdNibble : 4;
            QUAD nBloodyRest : 52; // 4 + 4 + 4 + 52 = 64 bits
        };
    };
};
#pragma pack (pop)

comp.lang.c++
