James Kanze
10/22/2008 8:33:00 AM
On Oct 21, 7:04 pm, diamondback <christopher....@gmail.com> wrote:
> On Oct 21, 2:18 am, "(2b|!2b)==?" <void-s...@ursa-major.com> wrote:
> > I have a struct declared as follows:
> > struct RecordType1
> > {
> > unsigned int dt : 24; //3 bytes
> > unsigned int ts : 16; //2 bytes
> > unsigned int lsp : 24; //3 bytes (float value represented as int)
> > unsigned int lst : 16; //2 bytes
> > unsigned int lsv : 16; //2 bytes
> > unsigned int x1 : 24; //3 bytes (float value represented as int)
> > unsigned int x2 : 24; //3 bytes (float value represented as int)
> > unsigned int x3 : 24; //3 bytes (float value represented as int)
> > unsigned int x4 : 24; //3 bytes (float value represented as int)
> > unsigned int bv : 16; //2 bytes
> > unsigned int ak : 24; //3 bytes (float value represented as int)
> > unsigned int av : 16; //2 bytes
> > unsigned int cv : 24; //3 bytes
> > };
> > I need to serialize this struct by packing the bits into a
> > contiguous byte array, and then read it back from the byte
> > array. I can't use memcpy/sizeof because of boundary
> > alignment ...
> > I'd appreciate if anyone can show me how to do this. Ideally,
> > I would like to do this in a cross platform (i.e. "ENDIAN-ness"
> > agnostic) way.
> First of all, there is no way to get around the endian-ness
> issue. Any client that reads this data needs to know what
> order the bytes are arriving in. There is simply no way around
> it. The bytes arrive serialized, "one-at-a-time" if you will.
More generally, he really has to define a serialization format,
period. Of course, for unsigned, endianness is about the only
issue. And he's done part of the work already, since he's
defined how to represent floats, except for the endianness.
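As a rough illustration of what "defining a serialization format" could mean for this record, here is a minimal sketch. It assumes the wire format is simply the fields in declaration order, each written most significant byte first; that choice, and the names Bytes, putUnsigned, getUnsigned, serialize and deserialize, are illustrative assumptions and not something stated in the thread.

```cpp
#include <cstddef>
#include <vector>

// Record layout as posted in the question.
struct RecordType1
{
    unsigned int dt  : 24;
    unsigned int ts  : 16;
    unsigned int lsp : 24;
    unsigned int lst : 16;
    unsigned int lsv : 16;
    unsigned int x1  : 24;
    unsigned int x2  : 24;
    unsigned int x3  : 24;
    unsigned int x4  : 24;
    unsigned int bv  : 16;
    unsigned int ak  : 24;
    unsigned int av  : 16;
    unsigned int cv  : 24;
};

typedef std::vector<unsigned char> Bytes;   // hypothetical buffer type

// Append the low `count` bytes of `value`, most significant byte first.
void putUnsigned(Bytes& out, unsigned long value, int count)
{
    for (int shift = 8 * (count - 1); shift >= 0; shift -= 8) {
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFF));
    }
}

// Read `count` bytes starting at `pos`, most significant byte first.
// No bounds checking: the caller must ensure enough bytes are present.
unsigned int getUnsigned(const Bytes& in, std::size_t& pos, int count)
{
    unsigned long value = 0;
    for (int i = 0; i < count; ++i) {
        value = (value << 8) | in[pos++];
    }
    return static_cast<unsigned int>(value);
}

// Assumed format: fields in declaration order, 34 bytes in total.
Bytes serialize(const RecordType1& r)
{
    Bytes out;
    putUnsigned(out, r.dt, 3);   putUnsigned(out, r.ts, 2);
    putUnsigned(out, r.lsp, 3);  putUnsigned(out, r.lst, 2);
    putUnsigned(out, r.lsv, 2);  putUnsigned(out, r.x1, 3);
    putUnsigned(out, r.x2, 3);   putUnsigned(out, r.x3, 3);
    putUnsigned(out, r.x4, 3);   putUnsigned(out, r.bv, 2);
    putUnsigned(out, r.ak, 3);   putUnsigned(out, r.av, 2);
    putUnsigned(out, r.cv, 3);
    return out;
}

RecordType1 deserialize(const Bytes& in)
{
    RecordType1 r = RecordType1();
    std::size_t pos = 0;
    r.dt  = getUnsigned(in, pos, 3);  r.ts  = getUnsigned(in, pos, 2);
    r.lsp = getUnsigned(in, pos, 3);  r.lst = getUnsigned(in, pos, 2);
    r.lsv = getUnsigned(in, pos, 2);  r.x1  = getUnsigned(in, pos, 3);
    r.x2  = getUnsigned(in, pos, 3);  r.x3  = getUnsigned(in, pos, 3);
    r.x4  = getUnsigned(in, pos, 3);  r.bv  = getUnsigned(in, pos, 2);
    r.ak  = getUnsigned(in, pos, 3);  r.av  = getUnsigned(in, pos, 2);
    r.cv  = getUnsigned(in, pos, 3);
    return r;
}
```

Because the bytes are produced with shifts on values rather than by looking at memory, the output does not depend on how the compiler lays out the bit-fields or on the byte order of the machine.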
> But, I'll get to that in a moment. A quick and dirty way of dealing
> with serialization is a trick with unions. So:
> union RecSerializer
> {
> RecordType1 record;
> unsigned char stream[sizeof(RecordType1)];
> };
> Now, record and stream both occupy the same memory, so the
> data can be accessed via either member, depending on what you
> are doing.
Read access can only access the last member written; otherwise,
you have undefined behavior. Formally, a compiler is allowed to
arrange for some sort of secondary store to remember the last
field written, and check it when reading. I think that there
was once a compiler which did this, but it's certainly not
frequent. And of course, reading a record when you stored
random data through stream could result in a core dump or the
equivalent on some architectures (Unisys MCP, for example).
> So, you load the memory using the structure (record):
> RecSerializer m_rs;
> m_rs.record.dt = 1;
> m_rs.record.ts = 2;
> m_rs.record.lsp = 3;
> ...
> Then you send it using the byte array (stream):
> <networkConnection>.send( m_rs.stream, sizeof(RecordType1) );
> Reading and de-serializing is simply a reverse of the sending
> process.
All of which is undefined behavior, and can in practice generate
a core dump on some less common architectures.
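For comparison, a minimal sketch of the well-defined way to look at an object's bytes: copying them out with std::memcpy rather than reading the inactive union member. The struct Sample (the record trimmed to its first two fields) and the function names are hypothetical. Note that this only removes the undefined behavior; the bytes obtained are still in whatever layout the compiler chose for the bit-fields, padding included, so this is not a portable wire format.

```cpp
#include <cstring>

// Hypothetical cut-down version of the record, first two fields only.
struct Sample
{
    unsigned int dt : 24;
    unsigned int ts : 16;
};

// Copy the object representation into a byte buffer. Defined behavior,
// but the layout of the bytes is implementation-defined.
void toBytes(const Sample& s, unsigned char (&buf)[sizeof(Sample)])
{
    std::memcpy(buf, &s, sizeof(Sample));
}

// Copy the bytes back. Only meaningful if they came from a Sample
// produced by the same compiler on the same platform.
Sample fromBytes(const unsigned char (&buf)[sizeof(Sample)])
{
    Sample s;
    std::memcpy(&s, buf, sizeof(Sample));
    return s;
}
```

The two fields need 40 bits, so a tightly packed encoding would take 5 bytes; sizeof(Sample) is at least that and is often larger, which is one more reason the in-memory image cannot serve directly as the external format.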
> However, this does not take into account cross platform
> endianness issues. Like I said above, this is the language
> barrier that confronts anyone who does cross-platform network
> communication. You must deal with it. Sorry. Luckily, you have
> some choices on how to do this:
> The easiest(?) way is to just insist that everyone play nice
> and use the same endianness. If you can accomplish this,
> please run for President. I will vote for you...twice.
> Otherwise, you need to agree to disagree and standardize on
> something. Luckily, the Internet protocols use big-endian byte
> order and the POSIX byte order functions htons, htonl, ntohs,
> and ntohl can be used for marshalling and demarshalling data.
> These are platform independent functions[...]
They're not portable, and they aren't really meaningful for some
(many) platforms, since they consider that there can only be two
possible byte orders (there are 24 possible orderings for 4
bytes, and I've seen at least three in actual practice), and
they ignore all other representation issues (and possibly
alignment issues).
Repeat after me: endianness is just the tip of the iceberg. The
htonxxx and ntohxxx functions are just hacks, designed as a
quick work-around in order to communicate between two fixed
architectures, and are not generally useful (except perhaps when
addressing the system API---a system dependent context).
Given his description of the floating point format in another
thread, I would imagine something like:
    oxxxstream&
    oxxxstream::operator<<(
        float               value )
    {
        assert( value >= 0.0 && value < 8 ) ;
        int                 exp ;
        int                 mant
            = frexp( value, &exp ) * (1 << 21) ;    //  21 bit mantissa
        std::streambuf*     sb = rdbuf() ;
        //  First byte: exponent in the top 3 bits, then the
        //  5 high order bits of the mantissa.
        sb->sputc( (exp << 5) | (mant >> 16) ) ;
        sb->sputc( (mant >> 8) & 0xFF ) ;
        sb->sputc( mant & 0xFF ) ;
        return *this ;
    }
(This code lacks any error handling; you need to verify the
return value of sb->sputc, and set badbit in the stream if it is
EOF. You should also not do any further output once the stream
has failed. I generally use a special class for this, which
maintains a reference to the stream and the pointer to the
streambuf, and has a single put function:
    void
    GuardedOutput::put( unsigned char ch )
    {
        if ( myStream && myStreambuf->sputc( ch ) == EOF ) {
            myStream.setstate( std::ios::badbit ) ;
        }
    }
Also, I normally avoid bitwise operators on signed types. In
this case, however, the types are partially conditioned by the
signature of frexp, and the precondition check guarantees that
I'll never get a negative value with the operations I do, so the
signed int behaves exactly like an unsigned int.)
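For the reading side, here is a minimal sketch of the inverse operation. The layout (3 bit exponent, then a 21 bit mantissa, high order byte first) is inferred from the output code above, not from a specification, and the function names encodeFloat24 and decodeFloat24 are made up for the example. The sketch also assumes the exponent returned by frexp is in the range 0 to 3, that is, the value is either 0 or in [0.5, 8); how smaller values should be represented is not stated in the thread, so they are excluded by the precondition here.

```cpp
#include <cassert>
#include <cmath>

// Encode into three bytes, using the same arithmetic as the
// operator<< above. Precondition: value is 0 or in [0.5, 8).
void encodeFloat24(float value, unsigned char (&out)[3])
{
    assert(value == 0.0f || (value >= 0.5f && value < 8.0f));
    int exp = 0;
    int mant = static_cast<int>(std::frexp(value, &exp) * (1 << 21));
    out[0] = static_cast<unsigned char>((exp << 5) | (mant >> 16));
    out[1] = static_cast<unsigned char>((mant >> 8) & 0xFF);
    out[2] = static_cast<unsigned char>(mant & 0xFF);
}

// Decode three bytes produced by encodeFloat24.
float decodeFloat24(const unsigned char (&in)[3])
{
    int exp = in[0] >> 5;                       // top 3 bits
    int mant = ((in[0] & 0x1F) << 16)           // 5 high mantissa bits
             | (in[1] << 8)
             | in[2];
    // The mantissa was scaled by 2^21 when written, so undo that
    // scaling while applying the exponent.
    return std::ldexp(static_cast<float>(mant), exp - 21);
}
```

Since only 21 bits of mantissa are kept, a value read back can differ slightly from the one written; values such as 1.5 or 5.25, which need few mantissa bits, survive the round trip exactly.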
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34