James Kanze
12/16/2008 11:00:00 PM
On Dec 16, 11:39 am, SG <s.gesem...@gmail.com> wrote:
> I'm wondering what the preferred portable way of handling
> binary data is.
It depends on the format, and just how portable you have to be.
And the data types; truly portable floating point can be a
bitch.
> For example, I want to read a binary file which contains 32bit
> and 16bit integers in the little endian format.
Two's complement, or?
> Now, I'm aware that a character might have more bits than 8.
> But I don't care about this case for now. So, I enclose my
> conversion routines for char* to some int with preprocessor
> directives:
> #include <climits>
> #if CHAR_BIT == 8
> // conversion code here
> #endif
More usual would be
#include <climits>
#if CHAR_BIT != 8
#error Only 8 bit char's supported
#endif
> As far as I know the C++ standard doesn't specify whether a
> char is signed or unsigned
No, and it varies in practice. (Not that I think it makes a
difference in your case.)
> nor does it specify what will happen if i convert between
> signed and unsigned in case the original value can't be
> represented.
Conversions to unsigned integral types are fully defined.
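To illustrate: converting to an unsigned type reduces the value
modulo 2^N, where N is the number of bits in the destination,
whatever the representation of the source. A minimal sketch (the
name to_u16 is made up for the example, and it assumes that
<stdint.h> and uint16_t are available on your implementation):

```cpp
#include <stdint.h>

// Hypothetical helper, for illustration only: converts an int to
// uint16_t.  The result is the value modulo 65536, by definition,
// so this is neither undefined nor implementation-defined.
inline uint16_t to_u16( int value )
{
    return static_cast< uint16_t >( value ) ;
}
```

With this, to_u16( -1 ) is 0xFFFF on every conforming
implementation, regardless of how negative numbers are stored.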
> Also, signed integers don't need to be stored in two's
> complement. Unfortunately, this seems to make decoding a 16
> bit signed number in two's complement & little endian byte
> order in a portable way impossible.
Not really. First, you do the input as unsigned:
uint16_t result = source.get() ;
result |= source.get() << 8 ;
(source.get() should return a value in the range 0-255;
std::istream::get() could be used here. The only time it's out
of range is if you read past EOF. The results are still well
defined in that case, even if they don't mean anything, and
there's no undefined behavior; you can test for the case
afterwards.)
For uint32_t, do the same thing with four bytes.
Unless I knew I had to support a machine where it didn't work,
I'd just assign the results to an int16_t and be done with it.
(I only know of two machines where it wouldn't work, and neither
has a 16 bit integral type to begin with.) Otherwise, you
might have to do some juggling:
return result <= 0x7FFF
? static_cast< int16_t >( result )
: - static_cast< int16_t >( ~result & 0xFFFF ) - 1 ;
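Spelled out as a complete function, this might look like the
following sketch (the name to_s16 is made up, and <stdint.h> is
assumed to be available). The point is that every conversion to
a signed type is applied to a value already known to be in range,
so nothing depends on the implementation-defined conversion:

```cpp
#include <stdint.h>

// Hypothetical helper: interprets a 16 bit two's complement bit
// pattern as a signed value.
inline int16_t to_s16( uint16_t bits )
{
    if ( bits <= 0x7FFF ) {
        // Non-negative: the value fits as is.
        return static_cast< int16_t >( bits ) ;
    }
    // Negative: for two's complement, value == -(~bits) - 1, where
    // ~bits is taken over 16 bits only.  The mask is needed because
    // bits is promoted to int before the ~ is applied.
    int const inverted = static_cast< int >( ~bits & 0xFFFF ) ;
    return static_cast< int16_t >( - inverted - 1 ) ;
}
```

For example, to_s16( 0xFFFF ) gives -1, and to_s16( 0x8000 )
gives -32768.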
> I came up with the following piece of code which still invokes
> implementation defined behaviour:
> // decode signed 16 bit int (two's complement & little endian)
> inline int_fast16_t get_s16le(const char* p)
> {
> // we already know that CHAR_BIT == 8 but "char" might be signed
> // as well as unsigned
> unsigned char low = p[0]; // implementation-defined for p[0]<0
> signed char hi = p[1]; // implementation-defined for p[1]>=128
> return int_fast16_t(low) + int_fast16_t(hi) * 256;
> }
Don't use char (or char const*) here. Use unsigned char, or
unsigned char const*. Or just use the istream directly (opened
in binary mode, of course), using istream::get() (and thus
leaving the problem up to the implementation of filebuf/istream
to make this work).
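If you do want to decode from a buffer, a sketch with unsigned
char const* could look like this (decode_u16le is a made-up name
and <stdint.h> is assumed). Because the bytes are combined by
value, the same code works whatever the byte order of the host,
and no BYTE_ORDER test is needed:

```cpp
#include <stdint.h>

// Hypothetical helper: decodes a 16 bit little endian unsigned
// value from two bytes of raw memory.  The bytes are widened to
// unsigned before shifting, so the shift cannot overflow even if
// int is only 16 bits.
inline uint16_t decode_u16le( unsigned char const* p )
{
    unsigned const low  = p[ 0 ] ;
    unsigned const high = p[ 1 ] ;
    return static_cast< uint16_t >( low | (high << 8) ) ;
}
```

Given unsigned char const buffer[] = { 0x34, 0x12 },
decode_u16le( buffer ) yields 0x1234.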
(Actually, of course, *if* the chars have the correct values,
there's no problem. The problem only occurs if char is signed,
AND the machine doesn't use 2's complement---there would be one
unsigned char value that couldn't occur. And there's so much
code out there which uses char* for pointing to raw memory that
any implementation which doesn't use 2's complement will almost
certainly make char unsigned.)
> Also, this is horribly slow.
Have you actually measured it? I've found no measurable
difference using the shifting technique, above.
> I'd much rather be able to query certain implementation
> properties so I can use much faster code.
> My latest incarnation looks like this:
> inline uint_fast16_t swap_bytes_16bit(uint_fast16_t x) {
> return ((x & 0xFF00u) >> 8) | ((x & 0x00FFu) << 8);
> }
> inline uint_fast16_t get_u16le(const char* p) {
> uint_fast16_t x;
> assert(sizeof(x)>=2);
> std::memcpy(&x,p,2);
> #if BYTE_ORDER == LITTLE_ENDIAN
> return x;
> #else
> return swap_bytes_16bit(x);
> #endif
> }
Such swapping is likely to be slower than just doing it right in
the first place, using the shifts immediately on reading.
> inline int_least16_t get_s16le(const char * p) {
> assert( signed(~0u) == -1 ); //< This is not guaranteed by the standard
> return get_u16le(p);
> }
> What's the preferred way to do this in a reasonably portable
> way?
See above. Most people, I suspect, count on the conversion of
the uint16_t to int16_t to do the right thing, although
formally, the result is implementation defined (and in C99, the
conversion may instead raise an implementation-defined signal).
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34