comp.lang.c

Union test for endianness

Bhasker Penta

6/17/2011 3:17:00 AM

One way to test for endianness is to use a union:

#include <stdio.h>

void endianTest()
{
    union // sizeof(int) == 4
    {
        int i;
        char ch[4];
    } U;

    U.i=0x12345678; // writing to int member
    if ( U.ch[0]==0x78 ) // reading from char member
        puts("\nLittle endian");
    else
        puts("\nBig endian");
}

Writing to one member of a union and reading from another member is
implementation-defined (K&R). This example is used for testing
endianness at c-faq.com. I know that gcc allows this. Is the above
snippet to test for endianness legal C or C++?
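
A minimal sketch of the posted test as a complete translation unit, under the same assumptions the post states (sizeof(int) == 4, 8-bit bytes). The helper name firstByteIsLow is hypothetical and not from the post; whether the standard sanctions the cross-member read is exactly the question being asked, so treat this as an illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the byte at the lowest address of the
   int holds the least significant byte of 0x12345678, otherwise 0.
   Assumption (as in the post): sizeof(int) == 4 and 8-bit bytes. */
int firstByteIsLow(void)
{
    union {
        int i;
        char ch[4];
    } U;

    U.i = 0x12345678;        /* write through the int member */
    return U.ch[0] == 0x78;  /* read through the char member */
}

/* Prints the same messages as the posted endianTest(). */
void endianTest(void)
{
    if (firstByteIsLow())
        puts("\nLittle endian");
    else
        puts("\nBig endian");
}
```

Calling endianTest() from main() prints one of the two messages. Note that a result of 0 only means "not 0x78 in the first byte", which is not quite the same thing as big endian.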
52 Answers

ram

6/17/2011 3:31:00 AM

Bhasker Penta <bskdsp@gmail.com> writes:
>U.i=0x12345678; // writing to int member
>if ( U.ch[0]==0x78 ) // reading from char member
>Writing to one member of a union and reading from another member is
>implementation defined(K & R).

»When a value is stored in a member of an object of union type,
the bytes of the object representation that do not
correspond to that member but do correspond to other
members take *unspecified values*«
ISO/IEC 9899:1999 (E), 6.2.6.1#7

One also might cast a pointer to int into a pointer to char[],
but I assume, dereferencing this might also give unspecified
values in the best case or might result in undefined behavior
in the worst case?

Endianness is an implementation detail of a higher
programming language that the language wants to hide from
you (information hiding), because usually one does not need
to know it. One even can serialize and deserialize in either
a portable or an implementation specific manner without
knowing this.

However, each specific C implementation is free to disclose
this implementation detail in its documentation.

For such purposes, it might be nice, if standard C would
define names for all the properties an autoconf script
usually determines, so that each C implementation could
predefine them.
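
The remark above about serializing portably without knowing the byte order can be sketched as follows. The function names put_u32_be and get_u32_be are hypothetical, and big-endian is picked here only as an example wire format; the code works on values with shifts and masks, so it never looks at how the host stores an integer.

```c
/* Hypothetical helper: store the low 32 bits of v into out[0..3],
   most significant byte first, using only arithmetic on the value. */
void put_u32_be(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: rebuild the value from the same four bytes. */
unsigned long get_u32_be(const unsigned char in[4])
{
    return ((unsigned long)in[0] << 24)
         | ((unsigned long)in[1] << 16)
         | ((unsigned long)in[2] << 8)
         |  (unsigned long)in[3];
}
```

unsigned long is used because the standard guarantees it at least 32 bits; the byte sequence produced is the same on any host.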

Ian Collins

6/17/2011 3:37:00 AM

On 06/17/11 03:31 PM, Stefan Ram wrote:
> Bhasker Penta<bskdsp@gmail.com> writes:
>> U.i=0x12345678; // writing to int member
>> if ( U.ch[0]==0x78 ) // reading from char member
>> Writing to one member of a union and reading from another member is
>> implementation defined(K& R).
>
> »When a value is stored in a member of an object of union type,
> the bytes of the object representation that do not
> correspond to that member but do correspond to other
> members take unspecified values«
> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
> ISO/IEC 9899:1999 (E), 6.2.6.1#7
>
How is that relevant to the question, which assumes sizeof(int) == 4?

> One also might cast a pointer to int into a pointer to char[],
> but I assume, dereferencing this might also give unspecified
> values in the best case or might result in undefined behavior
> in the worst case?

Do what? How is that relevant to, well anything?

> Endianess is an implementation detail of a higher
> programming language that the language wants to hide from
> you (information hiding), because usually one does not need
> to know it. One even can serialize and deserialize in either
> a portable or an implementation specific manner without
> knowing this.

Whoever writes the serialisation code does need to know. If you need
to know the endianness, you are probably writing serialisation code!

--
Ian Collins

China Blue Veins

6/17/2011 3:55:00 AM

In article <union-20110617051812@ram.dialup.fu-berlin.de>,
ram@zedat.fu-berlin.de (Stefan Ram) wrote:

> Bhasker Penta <bskdsp@gmail.com> writes:
> >U.i=0x12345678; // writing to int member
> >if ( U.ch[0]==0x78 ) // reading from char member
> >Writing to one member of a union and reading from another member is
> >implementation defined(K & R).
>
> »When a value is stored in a member of an object of union type,
> the bytes of the object representation that do not
> correspond to that member but do correspond to other
> members take unspecified values«
> ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

All the bytes of i correspond to bytes of ch, and all the bytes of ch correspond
to bytes of i.

union // sizeof(int) == 4
{
int i;
char ch[4];
} U;

--
I remember finding out about you, | I survived XYZZY-Day.
Everyday my mind is all around you,| I'm whoever you want me to be.
Looking out from my lonely room |Annoying Usenet one post at a time.
Day after day. | At least I can stay in character.

China Blue Veins

6/17/2011 4:01:00 AM

In article <f19541de-213c-4744-b712-92f20a3ad294@d26g2000prn.googlegroups.com>,
Bhasker Penta <bskdsp@gmail.com> wrote:

> One way to test for endianess is to use a union:
>
> void endianTest()
> {
> union // sizeof(int) == 4
> {
> int i;
> char ch[4];
> } U;
>
> U.i=0x12345678; // writing to int member
> if ( U.ch[0]==0x78 ) // reading from char member
> puts("\nLittle endian");
> else
> puts("\nBig endian");
> }
>
> Writing to one member of a union and reading from another member is
> implementation defined(K & R). This example is used for testing
> endianess @ c-faq.com. I know that gcc allows this. Is the above
> snippet to test for endianess legal C or C++?

You can use casts to avoid variables.

#define X (((union{int i; char ch[4];}){.i=0x12345678}).ch[0])
#define littleEndian (X==0x78)
#define bigEndian (X==0x12)

--
I remember finding out about you, | I survived XYZZY-Day.
Everyday my mind is all around you,| I'm whoever you want me to be.
Looking out from my lonely room |Annoying Usenet one post at a time.
Day after day. | At least I can stay in character.

Shao Miller

6/17/2011 5:50:00 AM

On 6/16/2011 10:16 PM, Bhasker Penta wrote:
> One way to test for endianess is to use a union:
>
> void endianTest()
> {
> union // sizeof(int) == 4
> {
> int i;
> char ch[4];
> } U;
>
> U.i=0x12345678; // writing to int member
> if ( U.ch[0]==0x78 ) // reading from char member
> puts("\nLittle endian");
> else
> puts("\nBig endian");
> }
>

Please note that

One possible way to help to ensure that 'sizeof (int) == 4' and that you
have 8-bit bytes is to:

    #include <limits.h> /* for CHAR_BIT */

    #define TT_ASSERT(message, test) \
      typedef char (message)[(test) ? 1 : -1]

    TT_ASSERT(INT_IS_NOT_4_BYTES, sizeof (int) == 4);
    TT_ASSERT(NOT_8_BIT_BYTE, CHAR_BIT == 8);

> Writing to one member of a union and reading from another member is
> implementation defined(K& R).

As far as I know, if 'sizeof (int) == 4' as shown, you can certainly
read from each element of the 'U.ch' array. C doesn't guarantee that
'sizeof (int) == 4', of course.

Combined with the 'TT_ASSERT's above, you could have your union as:

    union {
        unsigned int i;
        unsigned char ch[sizeof (unsigned int)];
      } U;

(Note that the use of 'unsigned' attempts to avoid any potential sign
bit complications; the 'TT_ASSERT' might be better off matching, too.)

> This example is used for testing
> endianess @ c-faq.com. I know that gcc allows this. Is the above
> snippet to test for endianess legal C or C++?

If you know that the implementation definitely uses an 8-bit byte, a
4-byte 'int', and that there are no padding bits and that '0x12345678'
is within the range of values for 'int', then I'd say yes for "legal C". :)
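
Putting the pieces of this reply together, a rough sketch might look like the following. The assertion name UINT_IS_NOT_4_BYTES and the function name low_byte_first are hypothetical; the assertion is written against unsigned int so that it matches the union, as suggested in the parenthetical above. On an implementation where either assumption fails, the typedef gets a negative array size and the file is meant not to compile.

```c
#include <limits.h>  /* CHAR_BIT */

/* Compile-time check from the reply: a negative array size is an error. */
#define TT_ASSERT(message, test) \
    typedef char (message)[(test) ? 1 : -1]

TT_ASSERT(UINT_IS_NOT_4_BYTES, sizeof (unsigned int) == 4);
TT_ASSERT(NOT_8_BIT_BYTE, CHAR_BIT == 8);

/* Hypothetical helper: 1 if the lowest-addressed byte is 0x78, else 0. */
int low_byte_first(void)
{
    union {
        unsigned int i;
        unsigned char ch[sizeof (unsigned int)];
    } U;

    U.i = 0x12345678u;       /* unsigned: no sign bit to worry about */
    return U.ch[0] == 0x78;  /* inspect the first byte of the representation */
}
```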

Bhasker Penta

6/17/2011 6:18:00 AM

On Jun 17, 10:50 am, Shao Miller <sha0.mil...@gmail.com> wrote:
> On 6/16/2011 10:16 PM, Bhasker Penta wrote:
>
>
>
>
>
> > One way to test for endianess is to use a union:
>
> > void endianTest()
> > {
> >      union     // sizeof(int) == 4
> >      {
> >          int i;
> >          char ch[4];
> >      } U;
>
> >      U.i=0x12345678; // writing to int member
> >      if ( U.ch[0]==0x78 )   // reading from char member
> >          puts("\nLittle endian");
> >      else
> >          puts("\nBig endian");
> > }
>
> Please note that
>
> One possible way to help to ensure that 'sizeof (int) == 4' and that you
> have 8-bit bytes is to:
>
>    #define TT_ASSERT(message, test) \
>      typedef char (message)[(test) ? 1 : -1]
>
>    TT_ASSERT(INT_IS_NOT_4_BYTES, sizeof (int) == 4);
>    TT_ASSERT(NOT_8_BIT_BYTE, CHAR_BIT == 8);
>
> > Writing to one member of a union and reading from another member is
> > implementation defined(K&  R).
>
> As far as I know, if 'sizeof (int) == 4' as shown, you can certainly
> read from each element of the 'U.ch' array.  C doesn't guarantee that
> 'sizeof (int) == 4', of course.
>
> Combined with the 'TT_ASSERT's above, you could have your union as:
>
>    union {
>        unsigned int i;
>        unsigned char ch[sizeof (unsigned int)];
>      } U;
>
> (Note that the use of 'unsigned' attempts to avoid any potential sign
> bit complications; the 'TT_ASSERT' might be better off matching, too.)
>
> > This example is used for testing
> > endianess @ c-faq.com. I know that gcc allows this. Is the above
> > snippet to test for endianess legal C or C++?
>
> If you know that the implementation definitely uses an 8-bit byte, a
> 4-byte 'int', and that there are no padding bits and that '0x12345678'
> is within the range of values for 'int', then I'd say yes for "legal C". :)


At least on my machine (Windows 7 64 bit) sizeof(int)==4,
sizeof(char)==1 and '0x12345678' is within 'int' limit. But the fact
is we are writing to int member and reading from (different) char
member. That doesn't go well with union rules. If it is legal in C
language to reinterpret the content of any object as a char array (or
char pointer), then I believe above snippet is technically correct C
code (I may be wrong).
E.g.
    int i=0x12345678; // sizeof(int) == 4
    char *p=(char *)&i;
    if(*p==0x78) // reinterpreting int i through a char pointer
        puts("Little Endian");
    else
        puts("Big Endian");

Shao Miller

6/17/2011 8:01:00 AM

On 6/17/2011 1:17 AM, Bhasker Penta wrote:
> On Jun 17, 10:50 am, Shao Miller<sha0.mil...@gmail.com> wrote:
>> On 6/16/2011 10:16 PM, Bhasker Penta wrote:
>>
>>
>>
>>
>>
>>> One way to test for endianess is to use a union:
>>
>>> void endianTest()
>>> {
>>> union // sizeof(int) == 4
>>> {
>>> int i;
>>> char ch[4];
>>> } U;
>>
>>> U.i=0x12345678; // writing to int member
>>> if ( U.ch[0]==0x78 ) // reading from char member
>>> puts("\nLittle endian");
>>> else
>>> puts("\nBig endian");
>>> }
>>
>> Please note that
>>
>> One possible way to help to ensure that 'sizeof (int) == 4' and that you
>> have 8-bit bytes is to:
>>
>> #define TT_ASSERT(message, test) \
>> typedef char (message)[(test) ? 1 : -1]
>>
>> TT_ASSERT(INT_IS_NOT_4_BYTES, sizeof (int) == 4);
>> TT_ASSERT(NOT_8_BIT_BYTE, CHAR_BIT == 8);
>>
>>> Writing to one member of a union and reading from another member is
>>> implementation defined(K& R).
>>
>> As far as I know, if 'sizeof (int) == 4' as shown, you can certainly
>> read from each element of the 'U.ch' array. C doesn't guarantee that
>> 'sizeof (int) == 4', of course.
>>
>> Combined with the 'TT_ASSERT's above, you could have your union as:
>>
>> union {
>> unsigned int i;
>> unsigned char ch[sizeof (unsigned int)];
>> } U;
>>
>> (Note that the use of 'unsigned' attempts to avoid any potential sign
>> bit complications; the 'TT_ASSERT' might be better off matching, too.)
>>
>>> This example is used for testing
>>> endianess @ c-faq.com. I know that gcc allows this. Is the above
>>> snippet to test for endianess legal C or C++?
>>
>> If you know that the implementation definitely uses an 8-bit byte, a
>> 4-byte 'int', and that there are no padding bits and that '0x12345678'
>> is within the range of values for 'int', then I'd say yes for "legal C". :)
>
>
> At least on my machine (Windows 7 64 bit) sizeof(int)==4,
> sizeof(char)==1 and '0x12345678' is within 'int' limit. But the fact
> is we are writing to int member and reading from (different) char
> member. That doesn't go well with union rules.

I believe it's quite all right. 6.5.2.3p3 has:

"A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is that of
the named member, and is an lvalue if the first expression is an lvalue.
If the first expression has qualified type, the result has the
so-qualified version of the type of the designated member."

Since you are using your 'ch' array, its element type is a character
type, and there are no trap representations for character types. The
last-stored value for the union has an object representation[6.2.6.1p4]
and that representation is then used for 'ch'.

Which union rules are you worried about, in particular?

> If it is legal in C
> language to reinterpret the content of any object as a char array (or
> char pointer), then I believe above snippet is technically correct C
> code(I may be wrong).

"char array": Yes. "char pointer": I think you mean if it's accessed
via a pointer to a character type. Yes, that's quite often the case.

One of the guarantees of the character types is that all objects can
have all of their bits manipulated/inspected via access through a
character type. This is useful for copying, for example. Scalar types
other than character types might have trap representations, if I recall
correctly.

Another nice thing about character types is that they have the weakest
alignment requirement; a pointer to a character type can be cast from
any other pointer-to-object-type because the alignment is fine[6.3.2.3p7].
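
As a small illustration of the "useful for copying" point, here is a sketch of a byte-wise copy that reads and writes an object only through unsigned char. The name copy_bytes is hypothetical; in real code memcpy from <string.h> does the same job.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical helper: copy n bytes of any object's representation by
   accessing it only through pointers to unsigned char. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t k;

    for (k = 0; k < n; k++)
        d[k] = s[k];
}
```

Copying an unsigned int into an unsigned char buffer this way and looking at buf[0] gives the same information as the union test.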

> Eg.
> int i=0x12345678; // sizeof(int) == 4
> char *p=(char *)&i;
> if(*p==0x78) // reinterpreting int i through a char pointer
> puts("Little Endian");
> else
> puts("Big Endian");
>

Absolutely as legitimate as your previous code. :)

(Using 'unsigned' variants is "nicer," in my opinion; no sign bit.)

Ben Bacarisse

6/17/2011 11:10:00 AM

China Blue Angels <chine.bleu@yahoo.com> writes:

> In article <f19541de-213c-4744-b712-92f20a3ad294@d26g2000prn.googlegroups.com>,
> Bhasker Penta <bskdsp@gmail.com> wrote:
>> One way to test for endianess is to use a union:
>>
>> void endianTest()
>> {
>> union // sizeof(int) == 4
>> {
>> int i;
>> char ch[4];
>> } U;
>>
>> U.i=0x12345678; // writing to int member
>> if ( U.ch[0]==0x78 ) // reading from char member
>> puts("\nLittle endian");
>> else
>> puts("\nBig endian");
>> }
<snip>

> You can use casts to avoid variables.
>
> #define X (((union{int i; char ch[4];}){.i=0x12345678}).ch[0])
> #define littleEndian (X==0x78)
> #define bigEndian (X==0x12)

It's probably worth pointing out that this is (a) C99 and (b) has no
casts!

Since all that's needed is a test for two of the possible byte orders,
I'd avoid using a value that might not be a valid int:

#define X (((union{int i; char ch[sizeof(int)];}){.i=1}).ch[0])

The tests then become X and !X (so I'd use some other name).
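
A sketch of the macro above under "some other name". LOW_BYTE_FIRST and is_low_byte_first are hypothetical names; the construct needs a C99 compiler because it relies on a compound literal with a designated initializer.

```c
/* C99: build an unnamed union object holding the int value 1, then read its
   first byte. The result is 1 if the least significant byte is stored
   first, and 0 otherwise. */
#define LOW_BYTE_FIRST \
    (((union { int i; char ch[sizeof(int)]; }){ .i = 1 }).ch[0])

/* Hypothetical wrapper so the test reads like an ordinary function call. */
int is_low_byte_first(void)
{
    return LOW_BYTE_FIRST != 0;
}
```

As with the original, a zero result only tells you that the first byte is not the low-order one.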

--
Ben.

ram

6/17/2011 12:57:00 PM

Shao Miller <sha0.miller@gmail.com> writes:
>Which union rules are you worried about, in particular?

One might worry about not knowing whether or where C actually
specifies the value of a certain member. For example, in

U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char member

or

int i=0x12345678; // sizeof(int) == 4
char *ch=(char *)&i;

, we assume that the value *ch is a »window« into the
in-memory representation of i. But does the C standard
actually require an implementation to behave this way
somewhere? If so, where?

>One of the guarantees of the character types is that all objects can
>have all of their bits manipulated/inspected via access through a
>character type.

Yes, it would be nice to know, where one can find this.
In the best case, all the steps needed to prove that *ch
really has the semantics as intended above.

James Kuyper

6/17/2011 1:32:00 PM

On 06/16/2011 11:16 PM, Bhasker Penta wrote:
> One way to test for endianess is to use a union:
>
> void endianTest()
> {
> union // sizeof(int) == 4
> {
> int i;
> char ch[4];
> } U;
>
> U.i=0x12345678; // writing to int member
> if ( U.ch[0]==0x78 ) // reading from char member
> puts("\nLittle endian");
> else
> puts("\nBig endian");
> }

There's a total of 24 possible byte orders for 4-byte integers, and a
few of the other 22 orders have in fact been used. The other 22 orders
are generically referred to as "middle-endian", and 5 of them would have
a value of 0x78 in ch[0]. I once found a web page listing the byte
orders that had actually been used, and citing specific machines on
which they had been used - unfortunately, I didn't save it, and have
been unable to locate it again. Big-endian and little-endian were
overwhelmingly the most common orders, but the two orders that were next
most common would set ch[] to {0x34, 0x12, 0x78, 0x56} or {0x56, 0x78,
0x34, 0x12}. One of those two orders (I'm not sure which) was the one
used on the PDP-11 where I did my first C programming. There were
several other orders also in actual use, though far less commonly even
than those two.
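
To make the point concrete, here is a sketch that looks at all four bytes instead of only ch[0]. The function names and the returned labels are hypothetical; the two "middle-endian" patterns are the ones listed above, and host_order assumes a 4-byte unsigned int (it reports "unknown" otherwise).

```c
#include <string.h>  /* memcmp, memcpy */

/* Hypothetical helper: name the byte order given the four bytes that
   0x12345678 produced in memory. */
const char *classify_order(const unsigned char b[4])
{
    static const unsigned char little[4] = {0x78, 0x56, 0x34, 0x12};
    static const unsigned char big[4]    = {0x12, 0x34, 0x56, 0x78};
    static const unsigned char mid_a[4]  = {0x34, 0x12, 0x78, 0x56};
    static const unsigned char mid_b[4]  = {0x56, 0x78, 0x34, 0x12};

    if (memcmp(b, little, 4) == 0) return "little-endian";
    if (memcmp(b, big, 4) == 0)    return "big-endian";
    if (memcmp(b, mid_a, 4) == 0)  return "middle-endian (34 12 78 56)";
    if (memcmp(b, mid_b, 4) == 0)  return "middle-endian (56 78 34 12)";
    return "other";
}

/* Hypothetical helper: classify the machine this code runs on. */
const char *host_order(void)
{
    unsigned int v = 0x12345678u;
    unsigned char b[4];

    if (sizeof v != 4)
        return "unknown";
    memcpy(b, &v, 4);  /* copy the object representation byte by byte */
    return classify_order(b);
}
```

A pattern such as {0x78, 0x12, 0x34, 0x56} has 0x78 in the first byte yet is classified as "other", which is the case the one-byte test would mislabel as little endian.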

> Writing to one member of a union and reading from another member is
> implementation defined(K & R). This example is used for testing
> endianess @ c-faq.com. I know that gcc allows this. Is the above
> snippet to test for endianess legal C or C++?

Neither C nor C++ uses the term "legal". The snippet contains no syntax errors, it
has no constraint violations, no diagnostics are required, and the
behavior is not undefined, according to the rules of either language. In
C++ it qualifies as "well-formed code". The closest comparable term in C
is "strictly conforming", but it doesn't qualify for that: it produces
different results on different platforms, which is the whole point of
this particular program, but such platform dependence is prohibited for
strictly conforming programs.
--
James Kuyper