Asp Forum - ~0 undefined?

blargg.h4g

10/20/2008 5:52:00 PM

Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
suggest so:

> If during the evaluation of an expression, the result is not
> mathematically defined or not in the range of representable values
> for its type, the behavior is undefined [...]

The description of unary ~ (C++03 section 5.3.1 paragraph 8):

> The operand of ~ shall have integral or enumeration type; the
> result is the one's complement of its operand. Integral promotions
> are performed. The type of the result is the type of the promoted
> operand. [...]

But perhaps "one's complement" means the value that type would have with
all bits inverted, rather than the mathematical result of inverting all
bits in the binary representation. For example, on a machine with 32-bit
int, does one's complement of 0 (attempt to) have the value 2^31-1, which
can't be represented in a signed int and is thus undefined, or does it
have the value of whatever a signed int with all set bits would have (-1
on a two's complement machine)?

I used the ~0 case for simplicity; in practice, this issue might occur
when ANDing with the complement of a mask, for example n&=~0x0F to clear
the low 4 bits of n, or ~n&0x0F to find the inverted low 4 bits of n.

24 Answers

Victor Bazarov

10/20/2008 6:38:00 PM

blargg wrote:
> Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
> suggest so:
>
>> If during the evaluation of an expression, the result is not
>> mathematically defined or not in the range of representable values
>> for its type, the behavior is undefined [...]
>
> The description of unary ~ (C++03 section 5.3.1 paragraph 8):
>
>> The operand of ~ shall have integral or enumeration type; the
>> result is the one's complement of its operand. Integral promotions
>> are performed. The type of the result is the type of the promoted
>> operand. [...]
>
> But perhaps "one's complement" means the value that type would have with
> all bits inverted, rather than the mathematical result of inverting all
> bits in the binary representation. For example, on a machine with 32-bit
> int, does one's complement of 0 (attempt to) have the value 2^31-1, which
> can't be represented in a signed int and is thus undefined,

Uh... Sorry, could you perhaps elaborate, why (2^31 - 1) can't be
represented? Or did you mean (2^32 - 1)?

If the resulting value is greater than can be represented in 'int', the
compiler will create the code to promote it first to 'unsigned', then to
'long', then to 'unsigned long', IIRC. So, if ~0 cannot for some reason
be represented in an int, it might become the (unsigned){all bits set}
value.

> or does it
> have the value of whatever a signed int with all set bits would have (-1
> on a two's complement machine)?

That's what I'd expect.

> I used the ~0 case for simplicity; in practice, this issue might occur
> when ANDing with the complement of a mask, for example n&=~0x0F to clear
> the low 4 bits of n, or ~n&0x0F to find the inverted low 4 bits of n.

Actually, on 2's complement, we use -1 for the "all bits set"...
Perhaps we should switch to ~0 (more portable?)

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

James Kanze

10/20/2008 7:29:00 PM

On Oct 20, 7:51 pm, blargg....@gishpuppy.com (blargg) wrote:
> Does ~0 yield undefined behavior?

No.

> C++03 section 5 paragraph 5 seems to suggest so:

> > If during the evaluation of an expression, the result is not
> > mathematically defined or not in the range of representable
> > values for its type, the behavior is undefined [...]

> The description of unary ~ (C++03 section 5.3.1 paragraph 8):

> > The operand of ~ shall have integral or enumeration type;
> > the result is the one's complement of its operand. Integral
> > promotions are performed. The type of the result is the type
> > of the promoted operand. [...]

> But perhaps "one's complement" means the value that type would
> have with all bits inverted, rather than the mathematical
> result of inverting all bits in the binary representation.

It's not really that clear what to expect on a machine not using
2's complement, but at the worst, it's unspecified or
implementation defined---not undefined behavior. (In general, I
would recommend avoiding ~, | and & on signed types.)

> For example, on a machine with 32-bit int, does one's
> complement of 0 (attempt to) have the value 2^31-1, which
> can't be represented in a signed int and is thus undefined, or
> does it have the value of whatever a signed int with all set
> bits would have (-1 on a two's complement machine)?

The wording is a bit sloppy, but what it doubtlessly means is
that you get a value with all bits set to one (in the specified
type). What that value is, of course, is probably
implementation dependent; it is -1 on a 2's complement machine,
but could very easily be 0 elsewhere.

> I used the ~0 case for simplicity; in practice, this issue
> might occur when ANDing with the complement of a mask, for
> example n&=~0x0F to clear the low 4 bits of n, or ~n&0x0F to
> find the inverted low 4 bits of n.

As long as the sign bit is 0, the behavior should be well
defined, with no ambiguities.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

blargg.h4g

10/20/2008 7:29:00 PM

In article <gdij68$rha$1@news.datemas.de>, Victor Bazarov
<v.Abazarov@comAcast.net> wrote:

> blargg wrote:
> > Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
> > suggest so:
> >
> >> If during the evaluation of an expression, the result is not
> >> mathematically defined or not in the range of representable values
> >> for its type, the behavior is undefined [...]
> >
> > The description of unary ~ (C++03 section 5.3.1 paragraph 8):
> >
> >> The operand of ~ shall have integral or enumeration type; the
> >> result is the one's complement of its operand. Integral promotions
> >> are performed. The type of the result is the type of the promoted
> >> operand. [...]
> >
> > But perhaps "one's complement" means the value that type would have with
> > all bits inverted, rather than the mathematical result of inverting all
> > bits in the binary representation. For example, on a machine with 32-bit
> > int, does one's complement of 0 (attempt to) have the value 2^31-1, which
> > can't be represented in a signed int and is thus undefined,
>
> Uh... Sorry, could you perhaps elaborate, why (2^31 - 1) can't be
> represented? Or did you mean (2^32 - 1)?

Yeah, (2^32 - 1); I noticed just after I posted.

> If the resulting value is greater than can be represented in 'int', the
> compiler will create the code to promote it first to 'unsigned', then to
> 'long', then to 'unsigned long', IIRC. So, if ~0 cannot for some reason
> be represented in an int, it might become the (unsigned){all bits set}
> value.

Not as I understand it, where this only occurs when selecting what type a
literal will be. If what you described were the case, the type of an
expression would depend on its run-time value, for example if i were an
int, the type of the expression i+1 would be an int unless i contained
INT_MAX, where it would be of type unsigned int. This is clearly not the
case, since C++ is statically-typed.
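
A minimal sketch of that distinction, assuming a C++11 or later compiler: the type of an expression is fixed at compile time by its operand types, so it can be checked with static_assert. The helper name next_value is made up for illustration.

```cpp
#include <type_traits>

// The literal 0 fits in int, so it has type int; ~ applied to it
// yields the promoted operand type, which is still int.
static_assert(std::is_same<decltype(~0), int>::value,
              "~0 has type int");

// A U suffix makes the literal unsigned, and so the complement too.
static_assert(std::is_same<decltype(~0U), unsigned int>::value,
              "~0U has type unsigned int");

// Hypothetical helper: the result type of i + 1 is int for every
// run-time value of i. (Calling it with INT_MAX overflows, which is
// undefined behavior, not a change of type.)
inline int next_value(int i) {
    static_assert(std::is_same<decltype(i + 1), int>::value,
                  "i + 1 has type int");
    return i + 1;
}
```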

> > or does it
> > have the value of whatever a signed int with all set bits would have (-1
> > on a two's complement machine)?
>
> That's what I'd expect.

The problem is that most compilers implement conversion of a value to a
signed int as a no-op, that is, simply to reinterpret the bits as being in
two's complement.

> > I used the ~0 case for simplicity; in practice, this issue might occur
> > when ANDing with the complement of a mask, for example n&=~0x0F to clear
> > the low 4 bits of n, or ~n&0x0F to find the inverted low 4 bits of n.
>
> Actually, on 2's complement, we use -1 for the "all bits set"...
> Perhaps we should switch to ~0 (more portable?)

It seems to me that ~0 is actually less portable. As far as I know,
converting -1 to an unsigned type is guaranteed to give a value with all
bits set, since that conversion is guaranteed to give you a two's
complement representation in the unsigned result, even if the machine
doesn't use such a representation (C++03 section 4.7 paragraph 2).
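
A small sketch of that guarantee, assuming C++11 or later. The conversion of -1 to an unsigned type is defined modulo 2^N, so the result is the maximum value of that type. The helper name all_bits_set is hypothetical.

```cpp
#include <climits>
#include <limits>

// Converting -1 to unsigned int yields UINT_MAX on any conforming
// implementation, whatever the signed representation is.
static_assert(static_cast<unsigned>(-1) == UINT_MAX,
              "-1 converts to the maximum unsigned value");

// Hypothetical helper: all-ones value of an unsigned type T, obtained
// by converting -1. The conversion is defined modulo 2^N (N = number
// of value bits in T), so the result is 2^N - 1.
template <typename T>
T all_bits_set() {
    static_assert(!std::numeric_limits<T>::is_signed,
                  "intended for unsigned types only");
    return static_cast<T>(-1);
}
```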

Juha Nieminen

10/20/2008 7:38:00 PM

Victor Bazarov wrote:
> Actually, on 2's complement, we use -1 for the "all bits set"... Perhaps
> we should switch to ~0 (more portable?)

But will it work properly? Assume that in some system 'long' is a
larger type than 'int'. Will this work?

long value1 = ~0;
unsigned long value2 = ~0;

What kind of promotion chain is ~0 subjected to here? Will 'value1'
and 'value2' end up having all bits set?

Victor Bazarov

10/20/2008 7:43:00 PM

James Kanze wrote:
> [..]
> The wording is a bit sloppy, but what it doubtlessly means is
> that you get a value with all bits set to one (in the specified
> type). What that value is, of course, is probably
> implementation dependent; it is -1 on a 2's complement machine,
> but could very easily be 0 elsewhere.

Where? C++ only supports three representations, the 1's complement, the
2's complement, and the signed magnitude.

> [..]

V

Victor Bazarov

10/20/2008 7:45:00 PM

Juha Nieminen wrote:
> Victor Bazarov wrote:
>> Actually, on 2's complement, we use -1 for the "all bits set"... Perhaps
>> we should switch to ~0 (more portable?)
>
> But will it work properly? Assume that in some system 'long' is a
> larger type than 'int'. Will this work?
>
> long value1 = ~0;
> unsigned long value2 = ~0;
>
> What kind of promotion chain is ~0 subjected to here? Will 'value1'
> and 'value2' end up having all bits set?

Probably not. It is generally better to use the literals of the same
type, IOW

long value1 = ~0L;
unsigned long value2 = ~0UL;

, to avoid specifically the situations where the result depends on some
implementation-defined behaviour[s].
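
A hedged sketch of why the literal's type matters when the target is wider than int. The function names are made up for illustration. The plain ~0 case is left out because it is the one under debate in this thread; on a two's complement implementation ~0 equals -1 and so behaves like the last function.

```cpp
#include <climits>

// ~0U has type unsigned int: every bit of an unsigned int is set, and
// widening to unsigned long keeps that value (zero-extension), so any
// extra high bits of unsigned long stay clear.
inline unsigned long from_unsigned_int() { return ~0U; }

// ~0UL has type unsigned long, so every bit of the target is set.
inline unsigned long from_unsigned_long() { return ~0UL; }

// -1 converts modulo 2^N, which also yields the maximum value.
inline unsigned long from_minus_one() {
    return static_cast<unsigned long>(-1);
}
```

Where unsigned long is wider than unsigned int, the first function returns UINT_MAX rather than ULONG_MAX; where the two types have the same width, all three agree.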

V

James Kanze

10/20/2008 7:47:00 PM

On Oct 20, 8:38 pm, Victor Bazarov <v.Abaza...@comAcast.net> wrote:
> blargg wrote:
> > Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
> > suggest so:

> >> If during the evaluation of an expression, the result is not
> >> mathematically defined or not in the range of representable values
> >> for its type, the behavior is undefined [...]

> > The description of unary ~ (C++03 section 5.3.1 paragraph 8):

> >> The operand of ~ shall have integral or enumeration type; the
> >> result is the one's complement of its operand. Integral promotions
> >> are performed. The type of the result is the type of the promoted
> >> operand. [...]

> > But perhaps "one's complement" means the value that type would have with
> > all bits inverted, rather than the mathematical result of inverting all
> > bits in the binary representation. For example, on a machine with 32-bit
> > int, does one's complement of 0 (attempt to) have the value 2^31-1, which
> > can't be represented in a signed int and is thus undefined,

> Uh... Sorry, could you perhaps elaborate, why (2^31 - 1) can't be
> represented? Or did you mean (2^32 - 1)?

> If the resulting value is greater than can be represented in
> 'int', the compiler will create the code to promote it first
> to 'unsigned', then to 'long', then to 'unsigned long', IIRC.
> So, if ~0 cannot for some reason be represented in an int, it
> might become the (unsigned){all bits set} value.

No. That's the way the compiler behaves for an integral literal
written as an octal or hexadecimal constant. (For a decimal constant,
the results will never be unsigned.) In this case, the integral
literal is 0---which can't possibly overflow anything, and so
has type int. What we have here is an expression, with an
operator applied to an int. What blargg is doubtlessly
referring to is the statement in §5 that "If during the
evaluation of an expression, the result is not mathematically
defined or not in the range of representable values for its
type, the behavior is undefined, unless such an expression
appears where an integral constant expression is required
(5.19), in which case the program is ill-formed."

The problem here is that the "one's complement" operation
doesn't really define a numeric result, but rather a
manipulation on the underlying representation. So I don't think
that this statement can be applied: the ~ operator changes the
bits in the representation, and the "result" is whatever value
the changed bits happen to represent. Except that it's not
really too clear what that means, either; what happens if the
changed bits would be a trapping representation? (E.g. a 1's
complement machine that traps on negative 0's.)

Because of such issues, I tend to avoid using ~, | or & on
signed integral types.

> > or does it

> > have the value of whatever a signed int with all set bits
> > would have (-1 on a two's complement machine)?

> That's what I'd expect.

That's doubtlessly what was intended. On a two's complement
machine. Now try it on a one's complement machine which traps
negative 0's.

The C standard has cleared this up considerably. According to
the C99 standard:

If the implementation supports negative zeros, they
shall be generated only by:
-- the &, |, ^, ~, <<, and >> operators with arguments
that produce such a value;
-- the +, -, *, /, and % operators where one argument
is a negative zero and the result is zero;
-- compound assignment operators based on the above
cases.
It is unspecified whether these cases actually generate
a negative zero or a normal zero, and whether a negative
zero becomes a normal zero when stored in an object.

If the implementation does not support negative zeros,
the behavior of the &, |, ^, ~, <<, and >> operators
with arguments that would produce such a value is
undefined.

The second paragraph above is particularly significant: ~0
*is* undefined behavior on an implementation which doesn't
support negative zeros. (Note that the text immediately
preceding the above makes it clear that it is talking about
negative zero representations in one's complement or signed
magnitude; the "doesn't support negative zeros" only applies
in the case where they exist in the representation.)

> > I used the ~0 case for simplicity; in practice, this
> > issue might occur when ANDing with the complement of a
> > mask, for example n&=~0x0F to clear the low 4 bits of n,
> > or ~n&0x0F to find the inverted low 4 bits of n.

> Actually, on 2's complement, we use -1 for the "all bits
> set"... Perhaps we should switch to ~0 (more portable?)

If you're worried about bits, the *only* way you can be sure
of anything where the highest bit might not be 0 is to use
unsigned types. For signed types, ~0 can result in
undefined behavior. (In other words, ~0 is not portable, ~0U
is. As is -1, if that's what you want.)
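
As a rough illustration of sticking to unsigned types, here are the two mask operations from the original question written with unsigned operands. The helper names are hypothetical and not from the thread.

```cpp
// Clears the low 4 bits of n. The mask is built from an unsigned
// literal, so ~ never touches a sign bit.
inline unsigned clear_low_nibble(unsigned n) { return n & ~0x0FU; }

// Returns the inverted low 4 bits of n.
inline unsigned inverted_low_nibble(unsigned n) { return ~n & 0x0FU; }
```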


blargg.h4g

10/20/2008 8:20:00 PM

In article
<cdcc86af-9c2f-4fe7-9b77-dc7c4531563c@w24g2000prd.googlegroups.com>, James
Kanze <james.kanze@gmail.com> wrote:

> blargg wrote:
>> Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
>> suggest so:
>>
>>> If during the evaluation of an expression, the result is not
>>> mathematically defined or not in the range of representable values
>>> for its type, the behavior is undefined [...]
>>
>> The description of unary ~ (C++03 section 5.3.1 paragraph 8):
>>
>>> The operand of ~ shall have integral or enumeration type; the
>>> result is the one's complement of its operand. Integral promotions
>>> are performed. The type of the result is the type of the promoted
>>> operand. [...]
>>
>> But perhaps "one's complement" means the value that type would have with
>> all bits inverted, rather than the mathematical result of inverting all
>> bits in the binary representation. For example, on a machine with 32-bit
>> int, does one's complement of 0 (attempt to) have the value 2^31-1, which
>> can't be represented in a signed int and is thus undefined,
[...]
> The problem here is that the "one's complement" operation doesn't
> really define a numeric result, but rather a manipulation on the
> underlying representation. So I don't think that this statement
> [C++03 section 5 paragraph 5] can be applied: the ~ operator
> changes the bits in the representation, and the "result" is
> whatever value the changed bits happen to represent. Except that
> it's not really too clear what that means, either; what happens if
> the changed bits would be a trapping representation? (E.g. a 1's
> complement machine that traps on negative 0's.)

So you're saying that n = ~n, where n is an int, could be implemented as

    for ( size_t i = 0; i < sizeof n; ++i )
        reinterpret_cast<unsigned char*> (&n) [i] ^= (unsigned char) -1;

where it's up to the implementation as to the new value n takes on. This
would imply that the following are guaranteed to hold true, regardless of
n's signedness or sign:

~~n == n
(n & ~n) == 0
(n ^ ~n) == ~0
(n & ~0) == n
(n & ~1) == n - (n & 1)

This is the interpretation I really hope is the case.
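
For reference, a sketch that checks the five identities for unsigned operands, where every step is defined by the standard. Whether they also hold for signed operands is the open question here, so this checker (a hypothetical name) deliberately takes unsigned.

```cpp
// Hypothetical checker: returns true if all five identities hold for
// the given unsigned value.
inline bool identities_hold(unsigned n) {
    return ~~n == n
        && (n & ~n) == 0U
        && (n ^ ~n) == ~0U
        && (n & ~0U) == n
        && (n & ~1U) == n - (n & 1U);
}
```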

> Because of such issues, I tend to avoid using ~, | or & on signed
> integral types.

That would require ensuring all bitwise constants are unsigned, by
suffixing with a U, casting, or storing in an unsigned type before use,
which seems somewhat tedious. As in my example, even code for simply
testing the low bit would require a nasty U: n&1U.

James Kanze

10/21/2008 8:06:00 AM

On Oct 20, 10:19 pm, blargg....@gishpuppy.com (blargg) wrote:
> In article
> <cdcc86af-9c2f-4fe7-9b77-dc7c45315...@w24g2000prd.googlegroups.com>, James
> Kanze <james.ka...@gmail.com> wrote:
> > blargg wrote:
> >> Does ~0 yield undefined behavior? C++03 section 5 paragraph 5 seems to
> >> suggest so:

> >>> If during the evaluation of an expression, the result is
> >>> not mathematically defined or not in the range of
> >>> representable values for its type, the behavior is
> >>> undefined [...]

> >> The description of unary ~ (C++03 section 5.3.1 paragraph
> >> 8):

> >>> The operand of ~ shall have integral or enumeration type;
> >>> the result is the one's complement of its operand.
> >>> Integral promotions are performed. The type of the result
> >>> is the type of the promoted operand. [...]

> >> But perhaps "one's complement" means the value that type
> >> would have with all bits inverted, rather than the
> >> mathematical result of inverting all bits in the binary
> >> representation. For example, on a machine with 32-bit int,
> >> does one's complement of 0 (attempt to) have the value
> >> 2^31-1, which can't be represented in a signed int and is
> >> thus undefined,
> [...]
> > The problem here is that the "one's complement" operation
> > doesn't really define a numeric result, but rather a
> > manipulation on the underlying representation. So I don't
> > think that this statement [C++03 section 5 paragraph 5] can
> > be applied: the ~ operator changes the bits in the
> > representation, and the "result" is whatever value the
> > changed bits happen to represent. Except that it's not
> > really too clear what that means, either; what happens if
> > the changed bits would be a trapping representation? (E.g.
> > a 1's complement machine that traps on negative 0's.)

> So you're saying that n = ~n, where n is an int, could be
> implemented as

>     for ( size_t i = 0; i < sizeof n; ++i )
>         reinterpret_cast<unsigned char*> (&n) [i] ^= (unsigned char) -1;

> where it's up to the implementation as to the new value n
> takes on.

Pretty much. After having posted this, I checked in the C99
standard (where the wording concerning the representation of
integral types has been completely redone, since it was felt
that the original wording wasn't entirely clear). There, it's
much clearer: the operator is described as doing a "bitwise
complement" (and not a one's complement), and in the text
describing representation of integers, it explicitly says that
this operator can result in a negative zero (supposing such
exists in the representation), and if the implementation doesn't
support negative zeros, the behavior is undefined.

(As I interpret it, there are three possibilities: negative zero
can't exist---that's the case for 2's complement---, or if they
exist, the implementation can support them or not, where
"support them" means more or less that you can use them, and
they will work as expected. The C99 standard does explicitly say that a
negative zero can be a trapping value.)

> This would imply that the following are guaranteed
> to hold true, regardless of n's signedness or sign:

> ~~n == n
> (n & ~n) == 0
> (n ^ ~n) == ~0
> (n & ~0) == n
> (n & ~1) == n - (n & 1)

With the proviso that it's implementation defined whether ~n can
result in a negative 0 or not, and if it does, it's
implementation defined how this value behaves, and it may result
in undefined behavior. I'm not sure about the last, I haven't
had time to analyse it, but the others certainly hold *IF*
there is no undefined behavior.

As a general rule, however, I would look askance at any code
which used bitwise operators on signed values, with a few
exceptions (mainly, masking just the low order bits, i.e. x &
0x0F or x & 0xFF).

> This is the interpretation I really hope is the case.

> > Because of such issues, I tend to avoid using ~, | or & on
> > signed integral types.

> That would require ensuring all bitwise constants are
> unsigned,

Not necessarily. You rarely operate on two constants, and if
the other, non-constant operand is unsigned, the operation will
be unsigned. The unary operator ~ is a special case, but you
generally need to specify the exact type when using it anyway,
in order to ensure the proper length.
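
A sketch of the length issue, assuming <cstdint> provides std::uint64_t. The function names are hypothetical. The point is only that the type of the operand of ~ decides how many bits the mask covers.

```cpp
#include <cstdint>

// Intended: clear the low 4 bits of a 64-bit value. The mask has type
// std::uint64_t, so the complement sets all upper bits.
inline std::uint64_t clear_low4(std::uint64_t x) {
    return x & ~std::uint64_t{0x0F};
}

// Pitfall: ~0x0FU has type unsigned int. Where unsigned int is
// narrower than 64 bits, widening zero-extends the mask and the AND
// also clears every bit above the width of unsigned int.
inline std::uint64_t clear_low4_narrow_mask(std::uint64_t x) {
    return x & ~0x0FU;
}
```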

> by suffixing with a U, casting, or storing in an unsigned type
> before use, which seems somewhat tedious. As in my example,
> even code for simply testing the low bit would require a nasty
> U: n&1U.

Do you think so? The results of masking all but the lower bits
will always be non-negative, and the problems only occur with
negative results. It's the one exception I would allow.
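
A tiny sketch of that exception, with a made-up helper name: masking a signed value down to its lowest bit with a plain int constant. The result of n & 1 is either 0 or 1, so no negative value is produced. For negative n, which of the two you get depends on the representation.

```cpp
// Hypothetical helper: tests the low bit of a signed value with a
// plain int mask. n & 1 is either 0 or 1, never negative.
inline bool low_bit_set(int n) { return (n & 1) != 0; }
```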


James Kanze

10/21/2008 8:15:00 AM

On Oct 20, 9:42 pm, Victor Bazarov <v.Abaza...@comAcast.net> wrote:
> James Kanze wrote:
> > [..]
> > The wording is a bit sloppy, but what it doubtlessly means
> > is that you get a value with all bits set to one (in the
> > specified type). What that value is, of course, is probably
> > implementation dependent; it is -1 on a 2's complement
> > machine, but could very easily be 0 elsewhere.

> Where? C++ only supports three representations, the 1's
> complement, the 2's complement, and the signed magnitude.

It's not at all clear what C++ supports. C++ took the defective
wording of C90, and modified it slightly to make it even worse.
C99 straightened it out, and does only allow three
representations. And as to where the result of ~0 would not be
-1, exactly what I said: "elsewhere [than on a 2's complement
architecture]". On 1's complement, it would be a negative 0.
(The C99 standard explicitly says that it may result in a
negative 0.) Depending on the implementation, a negative 0
either behaves exactly like a positive 0 in arithmetic operations
(but not bitwise operations), or it is undefined behavior.

So the answer to blargg's original question is, somewhat
surprisingly, that ~0 may result in undefined behavior. (Except
that since it is a constant expression, it doesn't cause
undefined behavior, but makes the program ill formed.)

More generally, the results of any of the bitwise operators --
~, |, &, ^, >> or <<, or their <op>= forms -- may result in
undefined behavior. At least according to the C99 standard; the
C++ standard doesn't really say anything meaningful about what
they do.

I currently have a paper before the committee concerning defects
in the specification of the representation of integral types,
with the proposed correction to adopt the wording from C99
(can't see any reason for C and C++ to differ here); I'll update
it to consider these issues as well. (For example, in C99, ~ is
defined as the "bitwise complement", not the 1's complement.)


comp.lang.c++
