[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.c

real world sizeof bugs

William Ahern

8/24/2011 5:30:00 PM

Because the issue gets discussed here on a semi-monthly basis, I thought the
following real-world example might be worth posting. The code belongs to the
c-ares project, which is used in much FLOSS and commercial software.

Backdrop of the code in question:

struct nameinfo_query {
union {
struct sockaddr_in addr4;
struct sockaddr_in6 addr6;
} addr;
};

struct nameinfo_query niquery;
struct sockaddr_in *addr;

Original code, circa 2005:

memcpy(&niquery->addr.addr4, addr, sizeof(addr));

First fix in June 2011:

memcpy(&niquery->addr.addr4, addr, sizeof(struct in_addr));

Final fix:

memcpy(&niquery->addr.addr4, addr, sizeof(niquery->addr.addr4));

Note that struct in_addr is the type of one of several members within struct
sockaddr_in. The error isn't surprising because older BSD Sockets interfaces
often take a struct in_addr or similar, whereas newer ones use struct
sockaddr_in or similar.

It's heartening that the alternative fix, `sizeof(*addr)', wasn't used.
Although one wonders why not a simple assignment.

Piquing mailing-list post:

http://c-ares.haxx.se/mail/c-ares-archive-2011-08/...

20 Answers

Ben Pfaff

8/24/2011 5:38:00 PM

0

William Ahern <william@wilbur.25thandClement.com> writes:

> It's heartening that the alternative fix, `sizeof(*addr)', wasn't used.
> Although one wonders why not a simple assignment.

I run into a surprising number of C programmers who don't realize
that struct assignment is allowed.
--
Ben Pfaff
http://be...

Kenneth Brody

8/25/2011 1:46:00 PM

0

On 8/24/2011 1:29 PM, William Ahern wrote:
> Because the issue gets discussed here on a semi-monthly basis, I thought the
> following real-world example might be worth posting. The code belongs to the
> c-ares project, which is used in much FLOSS and commercial software.
>
> Backdrop of the code in question:
>
> struct nameinfo_query {
> union {
> struct sockaddr_in addr4;
> struct sockaddr_in6 addr6;
> } addr;
> };
>
> struct nameinfo_query niquery;
> struct sockaddr_in *addr;
>
> Original code, circa 2005:
>
> memcpy(&niquery->addr.addr4, addr, sizeof(addr));
>
> First fix in June 2011:
>
> memcpy(&niquery->addr.addr4, addr, sizeof(struct in_addr));
>
> Final fix:
>
> memcpy(&niquery->addr.addr4, addr, sizeof(niquery->addr.addr4));
[...]

Well, in this case, the bug was using the sizeof() the wrong thing. In this
case, you're dealing with a destination that may be larger than the source,
due to the fact that it can hold multiple types of structs. Therefore, as
the "final fix" uses, you need the sizeof() the source, not the destination.

Consider a simpler version of the same thing:

unsigned char GenericBuffer[65535];
struct foo MyFoo;
...
memcpy(&MyFoo,GenericBuffer,sizeof(???));

(This assumes, of course, that the implementor knows that GenericBuffer will
never need to hold anything larger than 65535 bytes.)

--
Kenneth Brody

Joe Pfeiffer

8/25/2011 3:34:00 PM

0

Kenneth Brody <kenbrody@spamcop.net> writes:

> On 8/24/2011 1:29 PM, William Ahern wrote:
>> Because the issue gets discussed here on a semi-monthly basis, I thought the
>> following real-world example might be worth posting. The code belongs to the
>> c-ares project, which is used in much FLOSS and commercial software.
>>
>> Backdrop of the code in question:
>>
>> struct nameinfo_query {
>> union {
>> struct sockaddr_in addr4;
>> struct sockaddr_in6 addr6;
>> } addr;
>> };
>>
>> struct nameinfo_query niquery;
>> struct sockaddr_in *addr;
>>
>> Original code, circa 2005:
>>
>> memcpy(&niquery->addr.addr4, addr, sizeof(addr));
>>
>> First fix in June 2011:
>>
>> memcpy(&niquery->addr.addr4, addr, sizeof(struct in_addr));
>>
>> Final fix:
>>
>> memcpy(&niquery->addr.addr4, addr, sizeof(niquery->addr.addr4));
> [...]
>
> Well, in this case, the bug was using the sizeof() the wrong thing.
> In this case, you're dealing with a destination that may be larger
> than the source, due to the fact that it can hold multiple types of
> structs. Therefore, as the "final fix" uses, you need the sizeof()
> the source, not the destination.
>
> Consider a simpler version of the same thing:
>
> unsigned char GenericBuffer[65535];
> struct foo MyFoo;
> ...
> memcpy(&MyFoo,GenericBuffer,sizeof(???));
>
> (This assumes, of course, that the implementor knows that
> GenericBuffer will never need to hold anything larger than 65535
> bytes.)

No, the final fix used the sizeof the actual destination. Which
suggests the following might be a "final final fix":

sizeof(GenericBuffer) < sizeof(MyFoo) ? sizeof(GenericBuffer) : sizeof(MyFoo)

Jens Gustedt

8/25/2011 8:59:00 PM

0

Am 08/24/2011 07:29 PM, schrieb William Ahern:
> Although one wonders why not a simple assignment.

Hm, because the struct has union members?

I recently scanned the standard for this and it is not completely clear.
There is basically not even an operational description what an
assignment is supposed to do.

Only thing that I found is a description of initialization, and there
the rule for unions is clearly that if there is no designation an
initialization concerns the first named member.

By analogy I would guess that assignment of unions only copies the first
member and not the necessarily the whole. But this is only a guess I
didn't find an explicit statement of this.

The standard also states that accessing a member of a union that was not
the last one to which it was written may lead to undefined behavior. So
in particular, if the first member was not the one that was written to,
an assignment of a union may lead to undefined behavior? Odd.

Jens

James Kuyper

8/25/2011 9:26:00 PM

0

On 08/25/2011 04:58 PM, Jens Gustedt wrote:
> Am 08/24/2011 07:29 PM, schrieb William Ahern:
>> Although one wonders why not a simple assignment.
>
> Hm, because the struct has union members?
>
> I recently scanned the standard for this and it is not completely clear.
> There is basically not even an operational description what an
> assignment is supposed to do.

6.5.16p1: "In simple assignment (=), the value of the right operand is
converted to the type of the assignment expression and replaces the
value stored in the object designated by the left operand."

In general for union objects, unless the implementation has kept track
of which member of the union was last written to, I don't seen any way
to implement those semantics which is not equivalent to memcpy(). If it
has kept track of the appropriate member, it could implement it as a
simple assignment of the corresponding member.

> Only thing that I found is a description of initialization, and there
> the rule for unions is clearly that if there is no designation an
> initialization concerns the first named member.

Some aspects of initialization are defined by reference to assignment
(6.7.8p11), but the description of assignment does not in any way
reference the description of initialization, so that's not particularly
relevant.

Kenneth Brody

8/26/2011 2:37:00 PM

0

On 8/25/2011 11:34 AM, Joe Pfeiffer wrote:
> Kenneth Brody<kenbrody@spamcop.net> writes:
>> On 8/24/2011 1:29 PM, William Ahern wrote:
[...]
>>> Original code, circa 2005:
>>>
>>> memcpy(&niquery->addr.addr4, addr, sizeof(addr));
>>>
>>> First fix in June 2011:
>>>
>>> memcpy(&niquery->addr.addr4, addr, sizeof(struct in_addr));
>>>
>>> Final fix:
>>>
>>> memcpy(&niquery->addr.addr4, addr, sizeof(niquery->addr.addr4));
>> [...]
>>
>> Well, in this case, the bug was using the sizeof() the wrong thing.
>> In this case, you're dealing with a destination that may be larger
>> than the source, due to the fact that it can hold multiple types of
>> structs. Therefore, as the "final fix" uses, you need the sizeof()
>> the source, not the destination.
[...]
> No, the final fix used the sizeof the actual destination. Which
> suggests the following might be a "final final fix":

<voice mode="Emily Latella">
Never mind.
</voice>

Yes, I did get source and destination backwards. :-(

But, the same basics apply. You're copying between a generic buffer
(actually a union in this case) and a specific type, and you need to only
copy those bytes that exist in the specific type, regardless of direction of
the copy.

[...]

--
Kenneth Brody

Jens Gustedt

8/27/2011 5:59:00 PM

0

Am 08/25/2011 11:25 PM, schrieb James Kuyper:
> 6.5.16p1: "In simple assignment (=), the value of the right operand is
> converted to the type of the assignment expression and replaces the
> value stored in the object designated by the left operand."
>
> In general for union objects, unless the implementation has kept track
> of which member of the union was last written to, I don't seen any way
> to implement those semantics which is not equivalent to memcpy(). If it
> has kept track of the appropriate member, it could implement it as a
> simple assignment of the corresponding member.

I would state it otherwise. I think the sematics for unions here are
simply unclear. To my understanding a union type object as such
doesn't have a value. Only one of the members (the last one assigned
to) has a value.

Wouldn't implementing it as equivalent of memcpy mean that this allows
the access to a part of the object that is indeterminate? Thus
undefined behavior?

Jens

Eric Sosman

8/27/2011 7:37:00 PM

0

On 8/27/2011 1:59 PM, Jens Gustedt wrote:
> Am 08/25/2011 11:25 PM, schrieb James Kuyper:
>> 6.5.16p1: "In simple assignment (=), the value of the right operand is
>> converted to the type of the assignment expression and replaces the
>> value stored in the object designated by the left operand."
>>
>> In general for union objects, unless the implementation has kept track
>> of which member of the union was last written to, I don't seen any way
>> to implement those semantics which is not equivalent to memcpy(). If it
>> has kept track of the appropriate member, it could implement it as a
>> simple assignment of the corresponding member.
>
> I would state it otherwise. I think the sematics for unions here are
> simply unclear. To my understanding a union type object as such
> doesn't have a value. Only one of the members (the last one assigned
> to) has a value.

According to the Standard (6.2.6.1p6), the union has a value
of its own, distinct from its most recently stored element. The
same paragraph explains one of the differences: If the element
holds a trap representation, the union nonetheless has a value and
that value is not a trap. Hence, for example:

union { double trouble; char broil; } x, y;
double signalling_nan = ...; // system-specific
memcpy(&x.trouble, &signalling_nan, sizeof x.trouble);
y = x; // no problem
y.trouble = x.trouble; // may trap

> Wouldn't implementing it as equivalent of memcpy mean that this allows
> the access to a part of the object that is indeterminate? Thus
> undefined behavior?

No, for two reasons. First, there's the paragraph mentioned
above: The union itself has a perfectly good value. Second, any
object at all can be viewed as an array of unsigned char (which
has no trap representations), even if the value of the object
itself is ill-defined.

--
Eric Sosman
esosman@ieee-dot-org.invalid

James Kuyper

8/27/2011 8:38:00 PM

0

On 08/27/2011 01:59 PM, Jens Gustedt wrote:
....
> I would state it otherwise. I think the sematics for unions here are
> simply unclear. To my understanding a union type object as such
> doesn't have a value. Only one of the members (the last one assigned
> to) has a value.

Eric Sosman has already addressed this issue directly, but I'll take an
indirect approach.

6.5.16.1p1 explicitly allows the left operand of a simple assignment
expression to be a union of a type compatible with the right operand
(which therefore must also be a union). 6.5.16.1p2 says that "the value
of the right operand is converted to the type of the assignment
expression and replaces the value stored in the object designated by the
left operand." If unions didn't actually have a value, that statement
would be meaningless, and explicitly allowing simple assignment of
unions would therefore be pointless.

> Wouldn't implementing it as equivalent of memcpy mean that this allows
> the access to a part of the object that is indeterminate? Thus
> undefined behavior?

No, an object with an indeterminate value either represents an
unspecified value, or contains a trap representation. (3.17.2) Undefined
behavior is not associated directly with with indeterminate values, but
only indeterminate values that happen to have trap representations.
memcpy()'s behavior is defined in terms of accessing the object as an
array of characters. (7.21.2.1p2). The undefined behavior that is
associated with trap representations occurs only when they are accessed
using an lvalue of non-character type. (6.2.6.1p5).

--
James Kuyper

Ben Bacarisse

8/27/2011 9:00:00 PM

0

Jens Gustedt <jens.gustedt@loria.fr> writes:

> Am 08/25/2011 11:25 PM, schrieb James Kuyper:
>> 6.5.16p1: "In simple assignment (=), the value of the right operand is
>> converted to the type of the assignment expression and replaces the
>> value stored in the object designated by the left operand."
>>
>> In general for union objects, unless the implementation has kept track
>> of which member of the union was last written to, I don't seen any way
>> to implement those semantics which is not equivalent to memcpy(). If it
>> has kept track of the appropriate member, it could implement it as a
>> simple assignment of the corresponding member.
>
> I would state it otherwise. I think the sematics for unions here are
> simply unclear. To my understanding a union type object as such
> doesn't have a value. Only one of the members (the last one assigned
> to) has a value.

It helps to keep in mind C's rather specific meaning of "value". An
object (just a region of storage) acquires a value only when it is
interpreted as having a particular type. The object itself has no
intrinsic value. This is what it means to say that the union has a
value of it's own, independent of the values of the members -- the
storage is access via an expression of union type.

All the union members also have a value. It's determined by
re-interpreting the bits of the some part of the object as if is were a
value of the member's type (see footnote 82 of 6.5.2.3 p3). The
possibility of undefined behaviour comes only from the fact that this my
be trap representation.

<snip>
--
Ben.