Asp Forum - Efficient use of realloc

Steve Keller

4/20/2011 7:23:00 PM

How efficiently will typical malloc implementations behave if I
continuously enlarge a memory region by small amounts. Say I have a
loop producing items that I want to store contiguously in memory:

struct foo *p, *new, *a = NULL;
int count = 0;
while (p = get_item()) {
new = realloc(a, (count + 1) * sizeof(*a));
if (!new)
break;
a = new;
a[count++] = *p;
}

To avoid numerous calls to realloc() I have often written code like
this:

struct foo *p, *new, *a = NULL;
int count = 0, alloced = 0;
while (p = get_item()) {
if (count >= alloced) {
new = realloc(a, (alloced += 256) * sizeof(*a));
if (!new)
break;
a = new;
}
a[count++] = *p;
}

But should one really do this or should one rely on the standard
library to handle things in an intelligent way, even without knowing
how many and how large the items will be?

How will typical malloc implementations behave, say BSD and GNU libc?

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---

11 Answers

William Ahern

4/20/2011 9:47:00 PM

Steve Keller <keller@no.invalid> wrote:
<snip>
> But should one really do this or should one rely on the standard
> library to handle things in an intelligent way, even without knowing
> how many and how large the items will be?

> How will typical malloc implementations behave, say BSD and GNU libc?

They behave differently, and many modern implementations may use completely
different schemes depending on the size.

Here's glibc's default allocator:
http://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc...

OpenBSD's
http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc....

NetBSD and FreeBSD use jemalloc:
http://www.canonware.com...

Worrying too much about the performance of resizing an array could be a hint
that a different data structure, like a linked list, might be more
appropriate. In which case you might want to implement a factory that caches
freed objects for reallocation later, and initially allocates objects in
batches.

Conversely, you may know more information about the eventual size of the
array, so rather than trying to reimplement a general purpose strategy, make
use of the knowledge you do have. For example, if you initially know that
you have at least 100 items to append to the array, then use that as a lower
bound on your reallocations. Then there's signifiantly less reason to care
that you ended up having to append 110 items--10 of them queued in the
interim by a parallel thread--unless you discover something peculiar or
pathological about the actual allocation sizes.

The more reason you have to be concerned with the costs of allocation, for
example because of DoS fears or performance smoothing, the more narrowly
tailored the solution should be, and the more independent it should be from
the standard library implementation, which vendors often change.

And another strong consideration to take in mind is not just performance,
but ease of memory management. Often these demands are in accord rather than
conflict.

cri

4/21/2011 3:00:00 AM

On Wed, 20 Apr 2011 21:22:38 +0200, Steve Keller <keller@no.invalid>
wrote:

>How efficiently will typical malloc implementations behave if I
>continuously enlarge a memory region by small amounts. Say I have a
>loop producing items that I want to store contiguously in memory:
>
> struct foo *p, *new, *a = NULL;
> int count = 0;
> while (p = get_item()) {
> new = realloc(a, (count + 1) * sizeof(*a));
> if (!new)
> break;
> a = new;
> a[count++] = *p;
> }
>
>To avoid numerous calls to realloc() I have often written code like
>this:
>
> struct foo *p, *new, *a = NULL;
> int count = 0, alloced = 0;
> while (p = get_item()) {
> if (count >= alloced) {
> new = realloc(a, (alloced += 256) * sizeof(*a));
> if (!new)
> break;
> a = new;
> }
> a[count++] = *p;
> }
>
>But should one really do this or should one rely on the standard
>library to handle things in an intelligent way, even without knowing
>how many and how large the items will be?

When you don't know how large an array might grow, the usual practice
is to increase the size by some multiplicative factor. I usually use
a factor of 1.5 but 2 is probably the most common choice. Thus

new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */

The problem with arithmetic growth is that the cost of copying in the
reallocs grows quadratically. If you use geometric growth the cost of
copying grows linearly.

pete

4/21/2011 3:18:00 AM

Richard Harter wrote:
>
> On Wed, 20 Apr 2011 21:22:38 +0200, Steve Keller <keller@no.invalid>
> wrote:
>
> >How efficiently will typical malloc implementations behave if I
> >continuously enlarge a memory region by small amounts. Say I have a
> >loop producing items that I want to store contiguously in memory:
> >
> > struct foo *p, *new, *a = NULL;
> > int count = 0;
> > while (p = get_item()) {
> > new = realloc(a, (count + 1) * sizeof(*a));
> > if (!new)
> > break;
> > a = new;
> > a[count++] = *p;
> > }
> >
> >To avoid numerous calls to realloc() I have often written code like
> >this:
> >
> > struct foo *p, *new, *a = NULL;
> > int count = 0, alloced = 0;
> > while (p = get_item()) {
> > if (count >= alloced) {
> > new = realloc(a, (alloced += 256) * sizeof(*a));
> > if (!new)
> > break;
> > a = new;
> > }
> > a[count++] = *p;
> > }
> >
> >But should one really do this or should one rely on the standard
> >library to handle things in an intelligent way, even without knowing
> >how many and how large the items will be?
>
> When you don't know how large an array might grow, the usual practice
> is to increase the size by some multiplicative factor. I usually use
> a factor of 1.5 but 2 is probably the most common choice. Thus
>
> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */
>
> The problem with arithmetic growth is that the cost of copying in the
> reallocs grows quadratically. If you use geometric growth the cost of
> copying grows linearly.

I use something like a factor of 2 in my get_line function,
but then it falls to binary search if that fails.

If (min) is how many bytes I need
and (*n) is how many are already allocated,
then it goes like this:

if (++min > *n) {
void *p;
size_t max;
size_t next;

for (next = 3; min > next; ++next) {
next *= 2;
}
do {
max = next;
next = (max - min) / 2 + min;
p = realloc(*lineptr, max);
} while (p == NULL && max != min);
if (p == NULL) {
/*
** Handle it
*/
} else {
*lineptr = p;
*n = max;
}
}

http://www.mindspring.com/~pfilandr/C/get_line/...

--
pete

Ben Bacarisse

4/21/2011 1:43:00 PM

cri@tiac.net (Richard Harter) writes:

> On Wed, 20 Apr 2011 21:22:38 +0200, Steve Keller <keller@no.invalid>
> wrote:
<snip>
>>To avoid numerous calls to realloc() I have often written code like
>>this:
>>
>> struct foo *p, *new, *a = NULL;
>> int count = 0, alloced = 0;
>> while (p = get_item()) {
>> if (count >= alloced) {
>> new = realloc(a, (alloced += 256) * sizeof(*a));
>> if (!new)
>> break;
>> a = new;
>> }
>> a[count++] = *p;
>> }
>>
>>But should one really do this or should one rely on the standard
>>library to handle things in an intelligent way, even without knowing
>>how many and how large the items will be?
>
> When you don't know how large an array might grow, the usual practice
> is to increase the size by some multiplicative factor. I usually use
> a factor of 1.5 but 2 is probably the most common choice. Thus
>
> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */

Small point to watch out for: multiplicative schemes don't work when
alloced is zero to start with (as in this example). There are lots of
fixes, of course. I often just add a constant:

new = realloc(a, (alloced += alloced/2 + 8) * sizeof(*a));

which gives a minimum starting size. Alternatively you can special-case
the first allocation or start with alloced not zero (which can affect the
logic of the loop since it is, effectively, a lie).

<snip>
--
Ben.

Keith Thompson

4/21/2011 2:55:00 PM

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> cri@tiac.net (Richard Harter) writes:
[...]
>> When you don't know how large an array might grow, the usual practice
>> is to increase the size by some multiplicative factor. I usually use
>> a factor of 1.5 but 2 is probably the most common choice. Thus
>>
>> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
>> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */
>
> Small point to watch out for: multiplicative schemes don't work when
> alloced is zero to start with (as in this example).
[...]

And a factor of 1.5 doesn't work if alloced starts as 1, since 1/2 is 0.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.ne...
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

pete

4/21/2011 11:51:00 PM

Keith Thompson wrote:
>
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> > cri@tiac.net (Richard Harter) writes:
> [...]
> >> When you don't know how large an array might grow, the usual practice
> >> is to increase the size by some multiplicative factor. I usually use
> >> a factor of 1.5 but 2 is probably the most common choice. Thus
> >>
> >> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
> >> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */
> >
> > Small point to watch out for: multiplicative schemes don't work when
> > alloced is zero to start with (as in this example).
> [...]
>
> And a factor of 1.5 doesn't work if alloced starts as 1,
> since 1/2 is 0.

(alloced += alloced + 1) works fine
if alloced starts out as either 1 or 0.

--
pete

Keith Thompson

4/22/2011 4:12:00 PM

pete <pfiland@mindspring.com> writes:
> Keith Thompson wrote:
>> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> > cri@tiac.net (Richard Harter) writes:
>> [...]
>> >> When you don't know how large an array might grow, the usual practice
>> >> is to increase the size by some multiplicative factor. I usually use
>> >> a factor of 1.5 but 2 is probably the most common choice. Thus
>> >>
>> >> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
>> >> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */
>> >
>> > Small point to watch out for: multiplicative schemes don't work when
>> > alloced is zero to start with (as in this example).
>> [...]
>>
>> And a factor of 1.5 doesn't work if alloced starts as 1,
>> since 1/2 is 0.
>
> (alloced += alloced + 1) works fine
> if alloced starts out as either 1 or 0.

Sure, it works, but it's likely to be horribly inefficient to grow an
allocated array by a single element at a time.

Growing by a factor of 1.5 or 2 is likely to be the best approach; you
just have to watch out for the edge cases.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.ne...
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

cri

4/22/2011 4:24:00 PM

On Fri, 22 Apr 2011 09:12:17 -0700, Keith Thompson <kst-u@mib.org>
wrote:

>pete <pfiland@mindspring.com> writes:
>> Keith Thompson wrote:
>>> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>>> > cri@tiac.net (Richard Harter) writes:
>>> [...]
>>> >> When you don't know how large an array might grow, the usual practice
>>> >> is to increase the size by some multiplicative factor. I usually use
>>> >> a factor of 1.5 but 2 is probably the most common choice. Thus
>>> >>
>>> >> new = realloc(a, (alloced *=2) * sizeof(*a)); /* Factor of 2 */
>>> >> new = realloc(a, (alloced += alloced/2) * sizeof(*a)); /* 1.5 */
>>> >
>>> > Small point to watch out for: multiplicative schemes don't work when
>>> > alloced is zero to start with (as in this example).
>>> [...]
>>>
>>> And a factor of 1.5 doesn't work if alloced starts as 1,
>>> since 1/2 is 0.
>>
>> (alloced += alloced + 1) works fine
>> if alloced starts out as either 1 or 0.
>
>Sure, it works, but it's likely to be horribly inefficient to grow an
>allocated array by a single element at a time.

Are you misreading the line, or is there some obscure parsing gotcha
that I'm missing?

>
>Growing by a factor of 1.5 or 2 is likely to be the best approach; you
>just have to watch out for the edge cases.
>
>--
>Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.ne...
>Nokia
>"We must do something. This is something. Therefore, we must do this."
> -- Antony Jay and Jonathan Lynn, "Yes Minister"

Kenneth Brody

4/22/2011 5:25:00 PM

On 4/22/2011 12:24 PM, Richard Harter wrote:
> On Fri, 22 Apr 2011 09:12:17 -0700, Keith Thompson<kst-u@mib.org>
> wrote:
[...]
>>> (alloced += alloced + 1) works fine
>>> if alloced starts out as either 1 or 0.
>>
>> Sure, it works, but it's likely to be horribly inefficient to grow an
>> allocated array by a single element at a time.
>
> Are you misreading the line, or is there some obscure parsing gotcha
> that I'm missing?
[...]

It appears that he miss-parsed it as "=" instead of "+=", as did I
initially. The code as written will double and add one, not just add one.

--
Kenneth Brody

cri

4/22/2011 6:21:00 PM

On Fri, 22 Apr 2011 13:24:54 -0400, Kenneth Brody
<kenbrody@spamcop.net> wrote:

>On 4/22/2011 12:24 PM, Richard Harter wrote:
>> On Fri, 22 Apr 2011 09:12:17 -0700, Keith Thompson<kst-u@mib.org>
>> wrote:
>[...]
>>>> (alloced += alloced + 1) works fine
>>>> if alloced starts out as either 1 or 0.
>>>
>>> Sure, it works, but it's likely to be horribly inefficient to grow an
>>> allocated array by a single element at a time.
>>
>> Are you misreading the line, or is there some obscure parsing gotcha
>> that I'm missing?
>[...]
>
>It appears that he miss-parsed it as "=" instead of "+=", as did I
>initially. The code as written will double and add one, not just add one.

Just as a note, in modern machines with caches execution time is
likely to be the same for both of these forms:

alloced += alloced/2 + 8;
alloced += alloced;

comp.lang.c

Efficient use of realloc

Steve Keller

William Ahern

cri

pete

Ben Bacarisse

Keith Thompson

pete

Keith Thompson

cri

Kenneth Brody

cri

x Login to ForumsZone