comp.lang.c++

reserve of vector

ab2305

10/14/2008 3:00:00 AM

Does the standard mandate that the reserve() call initialize a
container's values to their defaults? That is, for vector<int> intVec;
would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?

Thanks

Kai-Uwe Bux

10/14/2008 3:08:00 AM


ab2305@gmail.com wrote:

> Does the standard mandate that the reserve() call initialize a
> container's values to their defaults? That is, for vector<int> intVec;
> would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?

No. In fact, size() remains unchanged so that referencing those elements is
undefined behavior.

The resize() method, in this case, will grow the vector and initialize the
missing elements.
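
A minimal sketch of the difference:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;

    v.reserve(100);                  // capacity() >= 100, size() still 0
    std::cout << v.size() << '\n';   // prints 0
    // v[0];                         // undefined behavior: no element exists

    v.resize(100);                   // size() == 100, new elements are
                                     // value-initialized
    std::cout << v[99] << '\n';      // prints 0 - now well-defined
}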


Best

Kai-Uwe Bux

Stephen Horne

10/14/2008 4:44:00 AM


On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <jkherciueh@gmx.net>
wrote:

>ab2305@gmail.com wrote:
>
>> Does the standard mandate that the reserve() call initialize a
>> container's values to their defaults? That is, for vector<int> intVec;
>> would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?
>
>No. In fact, size() remains unchanged so that referencing those elements is
>undefined behavior.

Just to expand on that, reserve is for managing memory. If you know
that you're about to add thousands of items to a vector, using a
reserve to pre-grow the memory allocation can make it more efficient,
since the memory allocation only gets grown once - not potentially
dozens or hundreds of times.

IIRC sequentially adding n items to the end of a vector takes O(n^2)
time. This is because the memory gets grown O(n) times - every k
items, say, where k is a constant - and each growing can involve
copying the whole vector which is on average O(n) again.

If you can reserve the required space first, sequentially adding n
items is O(n). There is only one memory reallocation, and only one
copying - at most - so it's the actual appending of values that
determines the time taken, not repeated memory reallocation and
copying.

There's also a scheme where the reserved memory is doubled on each
reallocation, and this also gives O(n), though it's still slower than
full preallocation.
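
For example (a sketch; the element count is made up):

#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(100000);            // one allocation up front
    for (int i = 0; i < 100000; ++i)
        v.push_back(i);           // appends never reallocate or copy
}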

acehreli

10/14/2008 4:52:00 AM


On Oct 13, 9:44 pm, Stephen Horne <sh006d3...@blueyonder.co.uk> wrote:

> Just to expand on that, reserve is for managing memory. If you know
> that you're about to add thousands of items to a vector, using a
> reserve to pre-grow the memory allocation can make it more efficient,

I've read that reserve() doesn't help much. (I remember reading from
Bjarne Stroustrup himself that he stopped calling reserve() at some
point; but cannot find any references for that.)

> IIRC sequentially adding n items to the end of a vector takes O(n^2)
> time. This is because the memory gets grown O(n) times - every k
> items, say, where k is a constant - and each growing can involve
> copying the whole vector which is on average O(n) again.

The standard requires push_back to run in amortized constant time, so
an implementation cannot have the complexity you describe.

> There's also a scheme where the reserved memory is doubled on each
> reallocation, and this also gives O(n), though it's still slower than
> full preallocation.

Nowadays implementers use a growth factor of 1.5 (50% growth), which
reportedly works better with pool allocators.

Ali

acehreli

10/14/2008 4:55:00 AM


On Oct 13, 9:51 pm, acehr...@gmail.com wrote:

> I've read that reserve() doesn't help much. (I remember reading from
> Bjarne Stroustrup himself that he stopped calling reserve() at some
> point; but cannot find any references for that.)

I found it:

http://www.research.att.com/~bs/bs_faq2.html#slow-...

There, Stroustrup says:

<quote>
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
</quote>

Ali

Kai-Uwe Bux

10/14/2008 5:24:00 AM


Stephen Horne wrote:

> On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <jkherciueh@gmx.net>
> wrote:
>
>>ab2305@gmail.com wrote:
>>
>>> Does the standard mandate that the reserve() call initialize a
>>> container's values to their defaults? That is, for vector<int> intVec;
>>> would intVec.reserve(100) make intVec[0] = intVec[1] = ... = intVec[99] = 0?
>>
>>No. In fact, size() remains unchanged so that referencing those elements
>>is undefined behavior.
>
> Just to expand on that, reserve is for managing memory. If you know
> that you're about to add thousands of items to a vector, using a
> reserve to pre-grow the memory allocation can make it more efficient,
> since the memory allocation only gets grown once - not potentially
> dozens or hundreds of times.
>
> IIRC sequentially adding n items to the end of a vector takes O(n^2)
> time. This is because the memory gets grown O(n) times - every k
> items, say, where k is a constant - and each growing can involve
> copying the whole vector which is on average O(n) again.

No. Appending to a vector is amortized constant time. The vector only
moves when size() and capacity() coincide. At those points, the capacity
jumps by a certain factor and not by an additive constant (your k).

BTW: this is not a quality of implementation issue. The standard makes this
requirement in clause [23.1.1/12].
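
You can watch those jumps from outside (a sketch; the output depends on
the implementation's growth factor):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::vector<int>::size_type cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {   // a reallocation just happened
            cap = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << cap << '\n';
        }
    }
}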


> If you can reserve the required space first, sequentially adding n
> items is O(n). There is only one memory reallocation, and only one
> copying - at most - so it's the actual appending of values that
> determines the time taken, not repeated memory reallocation and
> copying.
>
> There's also a scheme where the reserved memory is doubled on each
> reallocation, and this also gives O(n), though it's still slower than
> full preallocation.

What matters is not the precise factor but that the constant is
multiplicative rather than additive. I think many implementations out
there use the golden ratio or some other constant smaller than 2.

Nonetheless, preallocation will beat naive sequential push_back()
performance-wise.


Best

Kai-Uwe Bux

Stephen Horne

10/14/2008 6:30:00 AM


On Mon, 13 Oct 2008 21:55:04 -0700 (PDT), acehreli@gmail.com wrote:

>On Oct 13, 9:51 pm, acehr...@gmail.com wrote:
>
>> I've read that reserve() doesn't help much. (I remember reading from
>> Bjarne Stroustrup himself that he stopped calling reserve() at some
>> point; but cannot find any references for that.)
>
>I found it:

I agree - I didn't say anyone should use it ;-)

Personally, I don't have a single use of it in my current working copy
- I just checked. Plenty of resizes, but no reserves. If I need a big
enough collection that it's an issue, I normally end up using a
different kind of container anyway, though not specifically to avoid
this.

I have my own vector ADT based on a multiway tree, though only because
it was convenient given the map, set, etc. ADTs based on the same
underlying code - they all support O(log n) subscripting. Writing the one
extra wrapper seemed prudent at the time, but I don't think I've ever
used it. If I wanted a huge vector (perhaps a very large audio file) I
might use this since it supports efficient inserts anywhere in the
container - though I'd have to add some bulk ops first.

For video, I wouldn't use it - I'd use a normal vector of pointers,
either to frames or GOPs. An hour's video is only on the order of
100,000 frames anyway. And I might just use a vector of pointers to
chunks even for the huge audio file case.

The word "thousands" in my earlier post was out by several orders, I
guess.

I needed a 64 million item 3/4 GB array recently. But it was
fixed-size, so I used a fixed-size array, not a vector. If I had used
a vector, I would probably have used reserve.

Strictly speaking, I did use a vector with millions of items at one stage -
underlying a priority queue that held up to approx. two million items.
Copy overheads *were* an issue here, but not so much from memory
allocation as from the items flowing through the queue. I solved it by
using a more specialised queuing method, putting items into the
location in the main array where they'd be needed (with some collision
handling).

But even that was for a programming challenge thing that I was just
doing for fun.

http://project...

Problem 211 - it turns out I solved it the hard way, but what the
hell.

Wish I could justify more time for these problems. I've had a plan for
problem 210 (but no code yet) for about a week. This time doing it the
easy way ;-)

Stephen Horne

10/14/2008 6:42:00 AM


On Tue, 14 Oct 2008 01:23:56 -0400, Kai-Uwe Bux <jkherciueh@gmx.net>
wrote:

>No. Appending to a vector is amortized constant time. The vector only
>moves when size() and capacity() coincide. At those points, the capacity
>jumps by a certain factor and not by an additive constant (your k).

That's good to know. Thanks.

I could swear that there used to be a fixed increment - even that the
interface allowed the increment to be changed - but there you go.

I didn't think they'd impose that because of the 50% worst-case memory
overhead (often 75% when you allow for deletes), but it's no bad thing.
A memory overhead on that scale isn't usually a big deal these days.

>What matters is not the precise factor but that the constant is
>multiplicative rather than additive. I think many implementations out
>there use the golden ratio or some other constant smaller than 2.

Reducing the worst-case memory overhead. Makes sense.

James Kanze

10/14/2008 9:34:00 AM


On Oct 14, 6:44 am, Stephen Horne <sh006d3...@blueyonder.co.uk> wrote:
> On Mon, 13 Oct 2008 23:08:12 -0400, Kai-Uwe Bux <jkherci...@gmx.net>
> wrote:

> >ab2...@gmail.com wrote:

> >> Does the standard mandate that the reserve() call
> >> initialize a container's values to their defaults? That is,
> >> for vector<int> intVec; would intVec.reserve(100) make
> >> intVec[0] = intVec[1] = ... = intVec[99] = 0?

> >No. In fact, size() remains unchanged so that referencing
> >those elements is undefined behavior.

> Just to expand on that, reserve is for managing memory. If you
> know that you're about to add thousands of items to a vector,
> using a reserve to pre-grow the memory allocation can make it
> more efficient, since the memory allocation only gets grown
> once - not potentially dozens or hundreds of times.

It's rarely necessary for optimization reasons (although it
doesn't hurt). The main reason for using reserve() is to avoid
invalidating iterators, pointers and references into the vector.
(Appending to a vector will not invalidate iterators unless the
capacity() increases.) Another reason, particularly with very
large vectors, is to reduce total memory consumption.
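
A sketch of the iterator-invalidation use:

#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(2);     // room for both push_backs below
    v.push_back(1);
    std::vector<int>::iterator it = v.begin();
    v.push_back(2);   // capacity() does not change, so 'it' stays valid
    *it = 42;         // fine; without the reserve() the second
                      // push_back could reallocate, leaving 'it'
                      // dangling and making this undefined behavior
}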

> IIRC sequentially adding n items to the end of a vector takes
> O(n^2) time. This is because the memory gets grown O(n) times
> - every k items, say, where k is a constant - and each growing
> can involve copying the whole vector which is on average O(n)
> again.

The standard requires that appending to a vector be "amortized
constant time". In practice, this means that memory growth must
be exponential---older implementations generally doubled the
size each time, but I think the general consensus today is that
1.5 times is preferable.

> If you can reserve the required space first, sequentially adding n
> items is O(n).

Creating a vector of n objects using push_back is required by
the standard to be O(n). All reserve() does is reduce the
constant factor.

> There is only one memory reallocation, and only one copying -
> at most - so it's the actual appending of values that
> determines the time taken, not repeated memory reallocation
> and copying.

> There's also a scheme where the reserved memory is doubled on
> each reallocation, and this also gives O(n), though it's still
> slower than full preallocation.

The real problem with it is memory consumption. You can easily
end up consuming three times more memory than you are actually
using: during a reallocation, the old block of n elements and the
new block of 2n elements are both live at the same time.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

10/14/2008 9:43:00 AM


On Oct 14, 7:23 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> Stephen Horne wrote:

[...]
> What matters is not the precise factor but that the constant
> is multiplicative rather than additive. I think many
> implementations out there use the golden ratio or some other
> constant smaller than 2.

I'd be surprised if they used exactly the golden ratio, but 1.5
is an easy-to-calculate approximation.

The motivation for this is simple: if you double each time, the
sum of the memory you've previously freed is always less than
the amount of memory you're requesting. A little bit of
mathematical analysis shows that this is the case any time the
multiplier is greater than the golden ratio.
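
A sketch of that analysis, assuming the simplest model: each capacity is
exactly r times the previous one, and the source block is still live while
its contents are copied into the new one. Before the n-th allocation, the
blocks already freed have total size

    \sum_{k=0}^{n-2} r^k = \frac{r^{n-1} - 1}{r - 1},

while the new request is r^n. As n grows, the freed space can hold the new
block only if 1/(r(r-1)) >= 1, i.e. r^2 - r - 1 <= 0, i.e.
r <= (1 + \sqrt{5})/2 ~ 1.618. Doubling (r = 2) therefore always requests
more than the sum of everything freed so far.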

> Nonetheless, preallocation will beat naive sequential
> push_back() performance-wise.

Slightly.

I have a couple of cases (maybe two in the last ten years) where
I've used reserve(), always to avoid invalidating iterators.
(Most of the time, if the size of the vector is changing, I'll
save the index, not an iterator.)

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Kai-Uwe Bux

10/14/2008 9:52:00 AM


James Kanze wrote:

> On Oct 14, 7:23 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
>> Stephen Horne wrote:
>
> [...]
>> What matters is not the precise factor but that the constant
>> is multiplicative rather than additive. I think many
>> implementations out there use the golden ratio or some other
>> constant smaller than 2.
>
> I'd be surprised if they used exactly the golden ratio, but 1.5
> is an easy-to-calculate approximation.
>
> The motivation for this is simple: if you double each time, the
> sum of the memory you've previously freed is always less than
> the amount of memory you're requesting. A little bit of
> mathematical analysis shows that this is the case any time the
> multiplier is greater than the golden ratio.

An implementation could easily store the capacity and the previous capacity.
The new capacity would simply be the sum of both. That will satisfy the
motivation and converge to the golden ratio.
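
A toy sketch of that policy (hypothetical; not any real implementation's
code):

#include <cstddef>
#include <iostream>

int main()
{
    // Fibonacci-style growth: the next capacity is the sum of the
    // current and the previous one; the ratio of successive
    // capacities converges to the golden ratio, ~1.618.
    std::size_t prev = 1, cap = 1;
    for (int i = 0; i < 15; ++i) {
        std::size_t next = prev + cap;
        prev = cap;
        cap = next;
        std::cout << cap << "  (ratio " << double(cap) / double(prev)
                  << ")\n";
    }
}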

As for the motivation, I think that might be good for some allocators, but I
find it hard to believe that it matters all that much. It is highly
unlikely that memory gets reused by the same vector. More likely, a
different vector is using that block. In any case, that strategy is easy to
turn into a policy, and one can experiment and measure whether it has any
impact.


[snip]


Best

Kai-Uwe Bux