James Kanze
10/17/2008 4:48:00 PM
On Oct 17, 3:43 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> Victor Bazarov wrote:
> > Juha Nieminen wrote:
> >> Victor Bazarov wrote:
> >>> Start with an empty 'std::set' and after inserting another
> >>> value, check the size, and if it's greater than N, remove
> >>> the first in the set.
> >> If N is very large, std::set can be a real memory hog.
> >> If a more memory-efficient algorithm is needed, then a heap
> >> can be used. (The exact same type of heap as used in heap
> >> sorting.) Simply make the element comparison so that the
> >> element with the lowest score ends up as the first element
> >> of the heap. When the heap grows one element larger than N,
> >> remove this first element.
> >> Inserting an element in the heap is O(lg n), and removing
> >> the first element is also O(lg n) (which is the reason why
> >> heap sort is O(n lg n).) This is the same complexity as
> >> with std::set, the difference being that the heap is much
> >> more memory-efficient: an array to store N+1 elements is
> >> enough.
> > Inserting in the middle would mean either reallocation or
> > moving of the trailing elements, would it?
> No. Juha is talking about heap operations, [25.3.6]. Insertion
> and deletion of an element means swapping at most log(N)
> elements.
It may also involve some copying if you need to increase the
array size in order to do the insert. Depending on the data
types involved (and the size of the array), either:
v.resize( v.size() + 1 ) ;
std::push_heap( v.begin(), v.end(), newElement ) ;
or
v.insert( std::upperBound( v.begin(), v.end(), newElement ),
newElement ) ;
may be faster.
Of course, since the OP knows the target size to begin with, he
can construct the vector with the correct size, filled with
dummy elements which are guaranteed to rank lower, and then use
the heap operations (inversing the comparison function, so that
the first element is the lowest rank); comparing each element he
sees with the first element, then if it ranks higher, swapping
the first element with the last and using push_heap. If the
element type is optimized for swapping (as are all of the
standard containers---but not std::string, so if he needs to
store a string, he might consider using std::vector<char>
instead), this should be a lot faster than the alternatives.
> > That's why it will actually be slower than the set. Right?
> Depends on how costly 2*log(N) swaps are in comparison to
> allocation and deallocation of a node. The number of
> comparisons is the same.
Just a guess, but if he uses std::string, std::set will be
faster; if he uses std::vector<char>, in a struct, and provides
an intelligent swap, the heap operations will be faster.
Unless, of course, the poorer locality of std::set causes more
page misses.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34