[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.c++

Speeding up access to STL vectors?

Steve555

12/5/2008 9:10:00 AM

Hi

My vector ( typedef vector<movieRatingPair*> ratingsV ) contains
this struct:

typedef struct{
unsigned short movieID;
unsigned short rating;
} movieRatingPair;

....and on average will contain 200 such structs;
The structs are always stored in the vector in ascending order of
movieID.

I need to find a quick way to check if the vector contains a specific
movieID. I can not use a map instead of a vector, because the extra
few bytes they use creates a problem.(*)

Simply iterating through the vector is proving a bit slow; I guess I
can write some binary search code, but just wanted to check I wasn't
re-inventing the wheel, or simply missing a much better way of dealing
with the whole thing.

Many thanks for any advice.

Steve


(*)There are nearly half million instances of these vectors. (1 per
customer which are stored in an stl map, keyed by their userID.)
In total, therefore, there are 100+ million records. It's effectively
a read-only database, with the data being initially fetched from disk,
and then the data MUST be kept in 2.5Gb of free memory for quick
access/analysis.
Unfortunately, using maps instead of vectors means each record is 32
bytes * 100+million = 3Gb
11 Answers

Kai-Uwe Bux

12/5/2008 9:31:00 AM

0

Steve555 wrote:

> Hi
>
> My vector ( typedef vector<movieRatingPair*> ratingsV ) contains
> this struct:
>
> typedef struct{
> unsigned short movieID;
> unsigned short rating;
> } movieRatingPair;

a) You can ditch the typedef:

struct movieRatingPair {
unsigned short movieID;
unsigned short rating;
};

will do.

b) Is there a reason to use vector<movieRatingPair*> instead of
vector<movieRatingPair>? Note that dynamic allocation of objects usually
has some overhead: (1) the pointer (2) some size information about the
allocated object. You might end up using 12 bytes (4 in the pointer and 8
in the pointee) instead of 4 (which is what two unsigned shorts might give
you, but that is platform specific).

> ...and on average will contain 200 such structs;
> The structs are always stored in the vector in ascending order of
> movieID.
>
> I need to find a quick way to check if the vector contains a specific
> movieID. I can not use a map instead of a vector, because the extra
> few bytes they use creates a problem.(*)
>
> Simply iterating through the vector is proving a bit slow; I guess I
> can write some binary search code, but just wanted to check I wasn't
> re-inventing the wheel, or simply missing a much better way of dealing
> with the whole thing.

Provide a functor to compare movieRatingPair (or movieRatingPair*) by the
movieID. Then, sort the vector using std::sort() and then test for entries
with std::binary_search(). You can also find iterators to the entries with
std::lower_bound() and std::upper_bound().


[snip]


Best

Kai-Uwe Bux

Juha Nieminen

12/5/2008 12:18:00 PM

0

Steve555 wrote:
> The structs are always stored in the vector in ascending order of
> movieID.
>
> I need to find a quick way to check if the vector contains a specific
> movieID.

std::binary_search()? (Or if you want to get hold of the found
element, std::lower_bound().)

joseph cook

12/5/2008 1:52:00 PM

0

On Dec 5, 4:09 am, Steve555 <foursh...@btinternet.com> wrote:
> Hi

> Unfortunately, using maps instead of vectors means each record is 32
> bytes * 100+million = 3Gb

You might also consider looking at the functions make_heap(), and the
related ones.

Joe

peter koch

12/5/2008 7:15:00 PM

0

On 5 Dec., 10:09, Steve555 <foursh...@btinternet.com> wrote:
> Hi
>
> My vector (   typedef   vector<movieRatingPair*> ratingsV    )  contains

Why do you have a pointer of vectors? You probably should have a
vector of MovieRatingPair.

> this struct:
>
> typedef struct{
>         unsigned short  movieID;
>         unsigned short  rating;
>
> } movieRatingPair;

Why do you use a C-notation? In C++ the preferred notation is
struct movieRatingPair
{
...
};
>
> ...and on average will contain 200 such structs;
> The structs are always stored in the vector in ascending order of
> movieID.
>
> I need to find a quick way to check if the vector contains a specific
> movieID. I can not use a map instead of a vector, because the extra
> few bytes they use creates a problem.(*)
>
> Simply iterating through the vector is proving a bit slow; I guess I
> can write some binary search code, but just wanted to check I wasn't
> re-inventing the wheel, or simply missing a much better way of dealing
> with the whole thing.
There is std::binary_search, std::lower_bound and std::upper_bound for
that. Using that, having much better cache-coherency when searching
values instead of pointers and using far less memory (when not newing
your elements) should combine to give you a rather huge speed-up (I
would estimate a factor of more than ten).

/Peter

Steve555

12/5/2008 8:39:00 PM

0

On 5 Dec, 12:18, Juha Nieminen <nos...@thanks.invalid> wrote:
> Steve555 wrote:
> > The structs are always stored in the vector in ascending order of
> > movieID.
>
> > I need to find a quick way to check if the vector contains a specific
> > movieID.
>
>   std::binary_search()? (Or if you want to get hold of the found
> element, std::lower_bound().)

Thanks Juha, but as I said to Kai, I can't find any documentation on
how to use std::binary_search() on a vector that contains a struct.

Steve555

12/5/2008 8:45:00 PM

0

On 5 Dec, 19:15, peter koch <peter.koch.lar...@gmail.com> wrote:
> On 5 Dec., 10:09, Steve555 <foursh...@btinternet.com> wrote:
>
> > Hi
>
> > My vector (   typedef   vector<movieRatingPair*> ratingsV    )  contains
>
> Why do you have a pointer of vectors? You probably should have a
> vector of MovieRatingPair.
>
> > this struct:
>
> > typedef struct{
> >         unsigned short  movieID;
> >         unsigned short  rating;
>
> > } movieRatingPair;
>
> Why do you use a C-notation? In C++ the preferred notation is
> struct movieRatingPair
> {
>   ...};
>
> > ...and on average will contain 200 such structs;
> > The structs are always stored in the vector in ascending order of
> > movieID.
>
> > I need to find a quick way to check if the vector contains a specific
> > movieID. I can not use a map instead of a vector, because the extra
> > few bytes they use creates a problem.(*)
>
> > Simply iterating through the vector is proving a bit slow; I guess I
> > can write some binary search code, but just wanted to check I wasn't
> > re-inventing the wheel, or simply missing a much better way of dealing
> > with the whole thing.
>
> There is std::binary_search, std::lower_bound and std::upper_bound for
> that. Using that, having much better cache-coherency when searching
> values instead of pointers and using far less memory (when not newing
> your elements) should combine to give you a rather huge speed-up (I
> would estimate a factor of more than ten).
>
> /Peter

Thanks, I like your estimate! Unfortunately I've got really stuck on
figuring out how to use binary_search
with a vector that contains structs. Any hints anyone can give me
using my data types would be greatly appreciated.

Steve555

12/5/2008 8:47:00 PM

0

On 5 Dec, 09:31, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> Steve555 wrote:
> > Hi
>
> > My vector (   typedef vector<movieRatingPair*> ratingsV    )  contains
> > this struct:
>
> > typedef struct{
> > unsigned short        movieID;
> > unsigned short        rating;
> > } movieRatingPair;
>
> a) You can ditch the typedef:
>
>   struct movieRatingPair {
>     unsigned short movieID;
>     unsigned short rating;
>   };
>
> will do.
>
> b) Is there a reason to use vector<movieRatingPair*> instead of
> vector<movieRatingPair>? Note that dynamic allocation of objects usually
> has some overhead: (1) the pointer (2) some size information about the
> allocated object. You might end up using 12 bytes (4 in the pointer and 8
> in the pointee) instead of 4 (which is what two unsigned shorts might give
> you, but that is platform specific).
>
> > ...and on average will contain 200 such structs;
> > The structs are always stored in the vector in ascending order of
> > movieID.
>
> > I need to find a quick way to check if the vector contains a specific
> > movieID. I can not use a map instead of a vector, because the extra
> > few bytes they use creates a problem.(*)
>
> > Simply iterating through the vector is proving a bit slow; I guess I
> > can write some binary search code, but just wanted to check I wasn't
> > re-inventing the wheel, or simply missing a much better way of dealing
> > with the whole thing.
>
> Provide a functor to compare movieRatingPair (or movieRatingPair*) by the
> movieID. Then, sort the vector using std::sort() and then test for entries
> with std::binary_search(). You can also find iterators to the entries with
> std::lower_bound() and std::upper_bound().
>
> [snip]
>
> Best
>
> Kai-Uwe Bux

Thanks Kai but I'm completely lost with getting binary_search to work
with a vector containing structs. They are already sorted upon
creation, so I didn't understand how re-sorting them would help.

peter koch

12/5/2008 8:58:00 PM

0

On 5 Dec., 21:44, Steve555 <foursh...@btinternet.com> wrote:
> On 5 Dec, 19:15, peter koch <peter.koch.lar...@gmail.com> wrote:
>
>
>
>
>
> > On 5 Dec., 10:09, Steve555 <foursh...@btinternet.com> wrote:
>
> > > Hi
>
> > > My vector (   typedef   vector<movieRatingPair*> ratingsV    )  contains
>
> > Why do you have a pointer of vectors? You probably should have a
> > vector of MovieRatingPair.
>
> > > this struct:
>
> > > typedef struct{
> > >         unsigned short  movieID;
> > >         unsigned short  rating;
>
> > > } movieRatingPair;
>
> > Why do you use a C-notation? In C++ the preferred notation is
> > struct movieRatingPair
> > {
> >   ...};
>
> > > ...and on average will contain 200 such structs;
> > > The structs are always stored in the vector in ascending order of
> > > movieID.
>
> > > I need to find a quick way to check if the vector contains a specific
> > > movieID. I can not use a map instead of a vector, because the extra
> > > few bytes they use creates a problem.(*)
>
> > > Simply iterating through the vector is proving a bit slow; I guess I
> > > can write some binary search code, but just wanted to check I wasn't
> > > re-inventing the wheel, or simply missing a much better way of dealing
> > > with the whole thing.
>
> > There is std::binary_search, std::lower_bound and std::upper_bound for
> > that. Using that, having much better cache-coherency when searching
> > values instead of pointers and using far less memory (when not newing
> > your elements) should combine to give you a rather huge speed-up (I
> > would estimate a factor of more than ten).
>
> > /Peter
>
> Thanks, I like your estimate! Unfortunately I've got really stuck on
> figuring out how to use binary_search
> with a vector that contains structs. Any hints anyone can give me
> using my data types would be greatly appreciated.

One way would be to make your structs comparable, but to me this seems
wrong as you seemingly only want to compare part of the struct.

So write a function or a functor that compares the movieID's of two
movieRatingPairs. The functor way would be:

struct lessmovieIDs
{
bool operator()(movieRatingPair const& lhs,movieRatingPair const&
rhs)
// returns true iff the movieID of lhs is less than the movieID of
rhs
{
...
}
};

/Peter

Kai-Uwe Bux

12/5/2008 9:03:00 PM

0

Steve555 wrote:

> On 5 Dec, 09:31, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
>> Steve555 wrote:
>> > Hi
>>
>> > My vector (   typedef vector<movieRatingPair*> ratingsV    )  contains
>> > this struct:
>>
>> > typedef struct{
>> > unsigned short        movieID;
>> > unsigned short        rating;
>> > } movieRatingPair;
>>
>> a) You can ditch the typedef:
>>
>> struct movieRatingPair {
>> unsigned short movieID;
>> unsigned short rating;
>> };
>>
>> will do.
>>
>> b) Is there a reason to use vector<movieRatingPair*> instead of
>> vector<movieRatingPair>? Note that dynamic allocation of objects usually
>> has some overhead: (1) the pointer (2) some size information about the
>> allocated object. You might end up using 12 bytes (4 in the pointer and 8
>> in the pointee) instead of 4 (which is what two unsigned shorts might
>> give you, but that is platform specific).
>>
>> > ...and on average will contain 200 such structs;
>> > The structs are always stored in the vector in ascending order of
>> > movieID.
>>
>> > I need to find a quick way to check if the vector contains a specific
>> > movieID. I can not use a map instead of a vector, because the extra
>> > few bytes they use creates a problem.(*)
>>
>> > Simply iterating through the vector is proving a bit slow; I guess I
>> > can write some binary search code, but just wanted to check I wasn't
>> > re-inventing the wheel, or simply missing a much better way of dealing
>> > with the whole thing.
>>
>> Provide a functor to compare movieRatingPair (or movieRatingPair*) by the
>> movieID. Then, sort the vector using std::sort() and then test for
>> entries with std::binary_search(). You can also find iterators to the
>> entries with std::lower_bound() and std::upper_bound().
>>
>> [snip]
>>
>> Best
>>
>> Kai-Uwe Bux
>
> Thanks Kai but I'm completely lost with getting binary_search to work
> with a vector containing structs. They are already sorted upon
> creation, so I didn't understand how re-sorting them would help.

It wouldn't. I missed the part about the vector being sorted already.

Here is a bit of code that might help you:

#include <vector>
#include <algorithm>
#include <cassert>

struct rating {

unsigned short ID;
unsigned short value;

};

rating make_rating ( unsigned short id, unsigned short val ) {
rating result;
result.ID = id;
result.value = val;
return ( result );
}

bool compare_by_ID ( rating lhs, rating rhs ) {
return ( lhs.ID < rhs.ID );
}

int main ( void ) {
std::vector< rating > r;
r.push_back( make_rating( 1, 23 ) );
r.push_back( make_rating( 13, 10 ) );
r.push_back( make_rating( 24, 1 ) );
assert( std::binary_search( r.begin(), r.end(),
make_rating( 13, 0 ), &compare_by_ID ) );
assert( ! std::binary_search( r.begin(), r.end(),
make_rating( 10, 0 ), &compare_by_ID ) );
}


Best

Kai-Uwe Bux

Jeff Schwab

12/5/2008 9:14:00 PM

0

Steve555 wrote:
> On 5 Dec, 12:18, Juha Nieminen <nos...@thanks.invalid> wrote:
>> Steve555 wrote:
>>> The structs are always stored in the vector in ascending order of
>>> movieID.
>>> I need to find a quick way to check if the vector contains a specific
>>> movieID.
>> std::binary_search()? (Or if you want to get hold of the found
>> element, std::lower_bound().)
>
> Thanks Juha, but as I said to Kai, I can't find any documentation on
> how to use std::binary_search() on a vector that contains a struct.

It's no different from a vector that contains primitive types, once you
define a comparison operator.

/* Compares values by movie_id. Ratings are ignored. */
bool operator<(movie_rating_pair lhs, movie_rating_pair rhs) {
return lhs.movie_id < rhs.movie_id;
}

...

movie_rating_pair sought = { 837, 3 };
if (std::binary_search(pairs.begin(), pairs.end(), sought)) {
...
}