Asp Forum - map and set classes implemented with a tree having dynamic order statistics

DJ Dharme

10/8/2008 9:20:00 AM

Hi,
Does anybody know about a solid implementation of a map or a set
with dynamic order statistics which allows the user to access a node
by its Index in log(N) time. I have a problem of sorting a huge data
set and showing it row by row. I have to quickly jump to a new
starting point on demand. If I use the std set class it takes so many
iterations to jump to a new location since we have to increment the
iterators until we get the correct row no.

Thanks in advance,

DJD

6 Answers

Maxim Yegorushkin

10/8/2008 12:18:00 PM

On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:

> Does anybody know about a solid implementation of a map or a set
> with dynamic order statistics which allows the user to access a node
> by its Index in log(N) time. I have a problem of sorting a huge data
> set and showing it row by row. I have to quickly jump to a new
> starting point on demand. If I use the std set class it takes so many
> iterations to jump to a new location since we have to increment the
> iterators until we get the correct row no.

Can you not use a sorted vector? Accessing elements by key is
O(lg(N)), by index is O(1) and there is no memory overhead compared to
std::set/map.

--
Max

DJ Dharme

10/8/2008 1:16:00 PM

On Oct 8, 5:17 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > Does anybody know about a solid implementation of a map or a set
> > with dynamic order statistics which allows the user to access a node
> > by its Index in log(N) time. I have a problem of sorting a huge data
> > set and showing it row by row. I have to quickly jump to a new
> > starting point on demand. If I use the std set class it takes so many
> > iterations to jump to a new location since we have to increment the
> > iterators until we get the correct row no.
>
> Can you not use a sorted vector? Accessing elements by key is
> O(lg(N)), by index is O(1) and there is no memory overhead compared to
> std::set/map.
>
> --
> Max

Thanks for the reply, yes I have tested this with vectors but failed
since I have to dynamically change (add, update, remove) the records.
The record count can grow up to few millions. And when the vector
starts to re-allocate memory the program hangs for some time. Please
note that this is a back end application that handles user
interactions. Also I have to sort the records on each update. So if I
use a vector large amount of items will be moved back and forth due to
insertions and removals.

Maxim Yegorushkin

10/8/2008 2:21:00 PM

On Oct 8, 2:16 pm, DJ Dharme <donjuandharmap...@gmail.com> wrote:
> On Oct 8, 5:17 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> wrote:
>
> > On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > Does anybody know about a solid implementation of a map or a set
> > > with dynamic order statistics which allows the user to access a node
> > > by its Index in log(N) time. I have a problem of sorting a huge data
> > > set and showing it row by row. I have to quickly jump to a new
> > > starting point on demand. If I use the std set class it takes so many
> > > iterations to jump to a new location since we have to increment the
> > > iterators until we get the correct row no.
>
> > Can you not use a sorted vector? Accessing elements by key is
> > O(lg(N)), by index is O(1) and there is no memory overhead compared to
> > std::set/map.
>
> Thanks for the reply, yes I have tested this with vectors but failed
> since I have to dynamically change (add, update, remove) the records.
> The record count can grow up to few millions. And when the vector
> starts to re-allocate memory the program hangs for some time.

[]

In this case you can use std::deque<> to avoid paying for reallocating
and moving many elements.

> Also I have to sort the records on each update. So if I
> use a vector large amount of items will be moved back and forth due to
> insertions and removals.

To insert an element in a sorted vector or deque you do lower_bound
followed by insert. No need to sort again the whole container.

--
Max

DJ Dharme

10/9/2008 4:57:00 AM

On Oct 8, 7:20 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On Oct 8, 2:16 pm, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
>
>
> > On Oct 8, 5:17 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > wrote:
>
> > > On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > > Does anybody know about a solid implementation of a map or a set
> > > > with dynamic order statistics which allows the user to access a node
> > > > by its Index in log(N) time. I have a problem of sorting a huge data
> > > > set and showing it row by row. I have to quickly jump to a new
> > > > starting point on demand. If I use the std set class it takes so many
> > > > iterations to jump to a new location since we have to increment the
> > > > iterators until we get the correct row no.
>
> > > Can you not use a sorted vector? Accessing elements by key is
> > > O(lg(N)), by index is O(1) and there is no memory overhead compared to
> > > std::set/map.
>
> > Thanks for the reply, yes I have tested this with vectors but failed
> > since I have to dynamically change (add, update, remove) the records.
> > The record count can grow up to few millions. And when the vector
> > starts to re-allocate memory the program hangs for some time.
>
> []
>
> In this case you can use std::deque<> to avoid paying for reallocating
> and moving many elements.
>
> > Also I have to sort the records on each update. So if I
> > use a vector large amount of items will be moved back and forth due to
> > insertions and removals.
>
> To insert an element in a sorted vector or deque you do lower_bound
> followed by insert. No need to sort again the whole container.
>
> --
> Max

Thanks Max, I forgot to mention that I have used a binary search to
insert items into the sorted vector. I never knew that I can use
lower_bound for this.

Just a small question, according to the documents the iterators in the
deque are getting invalidated when we insert new items to it. I am
currently having a map of iterators with the primary key of the
records as the key. This allows me to directly find the iterators to
the records without iterating through the set to find the correct
record. I am giving a sample code to make my points clear.

#include <map>
#include <set>
#include <string>

struct keycomp
{
bool operator()(const char* zLeft, const char* zRight)
{
return strcmp(zLeft, zRight) < 0;
}
};

class TableRow
{
public:
TableRow();
~TableRow();

std::string s_Key;
void* p_Data;
bool b_HasChanged;
};

class TableModel;

class TableRowComparator
{
public:
TableRowComparator(TableModel* pModel):p_Model(pModel){}
~TableRowComparator(){}

bool operator()(TableRow* pLeft, TableRow* pRight)
{
return p_Model->CompareRows(pLeft, pRight);
}

TableModel* p_Model;
};

typedef std::set<TableRow*, TableRowComparator> TABLE_ROW_SET;
typedef std::map<const char*, TABLE_ROW_SET::iterator, keycomp>
TABLE_ROW_ITR_MAP;

class TableModel
{
public:
TableModel(): o_RowComparator(this), set_TableRows(o_RowComparator)
{

}

// Compare rows according whatever the way the user wants
//
bool CompareRows(TableRow* pLeft, TableRow* pRight)
{
//Do whatever comparison here
return pLeft->p_Data - pRight->p_Data < 0;
}

// This method adds a record from the table control, if a record
exists for
// the same key, it will update it with the new data
//
void AddData(const char* zKey, void* pData)
{
TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);

TableRow* pRow = NULL;

if(itrMap != map_TableRowItrs.end)
{
TABLE_ROW_SET::iterator itrSet = itrMap->second;

pRow = *itrSet;
set_TableRows.erase(itrSet); // Erase the existing row since
// new one may be inserted to a
// different place
map_TableRowItrs.erase(itrMap);

}
else
{
pRow = new TableRow;
pRow->s_Key = zKey;
}

pRow->p_Data = pData;
pRow->b_HasChanged = true;

map_TableRowItrs.insert(pRow-
>s_Key.c_str(),set_TableRows.insert(pRow).first);
}

// This method removes a record from the table control
//
void RemoveData(const char* zKey)
{
TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);

if(itrMap != map_TableRowItrs.end)
{
TABLE_ROW_SET::iterator itrSet = itrMap->second;

TableRow* pRow = *itrSet;
set_TableRows.erase(itrSet);
map_TableRowItrs.erase(itrMap);

delete pRow;
}
}

// This method needs to access the set items by its index to show it
in a
// virtual list control on the Front-End
//
void* GetData(int iIndex)
{
// How to make this fast?, this is my question
// I am currently doing a caching mechanism which is not mentioned
here
// if I have a set with dynamic order statistics this can be taken
in Log(N) time
// instead of the N time

if(iIndex < set_TableRows.count())
{
TABLE_ROW_SET::iterator itrSet = set_TableRows.begin();
for (int iRow = 0 ; iRow < iIndex ; ++iRow, ++itrSet);

return (*itrSet)->p_Data;
}

return NULL;
}

protected:
TableRowComparator o_RowComparator;
TABLE_ROW_SET set_TableRows;
TABLE_ROW_ITR_MAP map_TableRowItrs;
};

Maxim Yegorushkin

10/9/2008 9:17:00 AM

On Oct 9, 5:56 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
> On Oct 8, 7:20 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> wrote:
>
>
>
> > On Oct 8, 2:16 pm, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > On Oct 8, 5:17 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > > wrote:
>
> > > > On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > > > Does anybody know about a solid implementation of a map or a set
> > > > > with dynamic order statistics which allows the user to access a node
> > > > > by its Index in log(N) time. I have a problem of sorting a huge data
> > > > > set and showing it row by row. I have to quickly jump to a new
> > > > > starting point on demand. If I use the std set class it takes so many
> > > > > iterations to jump to a new location since we have to increment the
> > > > > iterators until we get the correct row no.
>
> > > > Can you not use a sorted vector? Accessing elements by key is
> > > > O(lg(N)), by index is O(1) and there is no memory overhead compared to
> > > > std::set/map.
>
> > > Thanks for the reply, yes I have tested this with vectors but failed
> > > since I have to dynamically change (add, update, remove) the records.
> > > The record count can grow up to few millions. And when the vector
> > > starts to re-allocate memory the program hangs for some time.
>
> > []
>
> > In this case you can use std::deque<> to avoid paying for reallocating
> > and moving many elements.
>
> > > Also I have to sort the records on each update. So if I
> > > use a vector large amount of items will be moved back and forth due to
> > > insertions and removals.
>
> > To insert an element in a sorted vector or deque you do lower_bound
> > followed by insert. No need to sort again the whole container.
>
> > --
> > Max
>
> Thanks Max, I forgot to mention that I have used a binary search to
> insert items into the sorted vector. I never knew that I can use
> lower_bound for this.
>
> Just a small question, according to the documents the iterators in the
> deque are getting invalidated when we insert new items to it. I am
> currently having a map of iterators with the primary key of the
> records as the key. This allows me to directly find the iterators to
> the records without iterating through the set to find the correct
> record. I am giving a sample code to make my points clear.
>
> #include <map>
> #include <set>
> #include <string>
>
> struct keycomp
> {
> bool operator()(const char* zLeft, const char* zRight)
> {
> return strcmp(zLeft, zRight) < 0;
> }
>
> };
>
> class TableRow
> {
> public:
> TableRow();
> ~TableRow();
>
> std::string s_Key;
> void* p_Data;
> bool b_HasChanged;
>
> };
>
> class TableModel;
>
> class TableRowComparator
> {
> public:
> TableRowComparator(TableModel* pModel):p_Model(pModel){}
> ~TableRowComparator(){}
>
> bool operator()(TableRow* pLeft, TableRow* pRight)
> {
> return p_Model->CompareRows(pLeft, pRight);
> }
>
> TableModel* p_Model;
>
> };
>
> typedef std::set<TableRow*, TableRowComparator> TABLE_ROW_SET;
> typedef std::map<const char*, TABLE_ROW_SET::iterator, keycomp>
> TABLE_ROW_ITR_MAP;
>
> class TableModel
> {
> public:
> TableModel(): o_RowComparator(this), set_TableRows(o_RowComparator)
> {
>
> }
>
> // Compare rows according whatever the way the user wants
> //
> bool CompareRows(TableRow* pLeft, TableRow* pRight)
> {
> //Do whatever comparison here
> return pLeft->p_Data - pRight->p_Data < 0;
> }
>
> // This method adds a record from the table control, if a record
> exists for
> // the same key, it will update it with the new data
> //
> void AddData(const char* zKey, void* pData)
> {
> TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);
>
> TableRow* pRow = NULL;
>
> if(itrMap != map_TableRowItrs.end)
> {
> TABLE_ROW_SET::iterator itrSet = itrMap->second;
>
> pRow = *itrSet;
> set_TableRows.erase(itrSet); // Erase the existing row since
> // new one may be inserted to a
> // different place
> map_TableRowItrs.erase(itrMap);
>
> }
> else
> {
> pRow = new TableRow;
> pRow->s_Key = zKey;
> }
>
> pRow->p_Data = pData;
> pRow->b_HasChanged = true;
>
> map_TableRowItrs.insert(pRow->s_Key.c_str(),set_TableRows.insert(pRow).first);
>
> }
>
> // This method removes a record from the table control
> //
> void RemoveData(const char* zKey)
> {
> TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);
>
> if(itrMap != map_TableRowItrs.end)
> {
> TABLE_ROW_SET::iterator itrSet = itrMap->second;
>
> TableRow* pRow = *itrSet;
> set_TableRows.erase(itrSet);
> map_TableRowItrs.erase(itrMap);
>
> delete pRow;
> }
> }
>
> // This method needs to access the set items by its index to show it
> in a
> // virtual list control on the Front-End
> //
> void* GetData(int iIndex)
> {
> // How to make this fast?, this is my question
> // I am currently doing a caching mechanism which is not mentioned
> here
> // if I have a set with dynamic order statistics this can be taken
> in Log(N) time
> // instead of the N time
>
> if(iIndex < set_TableRows.count())
> {
> TABLE_ROW_SET::iterator itrSet = set_TableRows.begin();
> for (int iRow = 0 ; iRow < iIndex ; ++iRow, ++itrSet);
>
> return (*itrSet)->p_Data;
> }
>
> return NULL;
> }
>
> protected:
> TableRowComparator o_RowComparator;
> TABLE_ROW_SET set_TableRows;
> TABLE_ROW_ITR_MAP map_TableRowItrs;
> };

Now I see that you container actually has three indexes:
1) TableRow::s_Key.
2) TableRow::p_Data.
3) array index

As the first step I would try getting rid of index 3. Most grid
controls allow associating user data (a pointer) with a cell. You
could associate a pointer to TableRow with a row in a grid. Another
way is that you could extract the key from the GUI row and use that
instead of index.

Next you could use boost::multi_index container which allows having
multiple indexes in the same container. This way you don't need to
maintain an additional map of iterators/pointers for every additional
index. The code is simpler, more robust and less memory waste.

--
Max

--
Max

DJ Dharme

10/9/2008 1:01:00 PM

On Oct 9, 2:16 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On Oct 9, 5:56 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
>
>
> > On Oct 8, 7:20 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > wrote:
>
> > > On Oct 8, 2:16 pm, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > > On Oct 8, 5:17 pm, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
> > > > wrote:
>
> > > > > On Oct 8, 10:20 am, DJ Dharme <donjuandharmap...@gmail.com> wrote:
>
> > > > > > Does anybody know about a solid implementation of a map or a set
> > > > > > with dynamic order statistics which allows the user to access a node
> > > > > > by its Index in log(N) time. I have a problem of sorting a huge data
> > > > > > set and showing it row by row. I have to quickly jump to a new
> > > > > > starting point on demand. If I use the std set class it takes so many
> > > > > > iterations to jump to a new location since we have to increment the
> > > > > > iterators until we get the correct row no.
>
> > > > > Can you not use a sorted vector? Accessing elements by key is
> > > > > O(lg(N)), by index is O(1) and there is no memory overhead compared to
> > > > > std::set/map.
>
> > > > Thanks for the reply, yes I have tested this with vectors but failed
> > > > since I have to dynamically change (add, update, remove) the records.
> > > > The record count can grow up to few millions. And when the vector
> > > > starts to re-allocate memory the program hangs for some time.
>
> > > []
>
> > > In this case you can use std::deque<> to avoid paying for reallocating
> > > and moving many elements.
>
> > > > Also I have to sort the records on each update. So if I
> > > > use a vector large amount of items will be moved back and forth due to
> > > > insertions and removals.
>
> > > To insert an element in a sorted vector or deque you do lower_bound
> > > followed by insert. No need to sort again the whole container.
>
> > > --
> > > Max
>
> > Thanks Max, I forgot to mention that I have used a binary search to
> > insert items into the sorted vector. I never knew that I can use
> > lower_bound for this.
>
> > Just a small question, according to the documents the iterators in the
> > deque are getting invalidated when we insert new items to it. I am
> > currently having a map of iterators with the primary key of the
> > records as the key. This allows me to directly find the iterators to
> > the records without iterating through the set to find the correct
> > record. I am giving a sample code to make my points clear.
>
> > #include <map>
> > #include <set>
> > #include <string>
>
> > struct keycomp
> > {
> > bool operator()(const char* zLeft, const char* zRight)
> > {
> > return strcmp(zLeft, zRight) < 0;
> > }
>
> > };
>
> > class TableRow
> > {
> > public:
> > TableRow();
> > ~TableRow();
>
> > std::string s_Key;
> > void* p_Data;
> > bool b_HasChanged;
>
> > };
>
> > class TableModel;
>
> > class TableRowComparator
> > {
> > public:
> > TableRowComparator(TableModel* pModel):p_Model(pModel){}
> > ~TableRowComparator(){}
>
> > bool operator()(TableRow* pLeft, TableRow* pRight)
> > {
> > return p_Model->CompareRows(pLeft, pRight);
> > }
>
> > TableModel* p_Model;
>
> > };
>
> > typedef std::set<TableRow*, TableRowComparator> TABLE_ROW_SET;
> > typedef std::map<const char*, TABLE_ROW_SET::iterator, keycomp>
> > TABLE_ROW_ITR_MAP;
>
> > class TableModel
> > {
> > public:
> > TableModel(): o_RowComparator(this), set_TableRows(o_RowComparator)
> > {
>
> > }
>
> > // Compare rows according whatever the way the user wants
> > //
> > bool CompareRows(TableRow* pLeft, TableRow* pRight)
> > {
> > //Do whatever comparison here
> > return pLeft->p_Data - pRight->p_Data < 0;
> > }
>
> > // This method adds a record from the table control, if a record
> > exists for
> > // the same key, it will update it with the new data
> > //
> > void AddData(const char* zKey, void* pData)
> > {
> > TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);
>
> > TableRow* pRow = NULL;
>
> > if(itrMap != map_TableRowItrs.end)
> > {
> > TABLE_ROW_SET::iterator itrSet = itrMap->second;
>
> > pRow = *itrSet;
> > set_TableRows.erase(itrSet); // Erase the existing row since
> > // new one may be inserted to a
> > // different place
> > map_TableRowItrs.erase(itrMap);
>
> > }
> > else
> > {
> > pRow = new TableRow;
> > pRow->s_Key = zKey;
> > }
>
> > pRow->p_Data = pData;
> > pRow->b_HasChanged = true;
>
> > map_TableRowItrs.insert(pRow->s_Key.c_str(),set_TableRows.insert(pRow).first);
>
> > }
>
> > // This method removes a record from the table control
> > //
> > void RemoveData(const char* zKey)
> > {
> > TABLE_ROW_ITR_MAP::iterator itrMap = map_TableRowItrs.find(zKey);
>
> > if(itrMap != map_TableRowItrs.end)
> > {
> > TABLE_ROW_SET::iterator itrSet = itrMap->second;
>
> > TableRow* pRow = *itrSet;
> > set_TableRows.erase(itrSet);
> > map_TableRowItrs.erase(itrMap);
>
> > delete pRow;
> > }
> > }
>
> > // This method needs to access the set items by its index to show it
> > in a
> > // virtual list control on the Front-End
> > //
> > void* GetData(int iIndex)
> > {
> > // How to make this fast?, this is my question
> > // I am currently doing a caching mechanism which is not mentioned
> > here
> > // if I have a set with dynamic order statistics this can be taken
> > in Log(N) time
> > // instead of the N time
>
> > if(iIndex < set_TableRows.count())
> > {
> > TABLE_ROW_SET::iterator itrSet = set_TableRows.begin();
> > for (int iRow = 0 ; iRow < iIndex ; ++iRow, ++itrSet);
>
> > return (*itrSet)->p_Data;
> > }
>
> > return NULL;
> > }
>
> > protected:
> > TableRowComparator o_RowComparator;
> > TABLE_ROW_SET set_TableRows;
> > TABLE_ROW_ITR_MAP map_TableRowItrs;
> > };
>
> Now I see that you container actually has three indexes:
> 1) TableRow::s_Key.
> 2) TableRow::p_Data.
> 3) array index
>
> As the first step I would try getting rid of index 3. Most grid
> controls allow associating user data (a pointer) with a cell. You
> could associate a pointer to TableRow with a row in a grid. Another
> way is that you could extract the key from the GUI row and use that
> instead of index.
>
> Next you could use boost::multi_index container which allows having
> multiple indexes in the same container. This way you don't need to
> maintain an additional map of iterators/pointers for every additional
> index. The code is simpler, more robust and less memory waste.
>
> --
> Max
>
> --
> Max

Thank you very much for the guidance. But I can't go for your first
solution since I am driving a Front-End application from a Back-End.
What I get from the front end is the visible row range (ex:- 20 to
100) so that the only way I have to decide the data to be sent is by
its row index.

I think I can use the boost multi index container as a bi-directional
map (Example 4). Thanks again for the wonderful job that you are
doing.

DJD

comp.lang.c++

map and set classes implemented with a tree having dynamic order statistics

DJ Dharme

Maxim Yegorushkin

DJ Dharme

Maxim Yegorushkin

DJ Dharme

Maxim Yegorushkin

DJ Dharme

x Login to ForumsZone