[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.c++

Overhead of subscript operator for STL maps

C++Liliput

10/16/2008 5:54:00 AM

I have a custom String class that contains an embedded char* member.
The copy constructor, assignment operator etc. are all correctly
defined. I need to create a map of my string (say a class called
MyString) and an integer i.e. std::map<MyString, int>. Whenever I
insert the elements in the map using the subscript [] operator, I
noticed that the copy constructor for MyString is invoked more number
of times than if I do it using the insert() operation. For e.g. for 3
strings, when I insert them into the map using subscript operator, the
copy constructor is invoked 6 times whereas in the case of insert(),
the copy constructor is only invoked 3 times. Can someone please
explain me the reason why this is so?
20 Answers

Jerry

10/16/2008 6:38:00 AM

0

On Oct 16, 1:53 pm, "C++Liliput" <aveekmi...@gmail.com> wrote:
> I have a custom String class that contains an embedded char* member.
> The copy constructor, assignment operator etc. are all correctly
> defined. I need to create a map of my string (say a class called
> MyString) and an integer i.e. std::map<MyString, int>. Whenever I
> insert the elements in the map using the subscript [] operator, I
> noticed that the copy constructor for MyString is invoked more number
> of times than if I do it using the insert() operation. For e.g. for 3
> strings, when I insert them into the map using subscript operator, the
> copy constructor is invoked 6 times whereas in the case of insert(),
> the copy constructor is only invoked 3 times. Can someone please
> explain me the reason why this is so?

Where do you call the operator[]? I think paste your codes could make
the question clear.

Ian Collins

10/16/2008 7:32:00 AM

0

C++Liliput wrote:
> I have a custom String class that contains an embedded char* member.
> The copy constructor, assignment operator etc. are all correctly
> defined. I need to create a map of my string (say a class called
> MyString) and an integer i.e. std::map<MyString, int>. Whenever I
> insert the elements in the map using the subscript [] operator, I
> noticed that the copy constructor for MyString is invoked more number
> of times than if I do it using the insert() operation. For e.g. for 3
> strings, when I insert them into the map using subscript operator, the
> copy constructor is invoked 6 times whereas in the case of insert(),
> the copy constructor is only invoked 3 times. Can someone please
> explain me the reason why this is so?

The subscript operator creates a default constructed object then copies
the passed object to it. So you will see a copy for the operator call
and one for the copy in.

--
Ian Collins

Maxim Yegorushkin

10/16/2008 11:12:00 AM

0

On Oct 16, 8:32 am, Ian Collins <ian-n...@hotmail.com> wrote:
> C++Liliput wrote:
> > I have a custom String class that contains an embedded char* member.
> > The copy constructor, assignment operator etc. are all correctly
> > defined. I need to create a map of my string (say a class called
> > MyString) and an integer i.e. std::map<MyString, int>. Whenever I
> > insert the elements in the map using the subscript [] operator, I
> > noticed that the copy constructor for MyString is invoked more number
> > of times than if I do it using the insert() operation. For e.g. for 3
> > strings, when I insert them into the map using subscript operator, the
> > copy constructor is invoked 6 times whereas in the case of insert(),
> > the copy constructor is only invoked 3 times. Can someone please
> > explain me the reason why this is so?
>
> The subscript operator creates a default constructed object then copies
> the passed object to it.  So you will see a copy for the operator call
> and one for the copy in.

The subscript operator creates a default constructed object only for
the *value*, but not for the key of std::map<>.

In fact, std::map<Key, Value> internally stores std::pair<Key, Value>.
When std::map<>::operator[] is called and there has been no such key,
it creates a temporary std::pair<Key, Value> (where Value is default
initialised) and then copies that temporary std::pair<Key, Value> into
the newly created (tree) node, thus 2 calls to the copy constructor of
Key. std::map::insert(), on the other hand, accepts std::pair<Key,
Value>, which is used to initialise the node's copy of it, thus 1 copy
constructor call.

--
Max

Stephen Horne

10/17/2008 7:43:00 AM

0

On Thu, 16 Oct 2008 20:32:13 +1300, Ian Collins <ian-news@hotmail.com>
wrote:

>The subscript operator creates a default constructed object then copies
>the passed object to it. So you will see a copy for the operator call
>and one for the copy in.

Wow!

I always assumed that using [] for a key that isn't already in the
container would throw an exception. It seems like an obvious case of
most-likely-a-mistake to me.

I realise that scripting languages do it, but they get to handle the
[] differently depending on whether its on the left side of an
assignment or not. They still complain if you try to read a
non-existent key.

And even with the [] on the left of an assignment, I still think it's
a bit bad, since to me the obvious intent is to overwrite the data for
an existing key.

Sorry - obviously the rules are what they are - I'm just a bit
shocked.

Erik Wikström

10/17/2008 3:28:00 PM

0

On 2008-10-17 09:43, Stephen Horne wrote:
> On Thu, 16 Oct 2008 20:32:13 +1300, Ian Collins <ian-news@hotmail.com>
> wrote:
>
>>The subscript operator creates a default constructed object then copies
>>the passed object to it. So you will see a copy for the operator call
>>and one for the copy in.
>
> Wow!
>
> I always assumed that using [] for a key that isn't already in the
> container would throw an exception. It seems like an obvious case of
> most-likely-a-mistake to me.
>
> I realise that scripting languages do it, but they get to handle the
> [] differently depending on whether its on the left side of an
> assignment or not. They still complain if you try to read a
> non-existent key.
>
> And even with the [] on the left of an assignment, I still think it's
> a bit bad, since to me the obvious intent is to overwrite the data for
> an existing key.

Personally I consider it a good idea, it can simplify a lot of code
(such as this one, written by Fred Swartz):

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
map<string, int> freq; // map of words and their frequencies
string word; // input buffer for words.

//--- Read words/tokens from input stream
while (cin >> word) {
freq[word]++;
}

//--- Write the count and the word.
map<string, int>::const_iterator iter;
for (iter=freq.begin(); iter != freq.end(); ++iter) {
cout << iter->second << " " << iter->first << endl;
}
return 0;
}//end main

Notice how little code is dedicated to the actual counting of the words.

--
Erik Wikström

Juha Nieminen

10/17/2008 5:05:00 PM

0

Stephen Horne wrote:
> I realise that scripting languages do it, but they get to handle the
> [] differently depending on whether its on the left side of an
> assignment or not. They still complain if you try to read a
> non-existent key.

Actually in many languages (such as for example PHP) relational maps
work exactly that way. That is, you add an element to the map by
"indexing" it with the key and assigning the element. This is a very
common idiom eg. in PHP.

> And even with the [] on the left of an assignment, I still think it's
> a bit bad, since to me the obvious intent is to overwrite the data for
> an existing key.

It might be "obvious" to you because you are used to think like that.
However, in many languages (such as PHP) it's obvious that you are
building a relational map by indexing it. That is, when you say:

myMap[key] = value;

when you are doing is adding 'value' at the "position" 'key' of the map.
Or if there was already such a key, you are replacing its data with the
new value.

std::map has that exact same idea (although it's not as flexible as
PHP because the type of the key and the value are fixed). As exemplified
by Erik, this can be quite handy in some cases, for example because you
can write things like:

words[word]++;

One single command adds a new 'word' to the map and increments its value.

If you want to check if a key exists in the map, that's what the
find() member function is for.

James Kanze

10/17/2008 5:09:00 PM

0

On Oct 17, 5:27 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> On 2008-10-17 09:43, Stephen Horne wrote:
> > On Thu, 16 Oct 2008 20:32:13 +1300, Ian Collins
> > <ian-n...@hotmail.com> wrote:

> >>The subscript operator creates a default constructed object
> >>then copies the passed object to it. So you will see a copy
> >>for the operator call and one for the copy in.

> > Wow!

> > I always assumed that using [] for a key that isn't already
> > in the container would throw an exception. It seems like an
> > obvious case of most-likely-a-mistake to me.

> > I realise that scripting languages do it, but they get to
> > handle the [] differently depending on whether its on the
> > left side of an assignment or not. They still complain if
> > you try to read a non-existent key.

Do they? The ones I use don't. (You can get this behavior in
C++, if you want it, by having operator[] return a proxy.)

> > And even with the [] on the left of an assignment, I still
> > think it's a bit bad, since to me the obvious intent is to
> > overwrite the data for an existing key.

> Personally I consider it a good idea, it can simplify a lot of
> code (such as this one, written by Fred Swartz):

Sure, it simplifies some things. And makes others more
complicated. The fundamental problem is that it means that
operator[] can't be used on a const map, which is exceedingly
constraining.

> #include <iostream>
> #include <map>
> #include <string>
> using namespace std;

> int main() {
> map<string, int> freq; // map of words and their frequencies
> string word; // input buffer for words.

> //--- Read words/tokens from input stream
> while (cin >> word) {
> freq[word]++;
> }

> //--- Write the count and the word.
> map<string, int>::const_iterator iter;
> for (iter=freq.begin(); iter != freq.end(); ++iter) {
> cout << iter->second << " " << iter->first << endl;
> }
> return 0;
> }//end main

> Notice how little code is dedicated to the actual counting of
> the words.

And consider the more common use of using an std::map to look up
a factory, dynamically. If the list of classes is fixed, the
map will typically be const. There's no problem with operator[]
returning a value constructed with the default constructor,
since the mapped_type is normally a pointer, and a null pointer
is a good signal for a missing element. But you really want the
instance to be const.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Stephen Horne

10/18/2008

0

On Fri, 17 Oct 2008 17:05:27 GMT, Juha Nieminen
<nospam@thanks.invalid> wrote:

>Stephen Horne wrote:
>> I realise that scripting languages do it, but they get to handle the
>> [] differently depending on whether its on the left side of an
>> assignment or not. They still complain if you try to read a
>> non-existent key.
>
> Actually in many languages (such as for example PHP) relational maps
>work exactly that way. That is, you add an element to the map by
>"indexing" it with the key and assigning the element. This is a very
>common idiom eg. in PHP.

I didn't know that, and on my first read of this I thought I'd
overgeneralised a principle from basically just Python to scripting
languages more generally. On second read, you are saying precisely
what I said - PHP is a scripting language, and it is doing what I said
is characteristic of scripting languages.

Maybe I worded my post awkwardly. To clarify...

C++ allows key:data insertion using "container [key] = data;". That
surprised me. I realise that scripting languages *also* do "container
[key] = data;", but they get to handle the subscripting differently.

Basically, scripting languages don't do a two stage process of insert
a key:default in the [], then overwrite the default with the data in
the assign. They merge the subscripting and assign into a single step.
For example, Python has separate magic methods for overriding a
subscripted read and a subscripted write. When C++ handles the
subscripting operation it doesn't even know that there is an assign.
For example...

std::map<int,int> mymap;
int var;

var = mymap [1];

Did the programmer really intend for var to get a random junk value
that was also assigned into mymap in a key=1:data=junk pair? Python
would know better since the assignment is on the wrong side of the
assignment, I assume PHP would as well, but C++ cannot since when it
handles the operator[], the method doesn't know how the returned
reference will be used. It returns a reference that might be used for
a read or a write, or which may not be used at all.

The two normal variants of operator[] aren't read and write - they are
const and non-const as follows...

const data_t& container::operator[] const { ... }
data_t& container::operator[] { ... }

The first shouldn't modify the container, and returns a reference that
will normally prevent the referenced value from being modified (with
various get-out clauses - mutable, const_cast, ...). This version is
called whenever the container is considered const.

The second returns a non-const reference which is equally valid for
both reading or writing the referenced data. This version is called
whenever the container is considered non-const, which is no clue as to
whether a read or write is intended.

The semantics of the first case ensure a const container is safe - you
can't modify a const container so you can't insert a key:default pair
and I assume an exception is thrown in that case.

I should say that based purely on the method signature there is no
reason to ban inserts on subscripting - the container is non-const.
Even so, I feel that reserving the subscripting operator for cases
where the intent is to access an existing key would be better.

> It might be "obvious" to you because you are used to think like that.
>However, in many languages (such as PHP) it's obvious that you are
>building a relational map by indexing it. That is, when you say:
>
> myMap[key] = value;

First off, my criticism of that is that there's already a way to
express that intent in C++ - the insert method. When you already have
a way to express that intent, why make the intent of another notation
less clear (and reduce the opportunities for error checking to catch
problems) by mixing the two intents together.

insert cannot be used to access an existing key. Why allow operator[]
the dual role of both accessing existing keys and inserting new ones?

I always prefer a different notation for a different intent (within
reason) so that errors are caught instead of remaining hidden and
leading to garbage results being trusted.

It's subjective opinion, of course, esp. in where you draw the "within
reason" line, but there's a lot more to it than just "what I'm used
to".

Secondly, it's not something I simply got used through using some
other language at all, unless it was an old pre-standard version of
C++. I can't even name a language that uses the semantics I suggest.
Ada, Modula 2, Pascal, C, Basic and Assembler don't have standard
associative containers to the best of my memory. Of the languages I've
played with, Haskell is pure functional - no mutables (except for the
monad getout clause), Prolog is wierd, and I'm dimly aware that arrays
are mutable in Objective Caml but that's my limit.

My own containers don't allow subscripting to insert new items, but
that was either based on a decision at the time to go against what I
was used to, or else it was doing what I was used to in pre-standard
C++. I honestly don't recall which.

I last used map (there was no std:: prefix at the time) in Borland C++
5, so maybe someone remembers what that did - I haven't had access to
a copy for years now.

> std::map has that exact same idea (although it's not as flexible as
>PHP because the type of the key and the value are fixed). As exemplified
>by Erik, this can be quite handy in some cases, for example because you
>can write things like:
>
> words[word]++;
>
> One single command adds a new 'word' to the map and increments its value.

You won't get the result you expect, or at least if you do it's only a
fluke.

Try the following...

#include <map>
#include <iostream>

void Garbage ()
{
char garbage [1024];
for (int i = 0; i < 1024; ++i) garbage [i] = 0x55;
}

void Test ()
{
int a;

std::cout << a << std::endl;
}

int main()
{
Garbage ();
Test ();

return 0;
};

What result do you expect from Test? If you say zero, you're wrong.
The default constructor for int is trivial - it does nothing. The
variable a is never initialised, so it's value is junk. Because of the
specific junk that happens to be on the stack at the time (due to the
prior Garbage call) the most likely output is 2009147472.

In a completely trivial test program you will fluke it - on typical
desktop systems - as the memory is normally wiped to all zero as your
program is loaded. As soon as the program does anything significant,
though...

Getting back to your example, if the key "word" is new, so you insert
a new item, you are incrementing a garbage value - whatever happened
to be in some arbitrary chunk of memory prior to executing that line.
Even that behaviour is only in practice - the standard will say that
the behaviour is undefined.

That error wouldn't happen in PHP, I'll bet, but C++ isn't a scripting
language and it works differently, which was a significant part of my
point. I only disagree with it to a small degree in scripting
languages, but there are more problems with it in C++.

Your compiler might warn you that you're using an uninitialised value,
but only in some cases. In GCC, the example I gave doesn't seem to
give a warning even with -Wall. In your example, the compiler cannot
give a warning - it cannot know that the reference returned by
operator[] refers to an uninitialised location.

> If you want to check if a key exists in the map, that's what the
>find() member function is for.

But when the needed checks aren't done by default, they often don't
get done at all. Relative to C, the trend in C++ has been toward
language features that are safer by default.

For example, we no longer have to program an explicit null check after
using new. Pre-standardisation, new was safer than malloc (it
supported initialisation by default) but could still return null if it
couldn't allocate memory. Now, we also have an exception by default.
We don't have to remember the check - it is implicit.

Notations that cleanly separate different intents support better
automatic checks - it's that simple.

Thomas J. Gritzan

10/18/2008 1:44:00 AM

0

Stephen Horne wrote:
> On Fri, 17 Oct 2008 17:05:27 GMT, Juha Nieminen
> <nospam@thanks.invalid> wrote:
>
>> Stephen Horne wrote:
>>> I realise that scripting languages do it, but they get to handle the
>>> [] differently depending on whether its on the left side of an
>>> assignment or not. They still complain if you try to read a
>>> non-existent key.
>> Actually in many languages (such as for example PHP) relational maps
>> work exactly that way. That is, you add an element to the map by
>> "indexing" it with the key and assigning the element. This is a very
>> common idiom eg. in PHP.

In Ruby it's basically the same with a dictionary. You can overload
index-get and index-assignment separatly, that is, [] and []= are two
different functions/methods. With []= you insert a new element or change
it, with [] you get nil, if the element doesn't exist.

Stephen Horne wrote:
> C++ allows key:data insertion using "container [key] = data;". That
> surprised me. I realise that scripting languages *also* do "container
> [key] = data;", but they get to handle the subscripting differently.
>
> Basically, scripting languages don't do a two stage process of insert
> a key:default in the [], then overwrite the default with the data in
> the assign. They merge the subscripting and assign into a single step.

Yes. In Ruby, the method [] is called if you want to get a value, and
[]= if you assign to it.

In C++ there's only one operator[] and it is called for read and write.
You can fake the semantics by returning a proxy object that has
operator=(const TYPE&) for assignment and operator TYPE() for reading
the value.

Stephen Horne wrote:
> Juha Nieminen wrote:
>> std::map has that exact same idea (although it's not as flexible as
>> PHP because the type of the key and the value are fixed). As exemplified
>> by Erik, this can be quite handy in some cases, for example because you
>> can write things like:
>>
>> words[word]++;
>>
>> One single command adds a new 'word' to the map and increments its value.
>
> You won't get the result you expect, or at least if you do it's only a
> fluke.

He will.

> Try the following...
>
> #include <map>
> #include <iostream>
>
> void Garbage ()
> {
> char garbage [1024];
> for (int i = 0; i < 1024; ++i) garbage [i] = 0x55;
> }

That's a completly different case. 'garbage' here is not initialized.
But std::map guarantees that new objects are default-initialized.

Is the same as with std::vector:

std::vector<int> v;
v.resize(10);

The elements don't contain garbage, they are all 0 (zero).

>
> What result do you expect from Test? If you say zero, you're wrong.
> The default constructor for int is trivial - it does nothing. The
> variable a is never initialised, so it's value is junk. Because of the
> specific junk that happens to be on the stack at the time (due to the
> prior Garbage call) the most likely output is 2009147472.
>
> In a completely trivial test program you will fluke it - on typical
> desktop systems - as the memory is normally wiped to all zero as your
> program is loaded. As soon as the program does anything significant,
> though...
>
> Getting back to your example, if the key "word" is new, so you insert
> a new item, you are incrementing a garbage value - whatever happened
> to be in some arbitrary chunk of memory prior to executing that line.
> Even that behaviour is only in practice - the standard will say that
> the behaviour is undefined.

In general it is true that primitives aren't initialized, but the
standard containers guarantee that they are.

You should read a good book to update your C++.

--
Thomas

James Kanze

10/18/2008 7:04:00 AM

0

On Oct 17, 7:05 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Stephen Horne wrote:
> > I realise that scripting languages do it, but they get to
> > handle the [] differently depending on whether its on the
> > left side of an assignment or not. They still complain if
> > you try to read a non-existent key.

> Actually in many languages (such as for example PHP)
> relational maps work exactly that way. That is, you add an
> element to the map by "indexing" it with the key and assigning
> the element. This is a very common idiom eg. in PHP.

Such languages don't have const. All of the ones I know, for
that matter, are untyped, and don't require variable
declarations either. So there is always a well defined value
which can be used for initialization.

In C++, logically, you should be able to use [] on a const map
(if you're just reading). And you should be able to use [] even
if the mapped type doesn't have a default constructor.

On the other hand, one can easily argue that operator[] isn't
part the basic interface defined by std::map. It's just an
added convenience (or even an example of operator overload
abuse:-)). Most of the time, you'll wrap std::map in a class
which provides a more convenient interface for what it's being
used for, which is application specific. The presence of
operator[] is just as a convenience, for one common use, and
this particular use was chosen precisely because it happens to
be supported in a couple of other common languages (AWK, perl,
etc.)

> > And even with the [] on the left of an assignment, I still
> > think it's a bit bad, since to me the obvious intent is to
> > overwrite the data for an existing key.

> It might be "obvious" to you because you are used to think
> like that.

I think the point of something like std::map is that it is
reallly a low level building block, with a number of (sometimes
contradicting) "obvious" uses. In this sense, operator[]
doesn't really belong, so who cares what semantics it has. (In
practice, I rarely use it, relying much more on map<>::find().)

> However, in many languages (such as PHP) it's obvious that you
> are building a relational map by indexing it. That is, when
> you say:

>     myMap[key] = value;

Except that that's not at all the way you build a map in C++
(and it's a very bad way of building it in other languages). In
C++, the idiomatic fashion would be something like:

while ( more elements ) {
if ( ! myMap.insert(
MapType::value_type(
element.key, element.value ) ).second ) {
// error handling...
}
}

Correctly used, find, insert and erase allow you to implement
whatever idiomatic use you need in a particular case.

> when you are doing is adding 'value' at the "position" 'key'
> of the map. Or if there was already such a key, you are
> replacing its data with the new value.

In other words, you might be overwriting an existing value when
you don't want to.

> std::map has that exact same idea (although it's not as
> flexible as PHP because the type of the key and the value are
> fixed). As exemplified by Erik, this can be quite handy in
> some cases,

In a very few cases, yes.

C++ and PHP have very different goals. In C++, you're supposed
to be able to write robust code. This means taking advantage of
the stricter type system, const, etc. And it generally means
more thorough error checking---in most data base use, insertion
is a distinct action from update, and when you want one,
accidentally getting the other is an error.

> for example because you can write things like:

>     words[word]++;

> One single command adds a new 'word' to the map and increments
> its value.

> If you want to check if a key exists in the map, that's what the
> find() member function is for.

And if you want to access in a way that distinguishes between
update and insert, you wrap std::map in a class that uses find
(and insert). If you want an assertion failure, or an
exception, or a returned error code for a failed update, you can
then have it.

The problem with operator[] isn't really what it does. The
problem is that it is even there, since there is no one
universally useful semantics for it.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34