comp.lang.python

memory usage, temporary and otherwise

mk

3/3/2010 6:47:00 PM


Obviously, don't try this on a low-memory machine:

>>> a={}
>>> for i in range(10000000):
...     a[i]='spam'*10
...
>>> import sys
>>> sys.getsizeof(a)
201326728
>>> id(a[1])
3085643936L
>>> id(a[100])
3085713568L

>>> ids={}
>>> for i in range(len(a)):
...     ids[id(a[i])]=True
...
>>> len(ids.keys())
10000000

Hm, apparently Python didn't spot that 'spam'*10 in a's values is really
the same string, right?

So sys.getsizeof returns some 200MB for this dictionary. But according
to top, the RSS of the python process is 300MB. ps auxw says the same thing
(more or less).

Why the 50% overhead? (and I would swear that a couple of times RSS
according to top grew to 800MB).
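
(Side note, in case it matters for reading that 200MB figure: as far as I
can tell, sys.getsizeof on a dict only counts the dict's own table, not the
key and value objects it refers to. A rough check at a smaller scale -
exact numbers depend on the build:)

import sys

b = {}
for i in range(1000):
    b[i] = 'spam' * 10

table_only = sys.getsizeof(b)                         # just the dict object itself
contents = sum(sys.getsizeof(k) + sys.getsizeof(v)    # keys and values are extra
               for k, v in b.items())
print table_only, table_only + contents               # Python 2 print statement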

Regards,
mk



7 Answers

Bruno Desthuilliers

3/3/2010 9:46:00 PM


mk wrote:
>
> Obviously, don't try this on low-memory machine:
>
>>>> a={}
>>>> for i in range(10000000):

Note that in Python 2, this will build a list of 10000000 int objects.
You may want to use xrange instead...

> ... a[i]='spam'*10
> ...
>>>> import sys
>>>> sys.getsizeof(a)
> 201326728
>>>> id(a[1])
> 3085643936L
>>>> id(a[100])
> 3085713568L
>
>>>> ids={}
>>>> for i in range(len(a)):

And this builds yet another list of 10000000 int objects.

> ... ids[id(a[i])]=True
> ...
>>>> len(ids.keys())
> 10000000
>
> Hm, apparently Python didn't spot that 'spam'*10 in a's values is really
> the same string, right?

Seems not. FWIW, Python does some caching of some values of some
immutable types (small ints, some strings etc), but this is
implementation-dependent so you shouldn't rely on it.
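
You can see both effects from the prompt (CPython behaviour, not a
language guarantee):

>>> x = 'spam'
>>> x * 10 is x * 10              # two strings built at runtime: distinct objects
False
>>> int('100') is int('100')      # small ints are cached by CPython
True
>>> int('10000') is int('10000')  # bigger ones usually aren't
False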

> So sys.getsizeof returns some 200MB for this dictionary. But according
> to top, the RSS of the python process is 300MB. ps auxw says the same thing
> (more or less).
>
> Why the 50% overhead? (and I would swear that a couple of times RSS
> according to top grew to 800MB).

(overly simplified)

When an object is garbage-collected, the memory is not necessarily
"returned" to the system - and the system doesn't necessarily claim it
back either until it _really_ needs it.

This avoids a _lot_ of possibly useless work for both the python
interpreter (keeping already allocated memory costs less than immediately
returning it, just to try and allocate some more memory a couple of
instructions later) and the system (ditto - FWIW, how linux handles
memory allocations is somewhat funny, if you ever programmed in C).
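
If you want to watch it happen, one rough way is to read the process's
own RSS back from /proc (Linux-only, and just a sketch - exact figures
depend on the allocator and the platform):

import gc

def rss_kb():
    # current resident set size in kB, from the Linux proc filesystem
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

print rss_kb()                      # baseline for the bare interpreter
a = dict((i, 'spam' * 10) for i in xrange(1000000))  # smaller than the example above
print rss_kb()                      # noticeably higher
del a
gc.collect()                        # force a collection, for good measure
print rss_kb()                      # often stays well above the baseline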

HTH

Bruno Desthuilliers

3/3/2010 9:48:00 PM


Bruno Desthuilliers wrote:
> mk wrote:
(snip)
>> So sys.getsizeof returns some 200MB for this dictionary. But according
>> to top, the RSS of the python process is 300MB. ps auxw says the same thing
>> (more or less).
>>
>> Why the 50% overhead?

Oh, and yes - the interpreter itself, the builtins, and all imported
modules also eat some space...

(snip)

mk

3/4/2010 11:56:00 AM


Bruno Desthuilliers wrote:
> mk wrote:
>> Obviously, don't try this on low-memory machine:
>>
>>>>> a={}
>>>>> for i in range(10000000):

> Note that in Python 2, this will build a list of 10000000 int objects.
> You may want to use xrange instead...

Huh? I was under the impression that some time after 2.0 range was made to
work "under the covers" like xrange when used in a loop? Or is it 3.0
that does that?

> And this builds yet another list of 10000000 int objects.

Well this explains much of the overhead.

> (overly simplified)
>
> When an object is garbage-collected, the memory is not necessarily
> "returned" to the system - and the system doesn't necessarily claim it
> back either until it _really_ needs it.
>
> This avoids a _lot_ of possibly useless work for both the python
> interpreter (keeping already allocated memory costs less than immediately
> returning it, just to try and allocate some more memory a couple of
> instructions later) and the system (ditto - FWIW, how linux handles
> memory allocations is somewhat funny, if you ever programmed in C).

Ah! That explains a lot. Thanks to you, I have again expanded my
knowledge of Python!

Hmm, I would definitely like to read something on how CPython handles
memory on the Python wiki. Thanks to you and John Posner for that doc on
the wiki on "functions & methods" - I'm reading it every day like a bible. ;-)


Regards,
mk

Duncan Booth

3/4/2010 12:25:00 PM


mk <mrkafk@gmail.com> wrote:

> Hm, apparently Python didn't spot that 'spam'*10 in a's values is really
> the same string, right?

If you want it to spot that then give it a hint that it should be looking
for identical strings:

>>> a={}
>>> for i in range(10000000):
...     a[i]=intern('spam'*10)

should reduce your memory use somewhat.
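
With the interned version every slot ends up pointing at the same string
object, which you can check the same way as before (in 3.x the builtin
moved to sys.intern, by the way):

>>> a = {}
>>> for i in range(1000):
...     a[i] = intern('spam'*10)
...
>>> len(set(id(v) for v in a.values()))
1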

--
Duncan Booth http://kupuguy.bl...

lbolla

3/4/2010 5:30:00 PM


On Mar 4, 12:24 pm, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
>
>  >>> a={}
>  >>> for i in range(10000000):
> ...     a[i]=intern('spam'*10)
>

"intern": another name borrowed from Lisp?

Terry Reedy

3/4/2010 6:03:00 PM


On 3/4/2010 6:56 AM, mk wrote:
> Bruno Desthuilliers wrote:

> Huh? I was under the impression that some time after 2.0 range was made to
> work "under the covers" like xrange when used in a loop? Or is it 3.0
> that does that?

3.0.
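
In 3.x, range() gives you a lazy object much like 2.x's xrange, so its
size no longer depends on how far it counts:

>>> import sys
>>> sys.getsizeof(range(10)) == sys.getsizeof(range(10000000))
True

In 2.x, range() really does build the whole list up front, which is why
xrange is the cheaper spelling there.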


Steve Holden

3/4/2010 6:08:00 PM


Duncan Booth wrote:
> mk <mrkafk@gmail.com> wrote:
>
>> Hm, apparently Python didn't spot that 'spam'*10 in a's values is really
>> the same string, right?
>
> If you want it to spot that then give it a hint that it should be looking
> for identical strings:
>
> >>> a={}
> >>> for i in range(10000000):
> ...     a[i]=intern('spam'*10)
>
> should reduce your memory use somewhat.
>
Better still, hoist the constant value out of the loop:

>>> a={}
>>> const = 'spam'*10
>>> for i in range(10000000):
...     a[i] = const


regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us....
Holden Web LLC http://www.hold...
UPCOMING EVENTS: http://holdenweb.event...