[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.python

Python Memory Manager

Pie Squared

2/17/2008 6:33:00 PM

I've been looking at the Python source code recently, more
specifically trying to figure out how it's garbage collector works.

I've gathered that it uses refcounting as well as some cycle-detection
algorithms, but I haven't been able to figure out some other things.

Does Python actually have a single 'heap' where all the data is
stored? Because PyObject_HEAD seemed to imply to me it was just a
linked list of objects, although perhaps I didnt understand this
correctly.

Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas. (Also, if you know any
resources for things like this, I'd be grateful for links/names)

Thanks in advance,

Pie Squared
22 Answers

Paul Rubin

2/17/2008 6:57:00 PM

0

Pie Squared <PieSquared@gmail.com> writes:
> Also, if it does, how does it deal with memory segmentation? This
> question bothers me because I've been trying to implement a moving
> garbage collector, and am not sure how to deal with updating all
> program pointers to objects on the heap, and thought perhaps an answer
> to this question would give me some ideas.

As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.

> (Also, if you know any
> resources for things like this, I'd be grateful for links/names)

If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.

Pie Squared

2/17/2008 8:00:00 PM

0

On Feb 17, 1:57 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Pie Squared <PieSqua...@gmail.com> writes:
> > Also, if it does, how does it deal with memory segmentation? This
> > question bothers me because I've been trying to implement a moving
> > garbage collector, and am not sure how to deal with updating all
> > program pointers to objects on the heap, and thought perhaps an answer
> > to this question would give me some ideas.
>
> As I understand it, Python primarily uses reference counting, with a
> mark and sweep scheme for cycle breaking tacked on as an afterthought.
> It doesn't move objects in memory during GC so you can get
> fragmentation. It's probably not feasible to change this in CPython
> without extensive rewriting of CPython and maybe a lot of external C
> modules.
>
> > (Also, if you know any
> > resources for things like this, I'd be grateful for links/names)
>
> If you mean about GC in general, the old book by Jones and Lins is
> still standard, I think.

Thanks for the quick reply!

That answered my question, and I'll check out the book you're
referring to - it's exactly what I need, I think. Again, thanks!

Martin Rinehart

2/17/2008 8:05:00 PM

0

I researched this for some Java I wrote. Try to avoid shuffling
physical memory - you'll write a lot less code and it will be faster,
too.

Use an "allocated" list and an "available" list. Keep them in address
order. Inserting (moving list elements from insertion point to end)
and deleting (vice-versa) are near-zero cost operations on Intel
boxes. ( Two millis to move a million ints at 1GHz 5 years ago when I
wrote http://www.martinrinehart.com/articles... - probably half
that today.)

The worst choice is the "best fit" allocation algorithm. (Grabbing
most of a free bit leaves a probably useless small bit. Grab from the
first big piece you find.) Circular first-fit is probably best.
(Testing was for compiler-type applications.)

When space is freed, insert link in ordered chain of free space
blocks. Then combine with prev/next blocks if they are free.

And when you find the function in Python that matches Java's
System.arraycopy(), please tell me what it's called. I'm sure there
must be one.

Steve Holden

2/17/2008 8:12:00 PM

0

Paul Rubin wrote:
> Pie Squared <PieSquared@gmail.com> writes:
>> Also, if it does, how does it deal with memory segmentation? This
>> question bothers me because I've been trying to implement a moving
>> garbage collector, and am not sure how to deal with updating all
>> program pointers to objects on the heap, and thought perhaps an answer
>> to this question would give me some ideas.
>
> As I understand it, Python primarily uses reference counting, with a
> mark and sweep scheme for cycle breaking tacked on as an afterthought.
> It doesn't move objects in memory during GC so you can get
> fragmentation. It's probably not feasible to change this in CPython
> without extensive rewriting of CPython and maybe a lot of external C
> modules.
>
>> (Also, if you know any
>> resources for things like this, I'd be grateful for links/names)
>
> If you mean about GC in general, the old book by Jones and Lins is
> still standard, I think.

You also need to be aware that there are certain allocation classes
which use arenas, areas of storage dedicated to specific object types.
When those objects are destroyed their space is returned to the arena,
but the arena is either never returned to the free pool or only returned
to it when the last allocated item is collected.

There seem to be more of those kind of tricks coming up in 2.6 and 3.0.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

Pie Squared

2/17/2008 9:01:00 PM

0

On Feb 17, 3:05 pm, MartinRineh...@gmail.com wrote:
> I researched this for some Java I wrote. Try to avoid shuffling
> physical memory - you'll write a lot less code and it will be faster,
> too.
>
> Use an "allocated" list and an "available" list. Keep them in address
> order. Inserting (moving list elements from insertion point to end)
> and deleting (vice-versa) are near-zero cost operations on Intel
> boxes. ( Two millis to move a million ints at 1GHz 5 years ago when I
> wrotehttp://www.martinrinehart.com/articles/... probably half
> that today.)

It seems to me that another, perhaps better strategy, would be to
allocate a large heap space, then store a pointer to the base of the
heap, the current heap size, and the beginning of the free memory.
When you need to 'allocate' more room, just return a pointer to some
location in the heap and increment the start-of-free-memory pointer.
That way, allocation really IS free, more or less. Wouldn't that be
more efficient? Perhaps I'm missing something.

As a side note, I'm new to Usenet, so I'm not exactly sure... are
'tangents' like this - since this IS a Python newsgroup, after all -
okay?

Anyway, thanks for the suggestion.

Christian Heimes

2/17/2008 9:04:00 PM

0

Pie Squared wrote:
> I've been looking at the Python source code recently, more
> specifically trying to figure out how it's garbage collector works.
>
> I've gathered that it uses refcounting as well as some cycle-detection
> algorithms, but I haven't been able to figure out some other things.

Python uses ref counting for all objects and an additional GC for
container objects like lists and dicts. The cyclic GC depends on ref
counting.

> Does Python actually have a single 'heap' where all the data is
> stored? Because PyObject_HEAD seemed to imply to me it was just a
> linked list of objects, although perhaps I didnt understand this
> correctly.

In release builds PyObject_HEAD only contains the ref count and a link
to the object type. In Py_DEBUG builds it also contains a double linked
list of all allocated objects to debug reference counting bugs.

Python uses its own optimized memory allocator. Be sure you have read
Object/obmalloc.c!

> Also, if it does, how does it deal with memory segmentation? This
> question bothers me because I've been trying to implement a moving
> garbage collector, and am not sure how to deal with updating all
> program pointers to objects on the heap, and thought perhaps an answer
> to this question would give me some ideas. (Also, if you know any
> resources for things like this, I'd be grateful for links/names)

I don't think it's possible to implement a moving GC. You'd have to
chance some fundamental parts of Python.

Christian

Paul Rubin

2/17/2008 9:25:00 PM

0

Pie Squared <PieSquared@gmail.com> writes:
> It seems to me that another, perhaps better strategy, would be to
> allocate a large heap space, then store a pointer to the base of the
> heap, the current heap size, and the beginning of the free memory.
> When you need to 'allocate' more room, just return a pointer to some
> location in the heap and increment the start-of-free-memory pointer.
> That way, allocation really IS free, more or less. Wouldn't that be
> more efficient? Perhaps I'm missing something.

The problem here is with a high allocation rate, you have to GC a lot
more often, which typically involves copying live data. So now you
have to figure out how to reduce the amount of copying using
generational schemes, (these days) come up with ways to make all this
work in the presence of parallel threads, etc. It sounds like you're
new to the subject, so for now I'll summarize the situation by saying
Python uses a fairly simple scheme that's a reasonable match for its
implementation as a small, medium performance interpreter; however,
higher performance GC'd language implementations end up doing much
more complicated things. See the Jones and Lins book that I
mentioned, and there's an even older one by Appel, and the Wikipedia
article on garbage collection may have some links you can look at.
Also, Structure and Interpretation of Computer Programs (full text
online) includes a simple implementation of a copying GC, that might
be more tutorial.

> As a side note, I'm new to Usenet, so I'm not exactly sure... are
> 'tangents' like this - since this IS a Python newsgroup, after all - okay?

It's reasonably on topic, compared with last week's long digression
about the dimensional analysis of the Kessel Run.

thebjorn

2/17/2008 9:26:00 PM

0

On Feb 17, 10:01 pm, Pie Squared <PieSqua...@gmail.com> wrote:
[...]
> It seems to me that another, perhaps better strategy, would be to
> allocate a large heap space, then store a pointer to the base of the
> heap, the current heap size, and the beginning of the free memory.
> When you need to 'allocate' more room, just return a pointer to some
> location in the heap and increment the start-of-free-memory pointer.
> That way, allocation really IS free, more or less. Wouldn't that be
> more efficient? Perhaps I'm missing something.

Deallocation?

> As a side note, I'm new to Usenet, so I'm not exactly sure... are
> 'tangents' like this - since this IS a Python newsgroup, after all -
> okay?

It varies depending on the group, c.l.py is pretty tolerant as long as
it's interesting ;-)

To bring it back to Python, I was under the impression that the GC was
a generational collector and not a simple mark-sweep, but it's been a
while since I read about it...

-- bjorn

Martin v. Loewis

2/17/2008 9:44:00 PM

0

>> Also, if it does, how does it deal with memory segmentation? This
>> question bothers me because I've been trying to implement a moving
>> garbage collector, and am not sure how to deal with updating all
>> program pointers to objects on the heap, and thought perhaps an answer
>> to this question would give me some ideas.
>
> As I understand it, Python primarily uses reference counting, with a
> mark and sweep scheme for cycle breaking tacked on as an afterthought.

That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
scheme that allows incremental collection without write barriers. This
particular scheme heavily relies on refcounting itself (specifically,
an object is garbage in a certain generation when all references to
it come from the same generation).

As for the consequences of the scheme (i.e. no compaction), you are
right.

Regards,
Martin

Paul Rubin

2/17/2008 9:51:00 PM

0

"Martin v. Löwis" <martin@v.loewis.de> writes:
> That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
> scheme that allows incremental collection without write barriers. This
> particular scheme heavily relies on refcounting itself (specifically,
> an object is garbage in a certain generation when all references to
> it come from the same generation).

Ah, thanks. I made another post which I guess is also somewhat wrong.