[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.python

Efficient way to break up a list into two pieces

marwie

2/21/2010 12:55:00 AM

Hello,

I recently read about augmented assignments and that (with l1, l2
being lists)

l1.extend(l2)

is more efficient than

l1 = l1 + l2

because unnecessary copy operations can be avoided. Now my question is
if there's a similar thing for breaking a list into two parts. Let's
say I want to remove from l1 everything from and including position 10
and store it in l2. Then I can write

l2 = l1[10:]
del l1[10:]

But since I'm assigning a slice the elements will be copied.
Basically, I'm looking for something like l1.pop(10,len(l1)) which
returns and removes a whole chunk of data. Is there such a thing (and
if not, why not?)

Cheers,
Martin.
16 Answers

Jonathan Gardner

2/21/2010 1:07:00 AM

0

On Sat, Feb 20, 2010 at 4:55 PM, marwie <marwie@gmx.de> wrote:
> Hello,
>
> I recently read about augmented assignments and that (with l1, l2
> being lists)
>
>    l1.extend(l2)
>
> is more efficient than
>
>    l1 = l1 + l2
>
> because unnecessary copy operations can be avoided. Now my question is
> if there's a similar thing for breaking a list into two parts. Let's
> say I want to remove from l1 everything from and including position 10
> and store it in l2. Then I can write
>
>    l2 = l1[10:]
>    del l1[10:]
>
> But since I'm assigning a slice the elements will be copied.
> Basically, I'm looking for something like l1.pop(10,len(l1)) which
> returns and removes a whole chunk of data. Is there such a thing (and
> if not, why not?)
>

The idiom is:

>>> l1, l2 = l1[:10], l1[10:]

Don't know if it's optimized or not. If it's not, it could probably
be. This is a really common idiom.

--
Jonathan Gardner
jgardner@jonathangardner.net

Steven D'Aprano

2/21/2010 1:30:00 AM

0

On Sat, 20 Feb 2010 16:55:10 -0800, marwie wrote:

> Hello,
>
> I recently read about augmented assignments and that (with l1, l2 being
> lists)
>
> l1.extend(l2)
>
> is more efficient than
>
> l1 = l1 + l2
>
> because unnecessary copy operations can be avoided. Now my question is
> if there's a similar thing for breaking a list into two parts. Let's say
> I want to remove from l1 everything from and including position 10 and
> store it in l2. Then I can write
>
> l2 = l1[10:]
> del l1[10:]
>
> But since I'm assigning a slice the elements will be copied.

Yes, list slicing can't make any sort of assumptions about what you're
going to do next, so when you take a slice, it has to copy the data.


> Basically,
> I'm looking for something like l1.pop(10,len(l1)) which returns and
> removes a whole chunk of data. Is there such a thing (and if not, why
> not?)

Such a list pop would have to copy the data too. How else could it work?

What exactly is the problem you are trying to solve?

If you are unhappy that copy-and-remove a slice from a list takes two
lines instead of one ("won't somebody think of the shortage of
newlines!!!" *wink*), then just write a helper function and call that:

def copy_and_remove(alist, slice):
tmp = alist[slice]
del alist[slice]
return tmp

l2 = copy_and_remove(l1, slice(10, None))


If you are unhappy that copying slices from lists copies data, well
that's just the way things are. Don't make the mistake of trying to
optimize your application before you actually know what bits need
optimizing.

Python lists are arrays of pointers to objects, so copying a slice is
fast: it doesn't have to copy the objects, just pointers. Deleting from
the end of the list is also quick, because you don't have to move memory,
just clear some pointers and change the length field.

Splitting such an array without copying data is, essentially, impossible.
Python lists aren't linked lists.

But as I said, most of the time copying slices doesn't matter: the time
taken is unlikely to be a serious factor in practice. If it is (and
you've profiled your code, right? you're not just guessing what you think
is fast and slow?) then there may be ways to optimize your code to avoid
needing to copy slices, but we'd need to know what you're doing with the
data.


--
Steven

Steven D'Aprano

2/21/2010 1:41:00 AM

0

On Sat, 20 Feb 2010 17:06:36 -0800, Jonathan Gardner wrote:

> On Sat, Feb 20, 2010 at 4:55 PM, marwie <marwie@gmx.de> wrote:
[...]
>>    l2 = l1[10:]
>>    del l1[10:]
>>
>> But since I'm assigning a slice the elements will be copied. Basically,
>> I'm looking for something like l1.pop(10,len(l1)) which returns and
>> removes a whole chunk of data. Is there such a thing (and if not, why
>> not?)
>>
>>
> The idiom is:
>
>>>> l1, l2 = l1[:10], l1[10:]

No, that has different behaviour to what the Original Poster wants, AND
it does two copies instead of one:

(1) copy l1[:10] and l1[10:]
(2) assign the names l1 and l2 to them
(3) if, and only if, there are no other references to the original l1, it
gets deleted (garbage collected).


What the OP is doing is quite different:

(1) copy l1[:10]
(2) assign the name l2 to it
(3) resize l1 in place to the first 10 items.


What the OP wants is:

(1) assign the name l2 to l1[:10] without copying
(2) resize l1 in place to the first 10 items without affecting l2.


--
Steven

marwie

2/21/2010 1:55:00 AM

0

On 21 Feb., 02:30, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> Python lists are arrays of pointers to objects, so copying a slice is
> fast: it doesn't have to copy the objects, just pointers. Deleting from
> the end of the list is also quick, because you don't have to move memory,
> just clear some pointers and change the length field.
>
> Splitting such an array without copying data is, essentially, impossible.
> Python lists aren't linked lists.

Well, to split a C array I would simply set l2 to point to l1[10] and
then
change the length of l1 (which I store somewhere else). No copying of
elements
needed. I would have assumed that python can do something like this
with its
internal arrays of pointers, too.

Anyway, this was more a question about coding style. I use
l1.extend(l2) or
l1 += l2 rather than l1 = l1 + l2 because it's as readable and
possibly faster.
I was simply wondering if something similar exists for splitting
lists.

marwie

2/21/2010 3:00:00 AM

0

On 21 Feb., 02:41, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> What the OP is doing is quite different:
>
> (1) copy l1[:10]
> (2) assign the name l2 to it
> (3) resize l1 in place to the first 10 items.
>
> What the OP wants is:
>
> (1) assign the name l2 to l1[:10] without copying
> (2) resize l1 in place to the first 10 items without affecting l2.

Assuming you meant l1[10:] instead of l1[:10], that's what I wanted,
yes.

Steven D'Aprano

2/21/2010 3:41:00 AM

0

On Sat, 20 Feb 2010 17:55:18 -0800, marwie wrote:

> On 21 Feb., 02:30, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> Python lists are arrays of pointers to objects, so copying a slice is
>> fast: it doesn't have to copy the objects, just pointers. Deleting from
>> the end of the list is also quick, because you don't have to move
>> memory, just clear some pointers and change the length field.
>>
>> Splitting such an array without copying data is, essentially,
>> impossible. Python lists aren't linked lists.
>
> Well, to split a C array I would simply set l2 to point to l1[10] and
> then change the length of l1 (which I store somewhere else). No copying
> of elements needed. I would have assumed that python can do something
> like this with its internal arrays of pointers, too.

Python lists aren't C arrays either.

Python lists are *objects*. Everything in Python is an object.
Consequently, Python lists have a header, which includes the len. You
don't store the length somewhere else, you store it in the object: this
makes for less complexity. You can't just point l2 at an arbitrary index
in an array and expect it to work, it needs a header.

Additionally, Python lists are over-allocated so that appends are fast. A
list of (say) 1000 items might be over-allocated to (say) 1024 items, so
that you can do 24 appends before the array is full and the array needs
to be resized. This means that, on average, Python list appending is O(1)
instead of O(N). You can't just change the length blindly, you need to
worry about the over-allocation.

I'm sympathetic to your concern: I've often felt offended that doing
something like this:

x = SomeReallyBigListOrString
for item in x[1:]:
process(item)

has to copy the entire list or string (less the first item). But
honestly, I've never found a situation where it actually mattered.


> Anyway, this was more a question about coding style. I use l1.extend(l2)
> or l1 += l2 rather than l1 = l1 + l2 because it's as readable and
> possibly faster.

I really, really hate augmented assignment for everything other than ints
and floats because you can't predict what it will do. Take this for
example:


>>> mylist = [1, 2, 3, 4]
>>> same_again = mylist
>>> mylist.append(5)
>>> same_again
[1, 2, 3, 4, 5]

mylist and same_again are the same object, so append and extend behave
predictably and simply. So is simple too:

>>> mylist = mylist + [-1, -2, -3]
>>> mylist
[1, 2, 3, 4, 5, -1, -2, -3]
>>> same_again
[1, 2, 3, 4, 5]

Because you have re-bound the name mylist to a new list, same_again
doesn't get modified. Nice.

But what about mylist += something else? Is it an in-place modification,
like extend and append, or a rebinding, like +? Who can remember? The
answer will depend on whether mylist is a builtin list, or a subclass.

And then there's this mystery:

>>> mylist = [1, 2, 3]
>>> mytuple = (mylist, None)
>>> mytuple[0] += [4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> mylist
[1, 2, 3, 4]

So in fact, += is *always* a rebinding, but *sometimes* it is an in-place
modification as well. Yuck.


--
Steven

MRAB

2/21/2010 4:38:00 AM

0

Steven D'Aprano wrote:
[snip]
> I'm sympathetic to your concern: I've often felt offended that doing
> something like this:
>
> x = SomeReallyBigListOrString
> for item in x[1:]:
> process(item)
>
> has to copy the entire list or string (less the first item). But
> honestly, I've never found a situation where it actually mattered.
>
[snip]
Could lists gain an .items() method with start and end positions? I was
thinking that it would be a 'view', like with dicts in Python 3. Of
course, that would only be worthwhile with very long lists:

x = SomeReallyBigListOrString
for item in x.items(1):
process(item)

Jonathan Gardner

2/21/2010 5:22:00 AM

0

On Sat, Feb 20, 2010 at 5:41 PM, Steven D'Aprano
<steve@remove-this-cybersource.com.au> wrote:
>
> What the OP wants is:
>
> (1) assign the name l2 to l1[:10] without copying
> (2) resize l1 in place to the first 10 items without affecting l2.
>

For ten items, though, is it really faster to muck around with array
lengths than just copying the data over? Array copies are extremely
fast on modern processors.

Programmer time and processor time being what it is, I still think my
answer is the correct one. No, it's not what he says he wants, but it
is what he needs.

--
Jonathan Gardner
jgardner@jonathangardner.net

Carl Banks

2/21/2010 6:39:00 AM

0

On Feb 20, 4:55 pm, marwie <mar...@gmx.de> wrote:
> Hello,
>
> I recently read about augmented assignments and that (with l1, l2
> being lists)
>
>     l1.extend(l2)
>
> is more efficient than
>
>     l1 = l1 + l2
>
> because unnecessary copy operations can be avoided. Now my question is
> if there's a similar thing for breaking a list into two parts. Let's
> say I want to remove from l1 everything from and including position 10
> and store it in l2. Then I can write
>
>     l2 = l1[10:]
>     del l1[10:]
>
> But since I'm assigning a slice the elements will be copied.

That's about the best you can do with Python lists.


> Basically, I'm looking for something like l1.pop(10,len(l1)) which
> returns and removes a whole chunk of data. Is there such a thing (and
> if not, why not?)

Numpy arrays can share underlying data like that when you take
slices. For instance, this probably works the way you want:

a = numpy.array([1,2,3,4,5,6])
b = a[:3]
c = a[3:]

None of the actual data was copied here.



Carl Banks

Steven D'Aprano

2/21/2010 6:49:00 AM

0

On Sat, 20 Feb 2010 21:21:47 -0800, Jonathan Gardner wrote:

> On Sat, Feb 20, 2010 at 5:41 PM, Steven D'Aprano
> <steve@remove-this-cybersource.com.au> wrote:
>>
>> What the OP wants is:
>>
>> (1) assign the name l2 to l1[:10] without copying (2) resize l1 in
>> place to the first 10 items without affecting l2.
>>
>>
> For ten items, though, is it really faster to muck around with array
> lengths than just copying the data over? Array copies are extremely fast
> on modern processors.

My mistake, he actually wants l1[10:] copied. So you might be copying a
huge amount of data, not just ten items.

Or if you prefer, replace 10 by 100000000.

But in either case, I agree that *in practice* it's rarely an actual
problem.


> Programmer time and processor time being what it is, I still think my
> answer is the correct one. No, it's not what he says he wants, but it is
> what he needs.

You missed the point that your suggestion gives radically different
behaviour to what the OP indicated he is using. He mutates the list in
place, you rebind the list's name. That has *very* different semantics.


>>> a = b = [1, 2, 3, 4]
>>> del a[2:] # mutate in place
>>> b
[1, 2]
>>>
>>> a = b = [1, 2, 3, 4]
>>> a = a[2:] # rebind the name
>>> b
[1, 2, 3, 4]

The OP may not care about the difference, but if he does require the
first behaviour, then your solution doesn't help.


--
Steven