[lnkForumImage]
TotalShareware - Download Free Software

Confronta i prezzi di migliaia di prodotti.
Asp Forum
 Home | Login | Register | Search 


 

Forums >

comp.lang.python

bags? 2.5.x?

Dan Stromberg

1/14/2008 5:42:00 PM


Is there a particular reason why bags didn't go into 2.5.x or 3000?

I keep wanting something like them - especially bags with something akin
to set union, intersection and difference.

15 Answers

Wildemar Wildenburger

1/14/2008 7:41:00 PM

0

Dan Stromberg wrote:
> Is there a particular reason why bags didn't go into 2.5.x or 3000?
>
> I keep wanting something like them - especially bags with something akin
> to set union, intersection and difference.
>
How about this recepie
<URL:http://www.ubookcase.com/book/Oreilly/Python.Cookbook.2nd.edition/0596007973/pythoncook2-chp-18-sect-8...

/W

Dan Stromberg

1/17/2008 8:26:00 PM

0

On Mon, 14 Jan 2008 20:41:27 +0100, Wildemar Wildenburger wrote:

> Dan Stromberg wrote:
>> Is there a particular reason why bags didn't go into 2.5.x or 3000?
>>
>> I keep wanting something like them - especially bags with something
>> akin to set union, intersection and difference.
>>
> How about this recepie
> <URL:http://www.ubookcase.com/boo...
Python.Cookbook.2nd.edition/0596007973/pythoncook2-chp-18-sect-8.html>?
>
> /W

I'd found this with google, but thank you for making sure I was aware of
it.

The author of the bag class said that he was planning to submit bags for
inclusion in 2.5 - is there a particular reason why they didn't go in?

I keep finding a need for bags. In the past, I've done this sort of
thing with dictionaries, but it's much nicer to have a bag class, and of
course it's better to have it in the standard library than to slurp it
into this, that and the other project.

Wildemar Wildenburger

1/17/2008 9:28:00 PM

0

Dan Stromberg wrote:
> The author of the bag class said that he was planning to submit bags for
> inclusion in 2.5 - is there a particular reason why they didn't go in?
>
I wouldn't know. Not enough convincing use cases, I guess. Fools ;)


> I keep finding a need for bags. In the past, I've done this sort of
> thing with dictionaries, but it's much nicer to have a bag class, and of
> course it's better to have it in the standard library than to slurp it
> into this, that and the other project.
>
Then again, it is only one class. Also, if I may be so bold, why
wouldn't a simple list fit your needs (other than performance, of course)?


/W

Raymond Hettinger

1/18/2008 2:19:00 AM

0

> >> I keep wanting something like them - especially bags with something
> >> akin to set union, intersection and difference.
>
> > How about this recepie
> > <URL:http://www.ubookcase.com/boo...
>
> The author of the bag class said that he was planning to submit bags for
> inclusion in 2.5 - is there a particular reason why they didn't go in?

Three reasons:

1. b=collections.defaultdict(int) went a long way
towards meeting the need to for a fast counter.

2. It's still not clear what the best API would be.
What should list(b) return for b.dict = {'a':3, 'b':0, 'c':-3}?
Perhaps, [('a', 3), ('b', 0), ('c', -3)]
or ['a', 'a', 'a']
or ['a']
or ['a', 'b', 'c']
or raise an Error for the negative entry.

3. I'm still working on it and am not done yet.


Raymond

Dan Stromberg

1/21/2008 11:14:00 PM

0

On Thu, 17 Jan 2008 18:18:53 -0800, Raymond Hettinger wrote:

>> >> I keep wanting something like them - especially bags with something
>> >> akin to set union, intersection and difference.
>>
>> > How about this recepie
>> > <URL:http://www.ubookcase.com/boo...
>>
>> The author of the bag class said that he was planning to submit bags
>> for inclusion in 2.5 - is there a particular reason why they didn't go
>> in?
>
> Three reasons:
>
> 1. b=collections.defaultdict(int) went a long way towards meeting the
> need to for a fast counter.
>
> 2. It's still not clear what the best API would be. What should list(b)
> return for b.dict = {'a':3, 'b':0, 'c':-3}? Perhaps, [('a', 3), ('b',
> 0), ('c', -3)] or ['a', 'a', 'a']
> or ['a']
> or ['a', 'b', 'c']
> or raise an Error for the negative entry.

I'd suggest that .keys() return the unique list, and that list() return
the list of tuples. Then people can use list comprehensions or map() to
get what they really need.

It might not be a bad thing to have an optional parameter on __init__
that would allow the user to specify if they need negative counts or not;
so far, I've not needed them though.

> 3. I'm still working on it and am not done yet.
>

Decent reasons. :)

Thanks!

Here's a diff to bag.py that helped me. I'd like to think these meanings
are common, but not necessarily!

$ diff -b /tmp/bag.py.original /usr/local/lib/bag.py
18a19,58
> def _difference(lst):
> left = lst[0]
> right = lst[1]
> return max(left - right, 0)
> _difference = staticmethod(_difference)
>
> def _relationship(self, other, operator):
> if isinstance(other, bag):
> self_keys = set(self._data.keys())
> other_keys = set(other._data.keys())
> union_keys = self_keys | other_keys
> #print 'union_keys is',union_keys
> result = bag()
> for element in list(union_keys):
> temp = operator([ self[element], other
[element] ])
> #print 'operator is', operator
> #print 'temp is', temp
> result[element] += temp
> return result
> else:
> raise NotImplemented
>
> def union(self, other):
> return self._relationship(other, sum)
>
> __or__ = union
>
> def intersection(self, other):
> return self._relationship(other, min)
>
> __and__ = intersection
>
> def maxunion(self, other):
> return self._relationship(other, max)
>
> def difference(self, other):
> return self._relationship(other, self._difference)
>
> __sub__ = difference
>

MRAB

1/22/2008 8:45:00 PM

0

On Jan 21, 11:13 pm, Dan Stromberg <dstrombergli...@gmail.com> wrote:
> On Thu, 17 Jan 2008 18:18:53 -0800, Raymond Hettinger wrote:
> >> >> I keep wanting something like them - especially bags with something
> >> >> akin to set union, intersection and difference.
>
> >> > How about this recepie
> >> > <URL:http://www.ubookcase.com/boo...
>
> >> The author of the bag class said that he was planning to submit bags
> >> for inclusion in 2.5 - is there a particular reason why they didn't go
> >> in?
>
> > Three reasons:
>
> > 1. b=collections.defaultdict(int) went a long way towards meeting the
> > need to for a fast counter.
>
> > 2. It's still not clear what the best API would be. What should list(b)
> > return for b.dict = {'a':3, 'b':0, 'c':-3}? Perhaps, [('a', 3), ('b',
> > 0), ('c', -3)] or ['a', 'a', 'a']
> > or ['a']
> > or ['a', 'b', 'c']
> > or raise an Error for the negative entry.
>
> I'd suggest that .keys() return the unique list, and that list() return
> the list of tuples. Then people can use list comprehensions or map() to
> get what they really need.

I think that a bag is a cross between a dict (but the values are
always positive integers) and a set (but duplicates are permitted).

I agree that .keys() should the unique list, but that .items() should
return the tuples and list() should return the list of keys including
duplicates. bag() should accept an iterable and count the duplicates.

For example:

>>> sentence = "the cat sat on the mat"
>>> my_words = sentence.split()
>>> print my_words
['the', 'cat', 'sat', 'on', 'the', 'mat']
>>> my_bag = bag(my_words)
>>> print my_bag
bag({'on': 1, 'the': 2, 'sat': 1, 'mat': 1, 'cat': 1})
my_list = list(my_bag)
['on', 'the', 'the', 'sat', 'mat', 'cat']

It should be easy to convert a bag to a dict and also a dict to a bag,
raising ValueError if it sees a value that's not a non-negative
integer (a value of zero just means "there isn't one of these in the
bag"!).

>
> It might not be a bad thing to have an optional parameter on __init__
> that would allow the user to specify if they need negative counts or not;
> so far, I've not needed them though.
>
> > 3. I'm still working on it and am not done yet.
>
> Decent reasons. :)
>
> Thanks!
>
> Here's a diff to bag.py that helped me. I'd like to think these meanings
> are common, but not necessarily!
>
> $ diff -b /tmp/bag.py.original /usr/local/lib/bag.py
> 18a19,58
>
> > def _difference(lst):
> > left = lst[0]
> > right = lst[1]
> > return max(left - right, 0)
> > _difference = staticmethod(_difference)
>
> > def _relationship(self, other, operator):
> > if isinstance(other, bag):
> > self_keys = set(self._data.keys())
> > other_keys = set(other._data.keys())
> > union_keys = self_keys | other_keys
> > #print 'union_keys is',union_keys
> > result = bag()
> > for element in list(union_keys):
> > temp = operator([ self[element], other
> [element] ])
> > #print 'operator is', operator
> > #print 'temp is', temp
> > result[element] += temp
> > return result
> > else:
> > raise NotImplemented
>
> > def union(self, other):
> > return self._relationship(other, sum)
>
> > __or__ = union
>
> > def intersection(self, other):
> > return self._relationship(other, min)
>
> > __and__ = intersection
>
> > def maxunion(self, other):
> > return self._relationship(other, max)
>
> > def difference(self, other):
> > return self._relationship(other, self._difference)
>
> > __sub__ = difference

Christian Heimes

1/23/2008 7:51:00 AM

0

Dan Stromberg wrote:
> Is there a particular reason why bags didn't go into 2.5.x or 3000?
>
> I keep wanting something like them - especially bags with something akin
> to set union, intersection and difference.

Ask yourself the following questions:

* Is the feature useful for the broad mass?
* Has the feature been implemented and contributed for Python?
* Is the code well written, tested and documented?
* Is the code mature and used by lots of people?

Can you answer every question with yes?

Christian

Dan Stromberg

2/1/2008 10:05:00 PM

0

On Wed, 23 Jan 2008 08:51:13 +0100, Christian Heimes wrote:

> Dan Stromberg wrote:
>> Is there a particular reason why bags didn't go into 2.5.x or 3000?
>>
>> I keep wanting something like them - especially bags with something
>> akin to set union, intersection and difference.
>
> Ask yourself the following questions:
>
> * Is the feature useful for the broad mass?

Yes, probably, at least if this kind of feature's inclusion in other
languages and my two recent needs for it are any indication. In other
languages, they are sometimes called bags or multisets.

* Has the feature been
> implemented and contributed for Python?

Yes.

* Is the code well written,
> tested and documented?

Yes, I believe so, though the patch I sent has only been used in
production a little.

* Is the code mature and used by lots of people?

This is hard to get a handle on. It's been on the web for a while, so
it's hard to say how many people have downloaded it and used it.

> Can you answer every question with yes?

3 yesses and a "hard to say for sure". But maybe we could check the web
server log to get a download count for the maybe.


> Christian

Paul Rubin

2/1/2008 10:30:00 PM

0

Dan Stromberg <dstromberglists@gmail.com> writes:
> > * Is the feature useful for the broad mass?
>
> Yes, probably, at least if this kind of feature's inclusion in other
> languages and my two recent needs for it are any indication. In other
> languages, they are sometimes called bags or multisets.

I use defaultdict(int) all the time, but it's been fairly rare
to want to do set operations (union, intersection, difference) on
these. I'm not even sure what the intersection should mean.

Arnaud Delobelle

2/2/2008 8:40:00 AM

0

On Feb 1, 10:29 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Dan Stromberg <dstrombergli...@gmail.com> writes:
> > > * Is the feature useful for the broad mass?
>
> > Yes, probably, at least if this kind of feature's inclusion in other
> > languages and my two recent needs for it are any indication.  In other
> > languages, they are sometimes called bags or multisets.
>
> I use defaultdict(int) all the time, but it's been fairly rare
> to want to do set operations (union, intersection, difference) on
> these.  I'm not even sure what the intersection should mean.

Let me write the multiset with 3 x's and 2y y' as {2x, 3y} (this is
not a suggested Python syntax!).

Note: in the following my multisets can have negative multiplicities,
whereas the usual definition only accepts positive multiplicities.

Let H be the collection of all python hashable objects.
Let Z be the set of integers.

So a multiset is a mapping from H to Z which maps cofinitely many
objects
to 0 (a sort of defaultdict(int))

=== For mathematically minded people ===

Well, if one thinks about the mathematical meaning of multisets, then
the collection of all multisets is the free Z-module over H.
Therefore the meaningful operations for multisets are addition,
subtraction and multiplication by a scalar.

=== For not so mathematically minded people ===

One way of thinking about a multiset can be as a vector whose
dimensions are explicitly named, instead of being implicit. For
example in 3D the vector [2, 3, 4] could instead be written as {2x,
3y, 4z}. Just like vectors, multisets can then be added, subtracted
and multiplied by as scalar:


{2x, 3y} + {4x, 7z} = {6x, 3y, 7z}
{4x, 2y, 1z} - {1x, 2y, 3z} = {3x, -2z}
{3x, 2y}*5 = {15x, 10y}

Now if one wants to extend union and intersection to multisets:

* For sets {x, y} union {y, z} = {x, y, z}. The natural way of
extending this to multisets is having the union operator take the
max of the multiplicities of each element, i.e.

{2x, 7y, 1z} union {3y, 1z, 5t} = {2x, 7y, 1z, 5t}
{3x, -5y} union {2z} = {3x, 2z} (!)

* Similarly, for intersection one would take the min of
multiplicities, i.e.

{2x, 7y, 1z} intersection {3y, 1z, 5t} = {3y, 1z}
{3x, -5y} intersection {2z} = {-5y} (!)

* difference and symmetric_difference can be defined in terms of +, -,
union, intersection:

A difference B = A - (A intersection B)
A symmetric_difference B = (A union B) - (A intersection B)

Note that union, intersection, difference and symmetric_difference
defined this way are fully compatible with the corresponding set
operations in the following sense:

If A and B are sets, then

- multiset(A).union(multiset(B)) == multiset(A.union(B))

- multiset(A).intersection(multiset(B)) == multiset(A.intersection(B))

(same for difference, symmetric_difference)

I think this is how I would like my multisets :)

--
Arnaud