comp.lang.python

related lists mean value

dimitri pater

3/8/2010 10:34:00 PM

Hi,

I have two related lists:
x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

what I need is a list representing the mean value of 'a', 'b' and 'c'
while maintaining the number of items (len):
w = [1.5, 1.5, 8, 4, 4, 4]

I have looked at iter(tools) and next(), but that did not help me. I'm
a bit stuck here, so your help is appreciated!

thanks!
Dimitri
12 Answers

John Posner

3/9/2010 2:40:00 AM


On 3/8/2010 5:34 PM, dimitri pater - serpia wrote:
> Hi,
>
> I have two related lists:
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> what I need is a list representing the mean value of 'a', 'b' and 'c'
> while maintaining the number of items (len):
> w = [1.5, 1.5, 8, 4, 4, 4]
>
> I have looked at iter(tools) and next(), but that did not help me. I'm
> a bit stuck here, so your help is appreciated!

Nobody expects object-orientation (or the Spanish Inquisition):

#-------------------------
from collections import defaultdict

class Tally:
    def __init__(self, id=None):
        self.id = id
        self.total = 0
        self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
    obj = tally_dict[y[i]]
    obj.id = y[i]
    obj.total += x[i]
    obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
    obj = tally_dict[key]
    mean = 1.0 * obj.total / obj.count
    result_list.extend([mean] * obj.count)
print(result_list)
#-------------------------

-John

John Posner

3/9/2010 2:44:00 AM


On 3/8/2010 9:39 PM, John Posner wrote:

<snip>

> # gather data
> tally_dict = defaultdict(Tally)
> for i in range(len(x)):
>     obj = tally_dict[y[i]]
>     obj.id = y[i]   <--- statement redundant, remove it
>     obj.total += x[i]
>     obj.count += 1

-John


John Posner

3/9/2010 2:54:00 AM


On 3/8/2010 9:43 PM, John Posner wrote:
> On 3/8/2010 9:39 PM, John Posner wrote:
>
> <snip>

>> obj.id = y[i] <--- statement redundant, remove it

Sorry for the thrashing! It's more correct to say that the Tally class
doesn't require an "id" attribute at all. So the code becomes:

#---------
from collections import defaultdict

class Tally:
    def __init__(self):
        self.total = 0
        self.count = 0

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# gather data
tally_dict = defaultdict(Tally)
for i in range(len(x)):
    obj = tally_dict[y[i]]
    obj.total += x[i]
    obj.count += 1

# process data
result_list = []
for key in sorted(tally_dict):
    obj = tally_dict[key]
    mean = 1.0 * obj.total / obj.count
    result_list.extend([mean] * obj.count)
print(result_list)
#---------

-John

Michael Rudolf

3/9/2010 10:30:00 AM


Am 08.03.2010 23:34, schrieb dimitri pater - serpia:
> Hi,
>
> I have two related lists:
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> what I need is a list representing the mean value of 'a', 'b' and 'c'
> while maintaining the number of items (len):
> w = [1.5, 1.5, 8, 4, 4, 4]

This kinda looks like you used the wrong data structure.
Maybe you should have used a dict, like:
{'a': [1, 2], 'c': [5, 0, 7], 'b': [8]} ?

> I have looked at iter(tools) and next(), but that did not help me. I'm
> a bit stuck here, so your help is appreciated!

As said, I'd have used a dict in the first place, so let's transform this
straightforwardly into one:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

# initialize dict
d={}
for idx in set(y):
    d[idx]=[]

#collect values
for i, idx in enumerate(y):
    d[idx].append(x[i])

print("d is now a dict of lists: %s" % d)

#calculate average
for key, values in d.items():
    d[key]=sum(values)/len(values)

print("d is now a dict of averages: %s" % d)

# build the final list
w = [ d[key] for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))


Output is:
d is now a dict of lists: {'a': [1, 2], 'c': [5, 0, 7], 'b': [8]}
d is now a dict of averages: {'a': 1.5, 'c': 4.0, 'b': 8.0}
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Could have used a defaultdict to avoid dict initialisation, though.
Or write a custom class:

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

class A:
    def __init__(self):
        self.store={}
    def add(self, key, number):
        if key in self.store:
            self.store[key].append(number)
        else:
            self.store[key] = [number]
a=A()

# collect data
for idx, val in zip(y,x):
    a.add(idx, val)

# build the final list:
w = [ sum(a.store[key])/len(a.store[key]) for key in y ]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Produces same output, of course.
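
The defaultdict variant mentioned above might look roughly like this. It is only a sketch: the names `groups` and `means` are made up for illustration, and `float()` is used so that the division also behaves under Python 2.

```python
from collections import defaultdict

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# Missing keys start out as empty lists, so no initialisation loop is needed.
groups = defaultdict(list)
for key, value in zip(y, x):
    groups[key].append(value)

# Compute each mean once per key rather than once per element of y.
means = dict((key, sum(vals) / float(len(vals))) for key, vals in groups.items())

# Look the mean up for every label, which preserves the length and order of y.
w = [means[key] for key in y]
print(w)  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

Compared with the class-based version, the means are calculated once per key instead of being recomputed for every item in y.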

Note that neither of those solutions is very efficient, but who cares ;)

> thanks!

No Problem,

Michael

Michael Rudolf

3/9/2010 11:12:00 AM


OK, I golfed it :D
Go ahead and kill me ;)

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(a,b,v={}):
    try: v[a].append(b)
    except: v[a]=[b]
    def g(a): return sum(v[a])/len(v[a])
    return g
w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]

print("w is now the list of averages, corresponding with y:\n \
\n x: %s \n y: %s \n w: %s \n" % (x, y, w))

Output:
w is now the list of averages, corresponding with y:

x: [1, 2, 8, 5, 0, 7]
y: ['a', 'a', 'b', 'c', 'c', 'c']
w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Regards,
Michael

Peter Otten

3/9/2010 12:02:00 PM


Michael Rudolf wrote:

> OK, I golfed it :D
> Go ahead and kill me ;)
>
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> def f(a,b,v={}):
>     try: v[a].append(b)
>     except: v[a]=[b]
>     def g(a): return sum(v[a])/len(v[a])
>     return g
> w = [g(i) for g,i in [(f(i,v),i) for i,v in zip(y,x)]]
>
> print("w is now the list of averages, corresponding with y:\n \
> \n x: %s \n y: %s \n w: %s \n" % (x, y, w))
>
> Output:
> w is now the list of averages, corresponding with y:
>
> x: [1, 2, 8, 5, 0, 7]
> y: ['a', 'a', 'b', 'c', 'c', 'c']
> w: [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
[1.5, 1.5, 8.0, 4.0, 4.0, 4.0]

Peter

Michael Rudolf

3/9/2010 3:00:00 PM


Am 09.03.2010 13:02, schrieb Peter Otten:
>>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
> Peter

.... pwned.
Should be the fastest and shortest way to do it.

I tried to do something like this, but my brain hurt while trying to
visualize list comprehension evaluation orders ;)
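
For anyone else struggling with the evaluation order, the one-liner can be unrolled into plain loops. This is an illustrative expansion, not code posted in the thread; the name `total` is invented here, and `float()` stands in for Python 3's true division.

```python
x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

w = []
for c in y:                     # the trailing "for c in y" is the outer loop
    total = 0
    for a, b in zip(x, y):      # the generator inside sum() rescans every pair
        if b == c:              # keep only the values whose label matches c
            total += a
    w.append(total / float(y.count(c)))

print(w)  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

Written this way it is easier to see that both lists are scanned in full for every element of y, so the work grows quadratically with the list length.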

Regards,
Michael

Steve Howell

3/9/2010 3:21:00 PM


On Mar 8, 6:39 pm, John Posner <jjpos...@optimum.net> wrote:
> On 3/8/2010 5:34 PM, dimitri pater - serpia wrote:
>
> > Hi,
>
> > I have two related lists:
> > x = [1 ,2, 8, 5, 0, 7]
> > y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> > what I need is a list representing the mean value of 'a', 'b' and 'c'
> > while maintaining the number of items (len):
> > w = [1.5, 1.5, 8, 4, 4, 4]
>
> > I have looked at iter(tools) and next(), but that did not help me. I'm
> > a bit stuck here, so your help is appreciated!
>
> Nobody expects object-orientation (or the Spanish Inquisition):
>

Heh. Yep, I avoided OO for this. Seems like a functional problem.
My solution is functional on the outside, imperative on the inside.
You could add recursion here, but I don't think it would be as
straightforward.

def num_dups_at_head(lst):
    assert len(lst) > 0
    val = lst[0]
    i = 1
    while i < len(lst) and lst[i] == val:
        i += 1
    return i

def smooth(x, y):
    result = []
    while x:
        cnt = num_dups_at_head(y)
        avg = sum(x[:cnt]) * 1.0 / cnt
        result += [avg] * cnt
        x = x[cnt:]
        y = y[cnt:]
    return result


> #-------------------------
> from collections import defaultdict
>
> class Tally:
>      def __init__(self, id=None):
>          self.id = id
>          self.total = 0
>          self.count = 0
>
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c']
>
> # gather data
> tally_dict = defaultdict(Tally)
> for i in range(len(x)):
>      obj = tally_dict[y[i]]
>      obj.id = y[i]
>      obj.total += x[i]
>      obj.count += 1
>
> # process data
> result_list = []
> for key in sorted(tally_dict):
>      obj = tally_dict[key]
>      mean = 1.0 * obj.total / obj.count
>      result_list.extend([mean] * obj.count)
> print result_list
> #-------------------------

Peter Otten

3/9/2010 4:10:00 PM


Michael Rudolf wrote:

> Am 09.03.2010 13:02, schrieb Peter Otten:
>>>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
>> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
>> Peter
>
> ... pwned.
> Should be the fastest and shortest way to do it.

It may be short, but it is not particularly efficient. A dict-based approach
is probably the fastest. If y is guaranteed to be sorted, itertools.groupby()
may also be worth a try.

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
try:
    from itertools import izip as zip
except ImportError:
    pass

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

def f(x=x, y=y):
    p = defaultdict(int)
    q = defaultdict(int)
    for a, b in zip(x, y):
        p[b] += a
        q[b] += 1
    return [p[b]/q[b] for b in y]

def g(x=x, y=y):
    return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
    print(f())
    print(g())
    assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop
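
The itertools.groupby() idea mentioned above was not included in the timing script. A possible sketch follows; the function name `h` is hypothetical, no timing is claimed for it, and it assumes that equal labels in y always sit next to each other.

```python
from itertools import groupby

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

def h(x=x, y=y):
    result = []
    # groupby() yields one group per run of consecutive equal labels.
    for label, pairs in groupby(zip(y, x), key=lambda pair: pair[0]):
        values = [value for _, value in pairs]
        mean = sum(values) / float(len(values))
        # Repeat the mean once per item so the output keeps the input length.
        result.extend([mean] * len(values))
    return result

print(h())  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

Unlike the dict-based f(), this averages each run separately, so it only agrees with f() when every label forms a single contiguous block.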

Peter

Steve Howell

3/9/2010 4:30:00 PM


On Mar 8, 2:34 pm, dimitri pater - serpia <dimitri.pa...@gmail.com>
wrote:
> Hi,
>
> I have two related lists:
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> what I need is a list representing the mean value of 'a', 'b' and 'c'
> while maintaining the number of items (len):
> w = [1.5, 1.5, 8, 4, 4, 4]
>

What results are you expecting if you have multiple runs of 'a' in a
longer list?
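
To make the question concrete, here is a small made-up example in which 'a' appears in two separate runs. The data and the function names `mean_per_label` and `mean_per_run` are assumptions for illustration only; which behaviour is wanted depends on the answer to the question above.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical input: 'a' occurs in two separate runs.
x = [1, 3, 8, 5]
y = ['a', 'a', 'b', 'a']

def mean_per_label(x, y):
    # Dict-based: every 'a' contributes to one shared mean.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(x, y):
        totals[label] += value
        counts[label] += 1
    return [totals[label] / counts[label] for label in y]

def mean_per_run(x, y):
    # Run-based: each block of consecutive equal labels gets its own mean.
    result = []
    for _, pairs in groupby(zip(y, x), key=lambda pair: pair[0]):
        values = [value for _, value in pairs]
        result.extend([sum(values) / float(len(values))] * len(values))
    return result

print(mean_per_label(x, y))  # [3.0, 3.0, 8.0, 3.0]
print(mean_per_run(x, y))    # [2.0, 2.0, 8.0, 5.0]
```

The two approaches give the same result on the original data and differ only once a label reappears after a different one.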