Asp Forum - Duplicate keys in dict?

vsoler

3/7/2010 4:23:00 PM

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

.................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.

What is the best approach for this problem? Can anybody help?

6 Answers

News123

3/7/2010 4:46:00 PM

vsoler wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reding
> the range I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

Normall dicts are used if you want to access your data at a later point
in time by the key name.

Do you want to be able to do this?

Then what would you expect to receive for d[('cd','A')] ?

The first value? the second value? both values?

Could you perhaps change further occurences of 'cd' with 'cd1' , 'cd2' ,
'cd3', ... ?

Not knowing your exact context makes it difficult to suggest solutions?

perhaps you could switch to a list containing a tuple of (rowname,rowdict)

l = [ ('ab', { 'A': 3 , 'B': 5 } ),
'cd', { 'A': 7 , 'B': 2 } ),
'cd', { 'A': 9 , 'B': 1 } ),
'ac', { ... }
]

bye

N

>
> What is the best workaround for this? Should I discard dicts? Should I
> somehow have under 'cd'... a list of values?
>
> One of the difficulties I find here is that I want to be able to
> easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
> However, using lists inside dicts makes it a difficult issue for me.
>
> What is the best approach for this problem? Can anybody help?

Steven D'Aprano

3/7/2010 4:54:00 PM

On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:

> Hello,
>
> My code snippet reads data from excel ranges. First row and first column
> are column headers and row headers respectively. After reding the range
> I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.

> One of the difficulties I find here is that I want to be able to easily
> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

--
Steven

vsoler

3/7/2010 5:13:00 PM

On 7 mar, 17:53, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
> > Hello,
>
> > My code snippet reads data from excel ranges. First row and first column
> > are column headers and row headers respectively. After reding the range
> > I build a dict.
>
> > ................'A'..............'B'
> > 'ab'............3................5
> > 'cd'............7................2
> > 'cd'............9................1
> > 'ac'............7................2
>
> > d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> > However, as you can see there are two rows that start with 'cd', and
> > dicts, AFAIK do not accept duplicates.
> > One of the difficulties I find here is that I want to be able to easily
> > sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
> > using lists inside dicts makes it a difficult issue for me.
>
> Given the sample above, what answer do you expect for summing the 'cd'
> row? There are four reasonable answers:
>
> 7 + 2 = 9
> 9 + 1 = 10
> 7 + 2 + 9 + 1 = 19
> Error
>
> You need to decide what you want to do before asking how to do it.
>
> --
> Steven

Steven,

What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

I apologize for not having made it clear.

Dennis Lee Bieber

3/7/2010 7:05:00 PM

On Sun, 7 Mar 2010 08:23:13 -0800 (PST), vsoler
<vicente.soler@gmail.com> declaimed the following in
gmane.comp.python.general:

> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reding
> the range I build a dict.
>
> ................'A'..............'B'
> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>
> However, as you can see there are two rows that start with 'cd', and
> dicts, AFAIK do not accept duplicates.
>
First, I would not key using ("ab", "A")... I'd key just off "ab"...
The A and B columns are just that -- column positions -- the data would
look more like

{ "ab" : (3, 5),
"cd" : (7, 2)... }

Now... for the duplicate key processing...

If you initialize each key with a list, you can then append the data
tuple...

data = {}
for key, cA, cB in sourcedata:
if key in data:
data[key].append( (cA, cB) )
else:
data[key] = [ (cA, cB) ]

This way, what you end up with

"cd" : [ (7, 2), (9, 1) ]

> What is the best workaround for this? Should I discard dicts? Should I
> somehow have under 'cd'... a list of values?
>

If you really are using Excel -- it might be better to find out how
to command Excel to compute your results... Basically it sounds like you
are creating a grouped sum on the columns, with groups defined by the
value of the first column.

Would be child's play with a regular SQL database -- something like
(this is off the top of my head so may have some syntax errors)

select key, sum(A), sum(B) from aTable
group by key
order by key

Or just use a list of tuples, sorting, and create a report writer
break handler...

> 'ab'............3................5
> 'cd'............7................2
> 'cd'............9................1
> 'ac'............7................2
>
data = [ ("ab", 3, 5),
("cd", 7, 2),
("cd", 9, 1),
("ac", 7, 2) ]

skey = None
sA = 0
sB = 0

data.sort() #ensures same keys are adjacent

for (k, a, b) in data:
if k != skey: #key changed, report group output
if skey:
print "%s\t%s\t%s" % (skey, sA, sB)
skey = k #initialize next group
sA = a
sB = b
else:
sA += a #sum to current group
sB += b
if skey: #need to output the last key in the data at end
print "%s\t%s\t%s" % (skey, sA, sB)

--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

Tim Chase

3/7/2010 7:11:00 PM

vsoler wrote:
> On 7 mar, 17:53, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
>>> Hello,
>>> My code snippet reads data from excel ranges. First row and first column
>>> are column headers and row headers respectively. After reding the range
>>> I build a dict.
>>> ................'A'..............'B'
>>> 'ab'............3................5
>>> 'cd'............7................2
>>> 'cd'............9................1
>>> 'ac'............7................2
>>> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>>> However, as you can see there are two rows that start with 'cd', and
>>> dicts, AFAIK do not accept duplicates.
>>> One of the difficulties I find here is that I want to be able to easily
>>> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
>>> using lists inside dicts makes it a difficult issue for me.
>
> What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
> me 3.

But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like

import csv
f = file(INFILE, 'rb')
r = csv.reader(f, ...)
headers = r.next() # discard the headers
d = defaultdict(list)
for (label, a, b) in r:
d[(label, 'a')].append(int(a))
d[(label, 'b')].append(int(b))
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', sum(value)

Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:

d = defaultdict(int)
for (label, a, b) in r:
d[(label, 'a')] += int(a)
d[(label, 'b')] += int(b)
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', value

Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.

-tkc

Yinon Ehrlich

3/9/2010 12:28:00 PM

On Mar 7, 6:23 pm, vsoler <vicente.so...@gmail.com> wrote:
> Hello,
>
> My code snippet reads data from excel ranges. First row and first
> column are column headers and row headers respectively. After reding
> the range I build a dict.
>
> What is the best approach for this problem? Can anybody help?

Have you tried xlread ? (http://www.python-...)
Best,
-- Yinon

comp.lang.python

Duplicate keys in dict?

vsoler

News123

Steven D'Aprano

vsoler

Dennis Lee Bieber

Tim Chase

Yinon Ehrlich

x Login to ForumsZone