Asp Forum - flattening a dict

Benjamin

2/17/2008 3:56:00 AM

How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23
}

22 Answers

Jeff Schwab

2/17/2008 4:23:00 AM

Benjamin wrote:
> How would I go about "flattening" a dict with many nested dicts
> within? The dicts might look like this:
> {"mays" : {"eggs" : "spam"},
> "jam" : {"soda" : {"love" : "dump"}},
> "lamba" : 23
> }
> I'd like it to put "/" inbetween the dicts to make it a one
> dimensional dict and look like this:
> {"mays/eggs" : "spam",
> "jam/soda/love" : "dump",
> "lamba" : 23
> }

What have you tried so far?

Arnaud Delobelle

2/17/2008 8:05:00 AM

On Feb 17, 3:56 am, Benjamin <musiccomposit...@gmail.com> wrote:
> How would I go about "flattening" a dict with many nested dicts
> within? The dicts might look like this:
> {"mays" : {"eggs" : "spam"},
> "jam" : {"soda" : {"love" : "dump"}},
> "lamba" : 23}
>
> I'd like it to put "/" inbetween the dicts to make it a one
> dimensional dict and look like this:
> {"mays/eggs" : "spam",
> "jam/soda/love" : "dump",
> "lamba" : 23
>
> }

In Python you can do anything, even flatten a dictionary:

from itertools import chain

def flattendict(d, pfx='', sep='/'):
if isinstance(d, dict):
if pfx: pfx += sep
return chain(*(flattendict(v, pfx+k, sep) for k, v in
d.iteritems()))
else:
return (pfx, d),

test = {"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}

>>> print dict(flattendict(test))
{'lamba': 23, 'mays/eggs': 'spam', 'jam/soda/love': 'dump'}

You an even have other separators ;)

>>> print dict(flattendict(test, sep='.'))
{'lamba': 23, 'jam.soda.love': 'dump', 'mays.eggs': 'spam'}

--
Arnaud

Boris Borcic

2/17/2008 8:40:00 AM

Arnaud Delobelle wrote:
>
> In Python you can do anything, even

....pass the Turing test with a one-liner. Back after 9/11, when US patriotism
was the rage, Python knew how to answer correctly the query

filter(lambda W : W not in 'ILLITERATE','BULLSHIT')

And Python 3.0 slated for next August offers it a chance to change its mind
before the next elections...

Cheers, BB

Terry Jones

2/17/2008 12:19:00 PM

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

if __name__ == '__main__':
test = {
"" : {},
4 : 5,
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}

d = dictflatten(test)
print strdictflatten(test)

>>>> {'': None, 'lamba': 23, 'mays/eggs': 'spam', '4': 5, 'jam/soda/love': 'dump'}

Arnaud Delobelle

2/17/2008 12:52:00 PM

On Feb 17, 12:18 pm, Terry Jones <te...@jon.es> wrote:
> Hi Arnaud & Benjamin
>
> Here's a version that's a bit more general. It handles keys whose values
> are empty dicts (assigning None to the value in the result), and also dict
> keys that are not strings (see the test data below). It's also less
> recursive as it only calls itself on values that are dicts.
>
> But.... it returns a dict whose keys are tuples. You then need to decide
> what to do with this. Hence the helper function strdictflatten for the case
> when all dict keys can be converted to str.
>
> As I told Arnaud in email, I greatly prefer his version for its elegance.
>
> Terry
>
> def dictflatten(d, prefix=None):
> result = {}
> if prefix is None: prefix = tuple()
> for k, v in d.iteritems():
> key = prefix + (k,)
> if isinstance(v, dict):
> if v:
> result.update(dictflatten(v, key))
> else:
> result[key] = None
> else:
> result[key] = v
> return result

It's nice to do it this way I think. Here I post a generator-powered
version of the same idea.

from itertools import chain

def flattendict(d):
def gen(d, pfx=()):
return chain(*(gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)
for k, v in d.iteritems()))
return dict(gen(d))

BTW, I keep using the idiom itertools.chain(*iterable). I guess that
during function calls *iterable gets expanded to a tuple. Wouldn't it
be nice to have an equivalent one-argument function that takes an
iterable of iterables and return the 'flattened' iterable?

--
Arnaud

George Sakkis

2/17/2008 3:55:00 PM

On Feb 17, 7:51 am, Arnaud Delobelle <arno...@googlemail.com> wrote:

> BTW, I keep using the idiom itertools.chain(*iterable). I guess that
> during function calls *iterable gets expanded to a tuple. Wouldn't it
> be nice to have an equivalent one-argument function that takes an
> iterable of iterables and return the 'flattened' iterable?

Indeed; I don't have any exact numbers but I roughly use this idiom as
often or more as the case where chain() takes a known fixed number of
arguments. The equivalent function you describe is trivial:

def chain2(iter_of_iters):
for iterable in iter_of_iters:
for i in iterable:
yield i

but I usually don't bother, although if it was available in itertools
I'd use it instead.

Apart from introducing a new function for something quite similar,
another idea would be to modify chain(*iterables) semantics if it is
passed a single argument. Currently chain() is useful for 2 or more
arguments; chain(x) is equivalent to iter(x), there's no reason to use
it ever. On the downside, this might break backwards compatibility for
cases like chain(*some_iterable) where some_iterable has length of 1
but I'd guess this is quite rare.

George

Boris Borcic

2/17/2008 4:03:00 PM

George Sakkis wrote:
> On Feb 17, 7:51 am, Arnaud Delobelle <arno...@googlemail.com> wrote:
>
>> BTW, I keep using the idiom itertools.chain(*iterable). I guess that
>> during function calls *iterable gets expanded to a tuple. Wouldn't it
>> be nice to have an equivalent one-argument function that takes an
>> iterable of iterables and return the 'flattened' iterable?
>
> Indeed; I don't have any exact numbers but I roughly use this idiom as
> often or more as the case where chain() takes a known fixed number of
> arguments. The equivalent function you describe is trivial:
>
> def chain2(iter_of_iters):
> for iterable in iter_of_iters:
> for i in iterable:
> yield i

or fwiw

chainstar = lambda iters : (x for it in iters for x in it)

- a form that better suggests how to inline it in the calling expression, if
applicable.

Cheers, BB

Terry Jones

2/17/2008 4:41:00 PM

>>>>> "Arnaud" == Arnaud Delobelle <arnodel@googlemail.com> writes:

Arnaud> BTW, I keep using the idiom itertools.chain(*iterable). I guess
Arnaud> that during function calls *iterable gets expanded to a tuple.
Arnaud> Wouldn't it be nice to have an equivalent one-argument function
Arnaud> that takes an iterable of iterables and return the 'flattened'
Arnaud> iterable?

This reminds me of a function I wrote a while back that iterates over all
its arguments, even calling passed functions and iterating over their
results. I knew *even less* about Python then than I do now; please set
flamethrowers to "Constructive Criticism".

http://www.fluidinfo.com/terry/2007/05/07/ite...

Terry

Arnaud Delobelle

2/17/2008 5:15:00 PM

On Feb 17, 4:03 pm, Boris Borcic <bbor...@gmail.com> wrote:
> George Sakkis wrote:
> > On Feb 17, 7:51 am, Arnaud Delobelle <arno...@googlemail.com> wrote:
>
> >> BTW, I keep using the idiom itertools.chain(*iterable). I guess that
> >> during function calls *iterable gets expanded to a tuple. Wouldn't it
> >> be nice to have an equivalent one-argument function that takes an
> >> iterable of iterables and return the 'flattened' iterable?
>
> > Indeed; I don't have any exact numbers but I roughly use this idiom as
> > often or more as the case where chain() takes a known fixed number of
> > arguments. The equivalent function you describe is trivial:
>
> > def chain2(iter_of_iters):
> > for iterable in iter_of_iters:
> > for i in iterable:
> > yield i
>
> or fwiw
>
> chainstar = lambda iters : (x for it in iters for x in it)
>
> - a form that better suggests how to inline it in the calling expression, if
> applicable.

Indeed:

def flattendict(d):
def gen(d, pfx=()):
return (x for k, v in d.iteritems()
for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)))
return dict(gen(d))

I don't know, I find the chain(*...) version more readable, although
this one is probably better. Please, Mr itertools, can we have
chainstar?

--
Arnaud

Boris Borcic

2/18/2008 1:50:00 PM

Arnaud Delobelle wrote:
> On Feb 17, 4:03 pm, Boris Borcic <bbor...@gmail.com> wrote:
>> George Sakkis wrote:
>>> On Feb 17, 7:51 am, Arnaud Delobelle <arno...@googlemail.com> wrote:
>>>> BTW, I keep using the idiom itertools.chain(*iterable). I guess that
>>>> during function calls *iterable gets expanded to a tuple. Wouldn't it
>>>> be nice to have an equivalent one-argument function that takes an
>>>> iterable of iterables and return the 'flattened' iterable?
>>> Indeed; I don't have any exact numbers but I roughly use this idiom as
>>> often or more as the case where chain() takes a known fixed number of
>>> arguments. The equivalent function you describe is trivial:
>>> def chain2(iter_of_iters):
>>> for iterable in iter_of_iters:
>>> for i in iterable:
>>> yield i
>> or fwiw
>>
>> chainstar = lambda iters : (x for it in iters for x in it)
>>
>> - a form that better suggests how to inline it in the calling expression, if
>> applicable.
>
> Indeed:
>
> def flattendict(d):
> def gen(d, pfx=()):
> return (x for k, v in d.iteritems()
> for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
> else ((pfx+(k,), v),)))
> return dict(gen(d))
>
> I don't know, I find the chain(*...) version more readable,

In this case I could concede that it is, although I find both forms overly
convoluted for easy reading.

> although
> this one is probably better.

It is more elementary in the mathematician's sense, and therefore preferable all
other things being equal, imo. I've tried to split 'gen' but I can't say the
result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

For fun I've also written a (vaguely prologish) non-recursive generator-based
version that exploits undocumented (but logical) behavior of list.extend

class iterstack(list) :
__iter__ = lambda self : self
def next(self) :
while self :
try :
return self[-1].next()
except StopIteration :
self.pop()
raise StopIteration

def flattendict(d,stk=[]) :
res={}
its=iterstack()
its.extend(stk[-1].iteritems()
for stk[len(its)-1:] in chain([[d]],its)
if isinstance(stk[-1],dict)
or res.update([(stk.pop(),tuple(stk))[::-1]]))
return res

(testing was minimal)

Challenge for who really has time to waste : replace the iterstack list subclass
definition with a simple list + a generator expression inside flattendict.

Cheers, BB

comp.lang.python

flattening a dict

Benjamin

Jeff Schwab

Arnaud Delobelle

Boris Borcic

Terry Jones

Arnaud Delobelle

George Sakkis

Boris Borcic

Terry Jones

Arnaud Delobelle

Boris Borcic

x Login to ForumsZone