Asp Forum - Data aggregation - comp.lang.python

vedranp

3/6/2008 4:29:00 PM

Hi,

I have a case where I need to aggregate data from a CSV file that
contains data in this layout:

DATE TIME COUNTRY ZIP CITY VALUE1 VALUE2 VALUE3
21.2.2008 00:00 A 1000 CITY1 1 2 3
21.2.2008 00:00 A 1000 CITY2 4 5 6
21.2.2008 00:00 A 1000 CITY3 7 8 9
21.2.2008 00:00 A 1000 CITY4 1 2 3
21.2.2008 00:15 A 1000 CITY1 4 5 6
21.2.2008 00:15 A 1000 CITY2 7 8 9
21.2.2008 00:15 A 1000 CITY3 1 2 3
21.2.2008 00:15 A 1000 CITY4 4 5 6
21.2.2008 00:00 A 2000 CITY10 7 8 9
21.2.2008 00:00 A 2000 CITY20 1 2 3
21.2.2008 00:00 A 2000 CITY30 4 5 6
21.2.2008 00:00 A 2000 CITY40 1 2 3
21.2.2008 00:15 A 2000 CITY10 7 8 9
21.2.2008 00:15 A 2000 CITY20 1 2 3
21.2.2008 00:15 A 2000 CITY30 4 5 6
21.2.2008 00:15 A 2000 CITY40 1 2 3

I need to aggregate the data from this file (file1) so that the result
is a CSV file (file2) in this format:

DATE COUNTRY ZIP CITY SumOfVALUE1 SumOfVALUE2 SumOfVALUE3 formula1
21.2.2008 A 1000 CITY1 5 7 9 12
21.2.2008 A 1000 CITY2 11 13 15 24
21.2.2008 A 1000 CITY3 8 10 12 18
21.2.2008 A 1000 CITY4 5 7 9 12
21.2.2008 A 2000 CITY10 14 16 18 30
21.2.2008 A 2000 CITY20 2 4 6 6
21.2.2008 A 2000 CITY30 8 10 12 18
21.2.2008 A 2000 CITY40 2 4 6 6

So: group by DATE, COUNTRY, ZIP and CITY, sum (or otherwise combine)
the values, and then compute derived fields from the summed values
(e.g. formula1 = SumOfVALUE1 + SumOfVALUE2). I can currently do this
by first loading file1 into SQL, running a query there that returns
the file2 results, and then loading those results back into SQL in a
different table.

I would like to avoid the step of taking data out of the database in
order to process it. Instead, I would like to process file1 in Python
and load only the result (file2) into SQL.

From my limited experience with Perl, I think this is manageable with
nested hash tables (1: an outer hash with key/value = CITY/reference-
to-inner-hash, 2: an inner hash holding the values for that city, e.g.
CITY1), so I assume there is also a way to do this in Python, maybe
with dictionaries. Any ideas?
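In Python the hash-of-hashes idea can be flattened into a single dictionary keyed by a tuple. The sketch below is one possible way to do it, not the poster's code: it assumes the columns are separated by single spaces exactly as shown (for a comma-separated file, drop the delimiter argument), uses only a subset of the file1 rows, and the names FILE1 and aggregate are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical subset of file1; columns assumed to be single-space separated.
FILE1 = """\
DATE TIME COUNTRY ZIP CITY VALUE1 VALUE2 VALUE3
21.2.2008 00:00 A 1000 CITY1 1 2 3
21.2.2008 00:00 A 1000 CITY2 4 5 6
21.2.2008 00:15 A 1000 CITY1 4 5 6
21.2.2008 00:15 A 1000 CITY2 7 8 9
21.2.2008 00:00 A 2000 CITY10 7 8 9
21.2.2008 00:00 A 2000 CITY20 1 2 3
21.2.2008 00:15 A 2000 CITY10 7 8 9
21.2.2008 00:15 A 2000 CITY20 1 2 3
"""


def aggregate(lines):
    """Sum VALUE1..VALUE3 per (DATE, COUNTRY, ZIP, CITY) with one dict."""
    # A tuple key replaces the Perl-style hash-of-hashes: one lookup per row.
    totals = defaultdict(lambda: [0, 0, 0])
    for row in csv.DictReader(lines, delimiter=" "):
        key = (row["DATE"], row["COUNTRY"], row["ZIP"], row["CITY"])
        sums = totals[key]
        sums[0] += int(row["VALUE1"])
        sums[1] += int(row["VALUE2"])
        sums[2] += int(row["VALUE3"])
    # Dicts keep insertion order (Python 3.7+), so groups come out in the
    # order they were first seen in the file; no sorting is needed.
    result = []
    for key, (s1, s2, s3) in totals.items():
        formula1 = s1 + s2  # the derived field described above
        result.append(key + (s1, s2, s3, formula1))
    return result


rows = aggregate(io.StringIO(FILE1))
for r in rows:
    print(*r)  # first line: 21.2.2008 A 1000 CITY1 5 7 9 12
```

The resulting rows could then be written out with csv.writer(...).writerows(rows) or passed to a database driver's executemany to load file2 into SQL.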

Regards,
Vedran.

3 Answers

jay graves

3/6/2008 5:38:00 PM

On Mar 6, 10:28 am, vedranp <vedran.preg...@gmail.com> wrote:
> So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some

You are soooo close. Look up itertools.groupby.
Don't forget to sort your data first, because groupby only groups
consecutive items that share the same key.

http://aspn.activestate.com/ASPN/search?query=groupby&x=0&y=0&section=PYTHONCKBK&type=...
http://mail.python.org/pipermail/python-list/2006-June/3...
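A minimal sketch of the sort-then-groupby approach, assuming the same single-space layout as file1 and using only a few of its rows; FILE1, GROUP_KEY and aggregate_sorted are hypothetical names chosen for illustration.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical subset of file1; columns assumed to be single-space separated.
FILE1 = """\
DATE TIME COUNTRY ZIP CITY VALUE1 VALUE2 VALUE3
21.2.2008 00:00 A 1000 CITY1 1 2 3
21.2.2008 00:00 A 1000 CITY2 4 5 6
21.2.2008 00:15 A 1000 CITY1 4 5 6
21.2.2008 00:15 A 1000 CITY2 7 8 9
21.2.2008 00:00 A 2000 CITY10 7 8 9
21.2.2008 00:00 A 2000 CITY20 1 2 3
21.2.2008 00:15 A 2000 CITY10 7 8 9
21.2.2008 00:15 A 2000 CITY20 1 2 3
"""

# Column positions: DATE=0, TIME=1, COUNTRY=2, ZIP=3, CITY=4, VALUE1..3=5..7.
GROUP_KEY = itemgetter(0, 2, 3, 4)  # TIME is deliberately left out of the key


def aggregate_sorted(lines):
    """Sort by the grouping key, then sum each run of equal keys."""
    reader = csv.reader(lines, delimiter=" ")
    next(reader)  # skip the header line
    # groupby only merges *adjacent* rows with equal keys, hence the sort.
    data = sorted(reader, key=GROUP_KEY)
    result = []
    for key, group in groupby(data, key=GROUP_KEY):
        s1 = s2 = s3 = 0
        for row in group:
            s1 += int(row[5])
            s2 += int(row[6])
            s3 += int(row[7])
        result.append(key + (s1, s2, s3, s1 + s2))  # last field is formula1
    return result


rows = aggregate_sorted(io.StringIO(FILE1))
for r in rows:
    print(*r)
```

The key fields are compared as strings, so within one ZIP a city named CITY10 would sort before CITY2; convert fields first if numeric ordering matters. Sorting also requires holding all rows in memory, which the dictionary approach avoids.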

> From some little experience with Perl, I think this is managable with
> double hash tables (1: basic hash with key/value = CITY/pointer-to-
> other-hash, 2: hash table with values for CITY1), so I assume that
> there would be also a way in Python, maybe with dictionaries? Any
> ideas?

Sometimes it makes sense to do this with dictionaries instead, for
example when you need counts on several different combinations of
columns:

count of unique values in column 'A'
count of unique values in column 'C'
count of unique combinations of columns 'A' and 'B'
count of unique combinations of columns 'A' and 'C'
count of unique combinations of columns 'B' and 'C'
in all cases, sum(D) and avg(E)

Since I need 'C' by itself as well as 'A' and 'C' together, I can't
just sort once and break on 'A','B','C'.
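The single-pass dictionary approach described here might look like the sketch below. The rows and their values are made up for illustration, the columns A to E follow the example above, "count" is read as the number of rows in each group, and summarize, ROWS and COMBOS are hypothetical names.

```python
from collections import defaultdict

# Made-up rows: A, B, C are key columns; D and E are numeric.
ROWS = [
    {"A": "x", "B": "p", "C": "red", "D": 1, "E": 10.0},
    {"A": "x", "B": "q", "C": "blue", "D": 2, "E": 20.0},
    {"A": "y", "B": "p", "C": "red", "D": 3, "E": 30.0},
    {"A": "y", "B": "q", "C": "red", "D": 4, "E": 40.0},
]

# Every column combination that needs its own breakdown.
COMBOS = [("A",), ("C",), ("A", "B"), ("A", "C"), ("B", "C")]


def summarize(rows, combos):
    """One pass over rows; per combination: count, sum(D), avg(E)."""
    # stats[combo][key] holds [row count, sum of D, sum of E].
    stats = {combo: defaultdict(lambda: [0, 0, 0.0]) for combo in combos}
    for row in rows:  # no sorting: every combination is updated per row
        for combo in combos:
            key = tuple(row[col] for col in combo)
            acc = stats[combo][key]
            acc[0] += 1
            acc[1] += row["D"]
            acc[2] += row["E"]
    # Turn the running sum of E into an average per group.
    return {
        combo: {key: (n, d, e / n) for key, (n, d, e) in groups.items()}
        for combo, groups in stats.items()
    }


summary = summarize(ROWS, COMBOS)
print(summary[("A",)])  # {('x',): (2, 3, 15.0), ('y',): (2, 7, 35.0)}
```

Because each combination has its own dictionary, the data does not need to be sorted in any particular order, which is exactly what a single sort-and-break pass cannot offer here.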

HTH
....
jay graves

John Nagle

3/6/2008 6:45:00 PM

vedranp wrote:

> I would like to avoid the step of taking data out from database in
> order to process it.

You can probably do this entirely within SQL. Most SQL databases,
including MySQL, will let you put the result of a SELECT into a new
table.
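To illustrate the idea without a MySQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in; the table names raw and aggregated are hypothetical and only a few rows of file1 are loaded. In MySQL the same idea is written as CREATE TABLE ... SELECT or INSERT INTO ... SELECT.

```python
import sqlite3

# A few rows of file1 (sqlite3 stands in for MySQL here).
ROWS = [
    ("21.2.2008", "00:00", "A", "1000", "CITY1", 1, 2, 3),
    ("21.2.2008", "00:15", "A", "1000", "CITY1", 4, 5, 6),
    ("21.2.2008", "00:00", "A", "2000", "CITY20", 1, 2, 3),
    ("21.2.2008", "00:15", "A", "2000", "CITY20", 1, 2, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw (date TEXT, time TEXT, country TEXT, zip TEXT,"
    " city TEXT, value1 INTEGER, value2 INTEGER, value3 INTEGER)"
)
conn.executemany("INSERT INTO raw VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# The aggregation runs inside the database and the result lands directly
# in a new table, without passing back through Python.
conn.execute(
    """
    CREATE TABLE aggregated AS
    SELECT date, country, zip, city,
           SUM(value1) AS sum_value1,
           SUM(value2) AS sum_value2,
           SUM(value3) AS sum_value3,
           SUM(value1) + SUM(value2) AS formula1
    FROM raw
    GROUP BY date, country, zip, city
    """
)

rows = conn.execute("SELECT * FROM aggregated ORDER BY zip, city").fetchall()
print(rows)
```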

John Nagle

petr.jakes.tpc

3/6/2008 8:07:00 PM

On Mar 6, 7:44 pm, John Nagle <na...@animats.com> wrote:
> vedranp wrote:
> > I would like to avoid the step of taking data out from database in
> > order to process it.
>
> You can probably do this entirely within SQL. Most SQL databases,
> including MySQL, will let you put the result of a SELECT into a new
> table.
>
> John Nagle
I agree.

Maybe the following can help:

http://tinyurl....

Petr Jakes
