vedranp
3/6/2008 4:29:00 PM
Hi,
I have a case where I should aggregate data from the CSV file, which
contains data in this way:
DATE TIME COUNTRY ZIP CITY VALUE1 VALUE2 VALUE3
21.2.2008 00:00 A 1000 CITY1 1 2 3
21.2.2008 00:00 A 1000 CITY2 4 5 6
21.2.2008 00:00 A 1000 CITY3 7 8 9
21.2.2008 00:00 A 1000 CITY4 1 2 3
21.2.2008 00:15 A 1000 CITY1 4 5 6
21.2.2008 00:15 A 1000 CITY2 7 8 9
21.2.2008 00:15 A 1000 CITY3 1 2 3
21.2.2008 00:15 A 1000 CITY4 4 5 6
21.2.2008 00:00 A 2000 CITY10 7 8 9
21.2.2008 00:00 A 2000 CITY20 1 2 3
21.2.2008 00:00 A 2000 CITY30 4 5 6
21.2.2008 00:00 A 2000 CITY40 1 2 3
21.2.2008 00:15 A 2000 CITY10 7 8 9
21.2.2008 00:15 A 2000 CITY20 1 2 3
21.2.2008 00:15 A 2000 CITY30 4 5 6
21.2.2008 00:15 A 2000 CITY40 1 2 3
I need to aggregate data from file1, so the result would be a CSV file
(file2) in this format:
DATE COUNTRY ZIP CITY SumOfVALUE1 SumOfVALUE2 SumOfVALUE3 formula1
21.2.2008 A 1000 CITY1 5 7 9 12
21.2.2008 A 1000 CITY2 11 13 15 24
21.2.2008 A 1000 CITY3 8 10 12 18
21.2.2008 A 1000 CITY4 5 7 9 12
21.2.2008 A 2000 CITY10 14 16 18 30
21.2.2008 A 2000 CITY20 2 4 6 6
21.2.2008 A 2000 CITY30 8 10 12 18
21.2.2008 A 2000 CITY40 2 4 6 6
So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some
calculation) the values and do some calculation from summed fields
(e.g.: formula1 = SumOfVALUE1+SumOfVALUE2). I am able to do this by
first loading file1 in SQL, perform a query there, which returns the
file2 results and then load it back in the SQL in the different table.
I would like to avoid the step of taking data out from database in
order to process it. I would like to process the file1 in Python and
load the result (file2) in SQL.
From some little experience with Perl, I think this is managable with
double hash tables (1: basic hash with key/value = CITY/pointer-to-
other-hash, 2: hash table with values for CITY1), so I assume that
there would be also a way in Python, maybe with dictionaries? Any
ideas?
Regards,
Vedran.