comp.lang.python

How to build a dict containing a large amount of data

wanzathe

1/4/2008 1:58:00 PM

Hi everyone,
I'm a newbie to Python :)
I have a binary file named test.dat containing 9,600,000 records.
The record format is int a + int b + int c + int d.
I want to build a dict like this: key = (int a, int b), value = (int c, int d).
I chose to use bsddb, and it takes about 140 seconds to build the dict.
What can I do to make my program run faster?
Or is there another way I could choose?
Thanks in advance.

My Code:
-----------------------------------------------------------------------------------
import bsddb
import struct

my_file = file('test.dat', 'rb')
content = my_file.read()                  # read the whole file into memory
record_number = len(content) / 16         # each record is four 4-byte unsigned ints

db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
for i in range(0, record_number):
    a = struct.unpack("IIII", content[i*16:i*16+16])
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])

db.close()
my_file.close()
3 Answers

Chris

1/4/2008 2:07:00 PM

On Jan 4, 3:57 pm, wanzathe <wanza...@gmail.com> wrote:
> hi everyone
> i'm a newbie to python :)
> i have a binary file named test.dat including 9600000 records.
> the record format is int a + int b + int c + int d
> i want to build a dict like this: key=int a,int b values=int c,int d
> i choose using bsddb and it takes about 140 seconds to build the dict.
> what can i do if i want to make my program run faster?
> or is there another way i can choose?
> Thanks in advance.
>
> [original code snipped]

import bsddb
import struct

my_file = file('test.dat', 'rb')
db = bsddb.btopen('test.dat.db', 'n', cachesize=500000000)
content = my_file.read(16)                # read one 16-byte record at a time
while content:
    a = struct.unpack('IIII', content)
    db['%d_%d' % (a[0], a[1])] = '%d_%d' % (a[2], a[3])
    content = my_file.read(16)

db.close()
my_file.close()

That would be more memory efficient; as for speed, you would need to
time it on your side.
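
(Not part of the original reply: a minimal sketch of how one might time either version, assuming the record-loading loop is pasted in place of the comment.)

import time

start = time.time()
# ... run either record-loading loop (the original or the chunked version) here ...
elapsed = time.time() - start
print 'loaded 9600000 records in %.1f seconds (%.0f records/sec)' % (elapsed, 9600000 / elapsed)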

Fredrik Lundh

1/4/2008 2:17:00 PM


wanzathe wrote:

> i have a binary file named test.dat including 9600000 records.
> the record format is int a + int b + int c + int d
> i want to build a dict like this: key=int a,int b values=int c,int d
> i choose using bsddb and it takes about 140 seconds to build the dict.

you're not building a dict, you're populating a persistent database.
storing ~70,000 records per second (9,600,000 records in 140 seconds)
isn't that bad, really...

> what can i do if i want to make my program run faster?
> or is there another way i can choose?

why not just use a real Python dictionary, and the marshal module for
serialization?

</F>
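
(Not part of Fredrik's reply: a minimal sketch of that suggestion, assuming the same 16-byte record layout as the original post; the output file name test.dict is illustrative.)

import struct
import marshal

my_file = file('test.dat', 'rb')
content = my_file.read()
my_file.close()

# build a plain in-memory dict: (a, b) -> (c, d)
table = {}
for i in range(len(content) / 16):
    a, b, c, d = struct.unpack('IIII', content[i*16:i*16+16])
    table[(a, b)] = (c, d)

# serialize it with marshal (fast, but the file format is Python-version specific)
out = file('test.dict', 'wb')
marshal.dump(table, out)
out.close()

# later, load the whole dict back in one step
table = marshal.load(file('test.dict', 'rb'))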

wanzathe

1/4/2008 2:55:00 PM


On Jan 4, 10:17 pm, Fredrik Lundh <fred...@pythonware.com> wrote:
> wanzathe wrote:
> > i have a binary file named test.dat including 9600000 records.
> > the record format is int a + int b + int c + int d
> > i want to build a dict like this: key=int a,int b values=int c,int d
> > i choose using bsddb and it takes about 140 seconds to build the dict.
>
> you're not building a dict, you're populating a persistent database.
> storing ~70000 records per second isn't that bad, really...
>
> > what can i do if i want to make my program run faster?
> > or is there another way i can choose?
>
> why not just use a real Python dictionary, and the marshal module for
> serialization?
>
> </F>

Hi, Fredrik Lundh,
You are right, I'm populating a persistent database.
I planned to use a real Python dictionary with cPickle for
serialization at first, but it did not work because the number of
records is too large.
Thanks