comp.lang.python

best(fastest) way to send and get lists from files

Abrahams, Max

1/31/2008 7:35:00 PM


I've looked into pickle, dump, load, save, readlines(), etc.

Which is the best method? Fastest? My lists tend to be around a thousand to a million items.

Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.

thanks
3 Answers

Yu-Xi Lim

1/31/2008 9:50:00 PM


Abrahams, Max wrote:
> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.
>
> thanks

1) Why don't you time them with the timeit module?
http://docs.python.org/lib/module-t...

Results will vary with the specific data you have and your hardware
speed, but if it's a lot of data, disk I/O is most likely going to be
the bottleneck. A compact binary format will help alleviate this.

If you're reading a lot of data into memory, you might have to deal with
your OS swap/virtual memory.
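The timeit suggestion can be sketched roughly like this (not from the thread; the data, file name, and protocol choices are just illustrative, and protocol 2 stands in for "a compact binary format"):

```python
import os
import pickle
import tempfile
import timeit

data = list(range(100000))
path = os.path.join(tempfile.gettempdir(), "bench.pickle")

def dump_with(protocol):
    # Protocol 0 is pickle's original text format; protocol 2 is a
    # compact binary format (the highest available in Python 2.x).
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol)

for proto in (0, 2):
    secs = timeit.timeit(lambda: dump_with(proto), number=10) / 10
    print("protocol %d: %d bytes, %.4f s per dump"
          % (proto, os.path.getsize(path), secs))
```

On a list of integers this size, the binary protocol typically produces a noticeably smaller file as well as a faster dump.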

2) "Best" depends on what your data is and what you're doing with it.

Are you reinventing a flat-file database? There are better solutions for
databases.

If you're just reformatting data to pass to another program, say, for
scientific computation, the portability may be more of an issue. Number
crunching the resultant data may be even more time consuming such that
the time spent writing/reading it becomes insignificant.

Paddy

2/1/2008 5:28:00 AM


On Jan 31, 7:34 pm, "Abrahams, Max" <Max_Abrah...@brown.edu> wrote:
> I've looked into pickle, dump, load, save, readlines(), etc
I've used the following sometimes:

from pprint import pprint as pp
print "data = \\"
pp(data)

That created a python file that could be read as a module, but there
are limitations on the __repr__ of the data.

- Paddy.
P.S. I never timed it - it was fast enough, and the data was readable.
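Paddy's trick, spelled out as a complete round trip (a sketch only; the file and variable names here are made up for illustration):

```python
import os
import sys
from pprint import pprint

data = {"name": "example", "xs": list(range(5))}

# Write a module whose body is a single assignment; the trailing
# backslash continues the statement onto pprint's output lines.
with open("saved_data.py", "w") as out:
    out.write("data = \\\n")
    pprint(data, stream=out)

# Reading it back is just an import (assuming the current directory
# is on the module search path).
sys.path.insert(0, os.getcwd())
import saved_data
assert saved_data.data == data
```

As Paddy notes, this only works when the data's `__repr__` round-trips, i.e. built-in types like lists, dicts, strings, and numbers.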

Nick Craig-Wood

2/5/2008 3:30:00 PM


Abrahams, Max <Max_Abrahams@brown.edu> wrote:
>
> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in
> general unless there's a significant speed boost from something
> binary.

You could try the marshal module which is very fast, lightweight and
built in.

http://www.python.org/doc/current/lib/module-ma...

It makes a binary format though, and it will only dump "simple"
objects - see the page above. It is what python uses internally to
make .pyc files from .py I believe.

------------------------------------------------------------
#!/usr/bin/python

import os
from marshal import dump, load
from timeit import Timer

def write(N, file_name = "z.marshal"):
    L = range(N)
    out = open(file_name, "wb")
    dump(L, out)
    out.close()
    print "Written %d bytes for list size %d" % (os.path.getsize(file_name), N)

def read(N):
    inp = open("z.marshal", "rb")
    L = load(inp)
    inp.close()
    assert len(L) == N

for log_N in range(7):
    N = 10**log_N
    loops = 10
    write(N)
    print "Read back %d items in" % N, Timer("read(%d)" % N, "from __main__ import read").repeat(1, loops)[0]/loops, "s"
------------------------------------------------------------

Produces

$ ./test-marshal.py
Written 10 bytes for list size 1
Read back 1 items in 4.14133071899e-05 s
Written 55 bytes for list size 10
Read back 10 items in 4.31060791016e-05 s
Written 505 bytes for list size 100
Read back 100 items in 8.23020935059e-05 s
Written 5005 bytes for list size 1000
Read back 1000 items in 0.000352478027344 s
Written 50005 bytes for list size 10000
Read back 10000 items in 0.00165479183197 s
Written 500005 bytes for list size 100000
Read back 100000 items in 0.0175776958466 s
Written 5000005 bytes for list size 1000000
Read back 1000000 items in 0.175704598427 s

--
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-woo...