comp.lang.python

parallel downloads

John Deas

3/8/2008 12:12:00 PM

Hi,

I would like to write a Python script that will download a list of
files (mainly mp3s) from the Internet. For this, I thought of using
urllib, with

urlopen("myUrl").read() and then writing the resulting string to a
file.

My problem is that I would like to download several files at the same
time. As I do not have much experience in programming, could you point
me to the easiest way to do this in Python?

Thanks,

JD
7 Answers

poof65

3/8/2008 4:33:00 PM

For your problem you have to use threads.

You can find more information here:
http://artfulcode.nfshost.com/files/multi-threading-in-p...



Gary Herron

3/8/2008 4:48:00 PM

poof65 wrote:
> For your problem you have to use threads.
>
Not at all true. Threads provide one way to solve this, but another is
the select function. For this simple case, select() may (or may not) be
easier to write. Pseudo-code would look something like this:

openSockets = list of sockets, one per download file
while openSockets:
    readySockets = select(openSockets ...)  # identifies sockets with data to be read
    for each s in readySockets:
        read from s and do whatever with the data
        if s is at EOF: close and remove s from openSockets

That's it. Far easier than threads.

Gary Herron
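
For readers who want to try the select() approach, here is a minimal runnable sketch in Python 3 (the thread itself predates it). It is an illustration only: socket.socketpair() stands in for the download connections, and the names drain_all, buffers and chunk_size are made up for this example. A real HTTP download would also have to send the request and parse the response headers, which is left out here.

```python
import select
import socket


def drain_all(socks, chunk_size=4096):
    """Read every socket to EOF, letting select() pick whichever is ready."""
    # Per-socket state: each connection needs its own place to put data.
    buffers = {s: bytearray() for s in socks}
    open_socks = list(socks)
    while open_socks:
        # Block until at least one socket has data (or has reached EOF).
        ready, _, _ = select.select(open_socks, [], [])
        for s in ready:
            data = s.recv(chunk_size)
            if data:
                buffers[s] += data
            else:
                # An empty read means the peer closed the connection.
                s.close()
                open_socks.remove(s)
    return [bytes(buffers[s]) for s in socks]


# Demo: three local socket pairs play the role of three downloads.
pairs = [socket.socketpair() for _ in range(3)]
for i, (writer, _) in enumerate(pairs):
    writer.sendall(b"payload %d" % i)
    writer.close()

results = drain_all([reader for _, reader in pairs])
```

Note that even this small version needs the buffers dictionary to remember which data belongs to which connection.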


John Deas

3/9/2008 12:25:00 PM


Thank you both for your help. Threads are working for me. However, a
new problem is that the URLs I want to download are listed in an XML
file (I want to download podcasts), and each one is not the same as the
URL of the file that actually gets downloaded:

http://www.sciam.com/podcast/podcast.mp3?e_id=86102326-0B1F-A3D4-74B2BBD61E9ECD2C&amp...

will be redirected to download:

http://podcast.sciam.com/daily/sa_d_podcast_...

Is there a way, knowing the first URL, to get the second one at runtime
in my script?

John Deas

3/9/2008 1:06:00 PM


Found it: geturl() does the job
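
A small sketch of what geturl() reports after a redirect, written for Python 3, where urlopen lives in urllib.request. To keep it self-contained it starts a throwaway local server that redirects /old.mp3 to /new.mp3; the server, the handler class and both paths are invented for the demonstration and have nothing to do with the podcast site above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: /old.mp3 redirects to /new.mp3."""

    def do_GET(self):
        if self.path == "/old.mp3":
            self.send_response(302)
            self.send_header("Location", "/new.mp3")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"fake audio"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# An opener without proxies, so the local server is reached directly;
# plain urllib.request.urlopen() follows redirects in the same way.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(base + "/old.mp3") as response:
    final_url = response.geturl()  # URL after the redirect was followed
    data = response.read()

server.shutdown()
server.server_close()
```

Here final_url ends in /new.mp3 although /old.mp3 was requested, which is the behaviour described above.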

Aaron Brady

3/9/2008 9:04:00 PM

> Found it: geturl() does the job

That's for normalizing schemes. I believe you subclass FancyURLopener
and override the read method.

Gabriel Genellina

3/9/2008 9:11:00 PM

On Sat, 08 Mar 2008 14:47:45 -0200, Gary Herron
<gherron@islandtraining.com> wrote:

> That's it. Far easier than threads.

Easier? If you omit all the relevant details, yes, it looks easy. For
example, you read some data from one socket, part of the file you're
downloading. Where do you write it? You need additional structures to
keep track of things.
Pseudocode for the threaded version, complete with socket creation:

def downloadfile(url, filename):
    s = create socket for url
    f = open filename for writing
    shutil.copyfileobj(s.makefile(), f)

for each url, filename to retrieve:
    t = threading.Thread(target=downloadfile, args=(url, filename))
    add t to threadlist
    t.start()

for each t in threadlist:
    t.join()

The downloadfile function looks simpler to me - it's what anyone would
write in a single-threaded program, with local variables keeping the
full state.
The above pseudocode can be converted directly into Python code - no
additional structures or code are required.
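
One possible conversion of the threaded pseudocode into runnable Python 3, offered as a sketch rather than the definitive version. It swaps the raw socket for urllib.request.urlopen, whose response object can be passed to shutil.copyfileobj directly. So that it runs without network access, the demo "downloads" file:// URLs pointing at temporary files; those files and the names download_file, download_all and jobs are assumptions made for the example.

```python
import shutil
import tempfile
import threading
import urllib.request
from pathlib import Path


def download_file(url, filename):
    """Copy the body behind url into filename, streaming chunk by chunk."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)


def download_all(jobs):
    """Start one thread per (url, filename) pair, then wait for all of them."""
    threads = [threading.Thread(target=download_file, args=job) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Demo: temporary files served through file:// URLs stand in for real
# http:// URLs.
workdir = Path(tempfile.mkdtemp())
jobs = []
for i in range(3):
    source = workdir / ("source%d.mp3" % i)
    source.write_bytes(b"episode %d" % i)
    jobs.append((source.as_uri(), str(workdir / ("copy%d.mp3" % i))))

download_all(jobs)
copies = [Path(dest).read_bytes() for _, dest in jobs]
```

Each thread keeps its own response and output file as local variables, so no shared bookkeeping is needed.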

Of course, don't try to download a million files at the same time -
neither a million sockets nor a million threads would work.
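
One common way to cap the number of simultaneous downloads is a fixed-size thread pool. The sketch below uses concurrent.futures.ThreadPoolExecutor from the Python 3 standard library, which did not exist when this thread was written. MAX_WORKERS = 4 is an arbitrary value picked for illustration, and the file:// URLs again stand in for real ones.

```python
import tempfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_WORKERS = 4  # assumed limit; tune it for your connection and server


def fetch(url):
    """Return the full body behind url as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Demo data: ten small temporary files exposed as file:// URLs.
workdir = Path(tempfile.mkdtemp())
urls = []
for i in range(10):
    source = workdir / ("item%d.bin" % i)
    source.write_bytes(b"data %d" % i)
    urls.append(source.as_uri())

# At most MAX_WORKERS fetches run at once; map() yields results in the
# same order as the input URLs.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    payloads = list(pool.map(fetch, urls))
```

With this shape, the list of URLs can grow without the number of threads growing with it.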

--
Gabriel Genellina

Aaron Brady

3/10/2008 12:48:00 AM

> > That's it.  Far easier than threads.

I'll order an 'easiness' metric from the warehouse. Of course,
resources are parameters to the metric, such as facility given lots of
time, facility given lots of libraries, facility given hot shots, &c.

> Easier? If you omit all the relevant details, yes, looks easy. For  
>
> def downloadfile(url, fn):
>    s = create socket for url
>    f = open filename for writing
>    shutil.copyfileobj(s.makefile(), f)
>
> for each url, filename to retrieve:
[ threadlist.addandstart( threading.Thread(target=downloadfile,
args=(url,filename)) ) ]
>
[ threadlist.joineach() ]

> Of course, don't try to download a million files at the same time -  
> neither a million sockets nor a million threads would work.

Dammit! Then what's my million-core machine for? If architectures
"have reached the point of diminishing returns" ( off py.ideas ), then
what's the PoDR for numbers of threads per core?

Answer: One. Just write data structures and don't swap context. But
when do you want it by? What is the PoDR for amount of effort per
clock cycle saved? Get a Frank and a Brit and ask them what language
is easiest to speak.

(Answer: Math. Har *plonk*.)