comp.lang.python

urllib2 rate limiting

Dimitrios Apostolou

1/10/2008 5:18:00 PM

6 Answers

Rob Wolfe

1/10/2008 6:28:00 PM

Dimitrios Apostolou <jimis@gmx.net> writes:

> P.S. And something simpler: How can I prevent urllib2 from following
> redirects to foreign hosts?

You need to subclass `urllib2.HTTPRedirectHandler`, override the
`http_error_301` and `http_error_302` methods, and raise a
`urllib2.HTTPError` exception.

http://diveintopython.org/http_web_services/redi...
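
A minimal sketch of that approach (the `NoRedirectHandler` name is just
illustrative, assuming Python 2's `urllib2`):

------------------------------------------------------------
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Refuse to follow redirects by raising HTTPError instead."""
    def http_error_301(self, req, fp, code, msg, hdrs):
        raise urllib2.HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    # 302 can reuse the same behaviour (303/307 could be added too)
    http_error_302 = http_error_301

# install the handler in an opener and fetch through it
opener = urllib2.build_opener(NoRedirectHandler())
------------------------------------------------------------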

HTH,
Rob

Dimitrios Apostolou

1/10/2008 8:16:00 PM

Rob Wolfe

1/10/2008 8:43:00 PM

Dimitrios Apostolou <jimis@gmx.net> writes:

> On Thu, 10 Jan 2008, Rob Wolfe wrote:
>
>> Dimitrios Apostolou <jimis@gmx.net> writes:
>>
>>> P.S. And something simpler: How can I prevent urllib2 from following
>>> redirects to foreign hosts?
>>
>> You need to subclass `urllib2.HTTPRedirectHandler`, override the
>> `http_error_301` and `http_error_302` methods, and raise a
>> `urllib2.HTTPError` exception.
>
> Thanks! I think for my case it's better to override the
> redirect_request method and return a Request only when the
> redirection goes to the same site. Just another question, since I
> can't find the meaning of the (req, fp, code, msg, hdrs) parameters
> in the docs: to read the URL I get redirected to (the 'Location:'
> HTTP header?), should I check the hdrs parameter, or is there a
> better way?

Well, according to the documentation there is no better way.
But I looked into the source code of `urllib2` and it seems
that the `redirect_request` method takes one more parameter,
`newurl`, which is probably what you're looking for. ;)
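
A minimal sketch of the same-site check you describe (the class name is
illustrative; `urlparse` is the Python 2 standard library module):

------------------------------------------------------------
import urllib2
from urlparse import urlparse

class SameHostRedirectHandler(urllib2.HTTPRedirectHandler):
    """Follow a redirect only if it stays on the original host."""
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urlparse(req.get_full_url())[1]  # netloc
        new_host = urlparse(newurl)[1]
        if new_host == old_host:
            # same host: defer to the default implementation,
            # which builds and returns the new Request
            return urllib2.HTTPRedirectHandler.redirect_request(
                self, req, fp, code, msg, hdrs, newurl)
        raise urllib2.HTTPError(req.get_full_url(), code,
                                "redirect to foreign host refused",
                                hdrs, fp)
------------------------------------------------------------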

Regards,
Rob

Dimitrios Apostolou

1/10/2008 9:13:00 PM

On Thursday 10 January 2008 22:42:44 Rob Wolfe wrote:
> Dimitrios Apostolou <jimis@gmx.net> writes:
> > On Thu, 10 Jan 2008, Rob Wolfe wrote:
> >> Dimitrios Apostolou <jimis@gmx.net> writes:
> >>> P.S. And something simpler: How can I prevent urllib2 from following
> >>> redirects to foreign hosts?
> >>
> >> You need to subclass `urllib2.HTTPRedirectHandler`, override the
> >> `http_error_301` and `http_error_302` methods, and raise a
> >> `urllib2.HTTPError` exception.
> >
> > Thanks! I think for my case it's better to override the
> > redirect_request method and return a Request only when the
> > redirection goes to the same site. Just another question, since I
> > can't find the meaning of the (req, fp, code, msg, hdrs) parameters
> > in the docs: to read the URL I get redirected to (the 'Location:'
> > HTTP header?), should I check the hdrs parameter, or is there a
> > better way?
>
> Well, according to the documentation there is no better way.
> But I looked into the source code of `urllib2` and it seems
> that the `redirect_request` method takes one more parameter,
> `newurl`, which is probably what you're looking for. ;)
>
> Regards,
> Rob

Cool! :-) Sometimes undocumented features provide superb solutions... I wonder
if there is something similar for rate limiting :-s


Thank you,
Dimitris

Nick Craig-Wood

1/11/2008 9:30:00 AM

Dimitrios Apostolou <jimis@gmx.net> wrote:
> I want to limit the download speed when using urllib2. In particular,
> having several parallel downloads, I want to make sure that their total
> speed doesn't exceed a maximum value.
>
> I can't find a simple way to achieve this. After researching I can try
> some things, but I'm stuck on the details:
>
> 1) Can I overload some method in _socket.py to achieve this, and perhaps
> make this generic enough to work even with other libraries than urllib2?
>
> 2) There is the urllib.urlretrieve() function which accepts a reporthook
> parameter.

Here is an implementation based on that idea. I've used urllib rather
than urllib2 as that is what I'm familiar with.

------------------------------------------------------------
#!/usr/bin/python

"""
Fetch a url rate limited

Syntax: rate-limited-fetch.py "rate in kBytes/s" URL local_file_name
"""

import sys
import urllib
from time import time, sleep

class RateLimit(object):
    """Rate limit a url fetch"""
    def __init__(self, rate_limit):
        """rate limit in kBytes / second"""
        self.rate_limit = rate_limit
        self.start = time()
    def __call__(self, block_count, block_size, total_size):
        """Callback for urllib.urlretrieve's reporthook parameter"""
        total_kb = total_size / 1024
        downloaded_kb = (block_count * block_size) / 1024
        elapsed_time = time() - self.start
        if elapsed_time != 0:
            rate = downloaded_kb / elapsed_time
            print "%d kb of %d kb downloaded %.1f kBytes/s\n" % (downloaded_kb, total_kb, rate),
            # sleep just long enough to pull the average rate
            # back down to the limit
            expected_time = downloaded_kb / self.rate_limit
            sleep_time = expected_time - elapsed_time
            print "Sleep for", sleep_time
            if sleep_time > 0:
                sleep(sleep_time)

def main():
    """Fetch the contents of urls"""
    if len(sys.argv) != 4:
        print 'Syntax: %s "rate in kBytes/s" URL "local output path"' % sys.argv[0]
        raise SystemExit(1)
    rate_limit, url, out_path = sys.argv[1:]
    rate_limit = float(rate_limit)
    print "Fetching %r to %r with rate limit %.1f" % (url, out_path, rate_limit)
    urllib.urlretrieve(url, out_path, reporthook=RateLimit(rate_limit))

if __name__ == "__main__":
    main()
------------------------------------------------------------

Use it like this:

$ ./rate-limited-fetch.py 16 http://some/url/or/other z
Fetching 'http://some/url/or/other' to 'z' with rate limit 16.0
0 kb of 10118 kb downloaded 0.0 kBytes/s
Sleep for -0.0477550029755
8 kb of 10118 kb downloaded 142.1 kBytes/s
Sleep for 0.443691015244
16 kb of 10118 kb downloaded 32.1 kBytes/s
Sleep for 0.502038002014
24 kb of 10118 kb downloaded 24.0 kBytes/s
Sleep for 0.498028993607
32 kb of 10118 kb downloaded 21.3 kBytes/s
Sleep for 0.497982025146
40 kb of 10118 kb downloaded 20.0 kBytes/s
Sleep for 0.497948884964
48 kb of 10118 kb downloaded 19.2 kBytes/s
Sleep for 0.498008966446
....
1416 kb of 10118 kb downloaded 16.1 kBytes/s
Sleep for 0.499262094498
1424 kb of 10118 kb downloaded 16.1 kBytes/s
Sleep for 0.499293088913
1432 kb of 10118 kb downloaded 16.1 kBytes/s
Sleep for 0.499292135239
1440 kb of 10118 kb downloaded 16.1 kBytes/s
Sleep for 0.499267101288
....
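
The original post also asks about capping the total speed of several
parallel downloads. A minimal sketch of one way to do that with urllib2
(all names here are illustrative, not from any library): each download
thread reads its response in chunks and reports every chunk to one
shared limiter.

------------------------------------------------------------
import threading
import urllib2
from time import time, sleep

class SharedRateLimit(object):
    """One limiter shared by all downloads; keeps their total
    average rate under rate_limit bytes/second."""
    def __init__(self, rate_limit):
        self.rate_limit = float(rate_limit)
        self.lock = threading.Lock()
        self.start = time()
        self.bytes_seen = 0
    def wait(self, nbytes):
        """Account for nbytes just read and sleep if we are ahead."""
        self.lock.acquire()
        try:
            self.bytes_seen += nbytes
            expected_time = self.bytes_seen / self.rate_limit
        finally:
            self.lock.release()
        sleep_time = expected_time - (time() - self.start)
        if sleep_time > 0:
            sleep(sleep_time)

def fetch(url, out_path, limiter, chunk_size=8192):
    """Download url to out_path, throttled by the shared limiter."""
    response = urllib2.urlopen(url)
    out = open(out_path, "wb")
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            limiter.wait(len(chunk))
    finally:
        out.close()
------------------------------------------------------------

Each thread calls fetch() with the same SharedRateLimit instance;
because the byte accounting is global, the combined average rate
converges on the limit.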


--
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-woo...

Dimitrios Apostolou

1/12/2008 3:54:00 PM
