Asp Forum - n00b with urllib2: How to make it handle cookie automatically?

est

2/22/2008 6:51:00 AM

Hi all,

I need urllib2 to perform a series of HTTP requests, each carrying the
cookies from the PREVIOUS request (like our browsers usually do). Many
people suggest I use some library (e.g. pycURL) instead, but I guess it's
good practice for a Python beginner to DIY something rather than use
existing tools.

So my problem is how to extend the urllib2 classes:

import urllib2
from cookielib import CookieJar

class SmartRequest():
    cj = CookieJar()
    def __init__(self, strUrl, strContent=None):
        self.Request = urllib2.Request(strUrl, strContent)
        self.cj.add_cookie_header(self.Request)
        self.Response = urllib2.urlopen(self.Request)
        self.cj.extract_cookies(self.Response, self.Request)
    def read(self, intCount):
        return self.Response.read(intCount)
    def headers(self, strHeaderName):
        return self.Response.headers[strHeaderName]

The code does not work because each time SmartRequest is instantiated,
the object 'cj' is cleared. How can I avoid that?
The only stupid solution I have figured out is to use a global CookieJar
object. Is there any way to handle all this INSIDE the class?

I am totally new to OOP and Python programming, so could anyone give me
some suggestions? Thanks in advance.

10 Answers

Rob Wolfe

2/22/2008 6:43:00 PM

est <electronixtar@gmail.com> writes:

> The code does not work because each time SmartRequest is initiated,
> object 'cj' is cleared. How to avoid that?
> The only stupid solution I figured out is use a global CookieJar
> object. Is there anyway that could handle all this INSIDE the class?
>
> I am totally new to OOP & python programming, so could anyone give me
> some suggestions? Thanks in advance

Google for urllib2.HTTPCookieProcessor.

HTH,
Rob
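
For readers who want to see the idea without touching the network, here is a minimal sketch. It is written for Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. FakeHTTPHandler, fetch_twice, the example.com URLs and the "sid" cookie are all made up for illustration; the fake handler stands in for a real server so that the cookie round trip can be observed locally.

```python
import email.message
import io
import urllib.request
import urllib.response
from http.cookiejar import CookieJar


class FakeHTTPHandler(urllib.request.HTTPHandler):
    """Hypothetical stand-in for a server: answers every http:// request locally."""

    def __init__(self):
        super().__init__()
        self.cookies_seen = []  # Cookie header that arrived with each request

    def http_open(self, req):
        # By the time this runs, HTTPCookieProcessor has already added
        # any matching cookies to the request.
        self.cookies_seen.append(req.get_header("Cookie"))
        headers = email.message.Message()
        headers["Set-Cookie"] = "sid=abc123; Path=/"
        resp = urllib.response.addinfourl(
            io.BytesIO(b"ok"), headers, req.full_url, code=200)
        resp.msg = "OK"  # the opener's error processor reads this attribute
        return resp


def fetch_twice():
    jar = CookieJar()
    fake = FakeHTTPHandler()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),          # ignore any proxy settings
        fake,                                     # replaces the real HTTPHandler
        urllib.request.HTTPCookieProcessor(jar),  # stores and resends cookies
    )
    opener.open("http://www.example.com/login").read()
    opener.open("http://www.example.com/next").read()
    return fake.cookies_seen, jar
```

Calling fetch_twice() returns the Cookie headers the fake server saw: nothing on the first request, and the cookie set by the first response on the second. With a real site you would drop FakeHTTPHandler and the ProxyHandler line and keep only the HTTPCookieProcessor.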

Dennis Lee Bieber

2/22/2008 7:02:00 PM

On Thu, 21 Feb 2008 22:50:49 -0800 (PST), est <electronixtar@gmail.com>
declaimed the following in comp.lang.python:

<snip>
> The code does not work because each time SmartRequest is initiated,
> object 'cj' is cleared. How to avoid that?

Well... maybe by not creating new SmartRequest instances, but reusing
the one instance for the whole transaction.

UNTESTED -- this is a mental exercise only:

import urllib2
from cookielib import CookieJar

class SmartTransaction(object):     # new-style class
    def __init__(self):
        self.cj = CookieJar()       # one jar, kept for the life of the instance

    def doRequest(self, URL, Content=None):
        # Python names are untyped; objects have types,
        # so it is rare to see <type>Name forms
        self.request = urllib2.Request(URL, Content)
        self.cj.add_cookie_header(self.request)
        self.response = urllib2.urlopen(self.request)
        self.cj.extract_cookies(self.response, self.request)

myTransaction = SmartTransaction()
myTransaction.doRequest(aURL)
myTransaction.doRequest(aFollowUpURL, someContent)
....
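
The same one-jar-per-instance idea can be checked offline. The following is only a sketch, in Python 3 spelling (http.cookiejar for cookielib, urllib.request for urllib2). FakeResponse, the prepare/record method names, the URLs and the cookie value are assumptions made for the example; FakeResponse mimics just the info() method that the jar reads from a real response.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Hypothetical response object exposing only what CookieJar needs."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


class SmartTransaction:
    def __init__(self):
        self.cj = CookieJar()  # lives as long as the transaction object

    def prepare(self, url, data=None):
        # Build a request and attach whatever cookies the jar holds for it.
        request = urllib.request.Request(url, data)
        self.cj.add_cookie_header(request)
        return request

    def record(self, response, request):
        # Store any cookies the response sets.
        self.cj.extract_cookies(response, request)


transaction = SmartTransaction()
first = transaction.prepare("http://www.example.com/login")
transaction.record(FakeResponse("sid=abc123; Path=/"), first)
second = transaction.prepare("http://www.example.com/next")
cookie_sent = second.get_header("Cookie")
```

Here cookie_sent holds the cookie picked up from the first response, because both requests went through the same jar.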
--
Wulfraed Dennis Lee Bieber KD6MOG
wlfraed@ix.netcom.com wulfraed@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: web-asst@bestiaria.com)
HTTP://www.bestiaria.com/

7stud --

2/22/2008 9:32:00 PM

On Feb 21, 11:50 pm, est <electronix...@gmail.com> wrote:
> The code does not work because each time SmartRequest is initiated,
> object 'cj' is cleared.

That's because every time you create a SmartRequest, this line
executes:

cj=CookieJar()

That creates a new, *empty* cookie jar, i.e. it has no knowledge of
any previously set cookies.

> How to avoid that?

If you read the docs on the cookielib module, and in particular
CookieJar objects, you will notice that CookieJar objects are
described in a section that is titled: CookieJar and FileCookieJar
Objects.

Hmm...I wonder what the difference is between a CookieJar object and a
FileCookieJar Object?

----------
FileCookieJar implements the following additional methods:

save(filename=None, ignore_discard=False, ignore_expires=False)
Save cookies to a file.

load(filename=None, ignore_discard=False, ignore_expires=False)
Load cookies from a file.
--------

That seems promising.
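
As a rough sketch of what save() and load() give you across runs: the standard library ships concrete FileCookieJar subclasses, LWPCookieJar among them. The code below uses the Python 3 module name http.cookiejar (cookielib in Python 2). The make_cookie helper, the temporary file path and the "sid" cookie are invented for the example; in real use the jar would be filled from responses rather than by hand.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain="www.example.com"):
    """Hypothetical helper: build a cookie by hand for demonstration."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False,
        expires=int(time.time()) + 3600,  # still valid an hour from now
        discard=False,                    # not a session-only cookie
        comment=None, comment_url=None,
        rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# "First run" of the program: collect a cookie and write it to disk.
first_run = LWPCookieJar(path)
first_run.set_cookie(make_cookie("sid", "abc123"))
first_run.save()

# "Second run": a brand-new jar picks the cookie up again from the file.
second_run = LWPCookieJar(path)
second_run.load()
restored = {c.name: c.value for c in second_run}
```

Note that save() skips session cookies and expired cookies unless ignore_discard or ignore_expires is set, which is why the example cookie has an expiry time and discard=False.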

7stud --

2/22/2008 9:57:00 PM

On Feb 21, 11:50 pm, est <electronix...@gmail.com> wrote:
> Hi all,
>
> I need urllib2 do perform series of HTTP requests with cookie from
> PREVIOUS request(like our browsers usually do ).
>

Cookies from a previous request made in the currently running
program? Or cookies from requests that were made when you previously
ran the program?

> The code does not work because each time SmartRequest is initiated,
> object 'cj' is cleared. How to avoid that?
> The only stupid solution I figured out is use a global CookieJar
> object. Is there anyway that could handle all this INSIDE the class?
>

Examine this code and its output:

class SmartRequest(object):
    def __init__(self, id):
        if not getattr(SmartRequest, 'cj', None):
            SmartRequest.cj = "I'm a cookie jar. Created by request: %s" % id

r1 = SmartRequest(1)
r2 = SmartRequest(2)

print r1.cj
print r2.cj

--output:--
I'm a cookie jar. Created by request: 1
I'm a cookie jar. Created by request: 1
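
Swapping the string for a real jar needs one small change, sketched here in Python 3 spelling (http.cookiejar is the Python 3 name for cookielib). An empty CookieJar has length 0 and therefore tests as false, so a truthiness check would build a fresh jar on every call until the first cookie arrives. Comparing against None avoids that. The URLs are placeholders.

```python
from http.cookiejar import CookieJar


class SmartRequest:
    cj = None  # class attribute: one slot shared by every instance

    def __init__(self, url):
        # "is None" rather than "not ...": an empty jar is falsy,
        # so a truthiness test would replace it again and again.
        if SmartRequest.cj is None:
            SmartRequest.cj = CookieJar()
        self.url = url


r1 = SmartRequest("http://www.example.com/a")
r2 = SmartRequest("http://www.example.com/b")
shared = r1.cj is r2.cj
```

Both instances end up looking at the same jar object, so shared is True.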

7stud --

2/23/2008 6:06:00 AM

On Feb 21, 11:50 pm, est <electronix...@gmail.com> wrote:
>
> class SmartRequest():
>

You should always define a class like this:

class SmartRequest(object):

unless you know of a specific reason not to.

Steve Holden

2/23/2008 1:23:00 PM

7stud wrote:
> On Feb 21, 11:50 pm, est <electronix...@gmail.com> wrote:
>> class SmartRequest():
>>
>
> You should always define a class like this:
>
> class SmartRequest(object):
>
>
> unless you know of a specific reason not to.
>
>
It's much easier, though, just to put

__metaclass__ = type

at the start of any module where you want exclusively new-style objects.
And I do agree that you should use new-style objects exclusively unless
you have a good reason not to, though thanks to Guido's hard work it
mostly doesn't matter.

$ cat test94.py
__metaclass__ = type

class Rhubarb:
    pass

rhubarb = Rhubarb()

print type(Rhubarb)
print type(rhubarb)

$ python test94.py
<type 'type'>
<class '__main__.Rhubarb'>

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.hold...

est

2/24/2008 11:02:00 AM

On Feb 23, 5:57 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> Examine this code and its output:
>
> class SmartRequest(object):
>     def __init__(self, id):
>         if not getattr(SmartRequest, 'cj', None):
>             SmartRequest.cj = "I'm a cookie jar. Created by request:

The getattr approach is exactly what I was looking for, thanks!

On Feb 23, 2:05 pm, 7stud <bbxx789_0...@yahoo.com> wrote:
> On Feb 21, 11:50 pm, est <electronix...@gmail.com> wrote:
>
>
>
> > class SmartRequest():
>
> You should always define a class like this:
>
> class SmartRequest(object):
>
> unless you know of a specific reason not to.

Thanks for the advice!

est

2/24/2008 11:41:00 AM

On Feb 23, 2:42 am, Rob Wolfe <r...@smsnet.pl> wrote:
> Google for urllib2.HTTPCookieProcessor.
>
> HTH,
> Rob

Wow, thank you Rob Wolfe! Your reply is the shortest yet the most
helpful! I solved the problem with the following code.

import urllib2
from cookielib import CookieJar

class HTTPRefererProcessor(urllib2.BaseHandler):
    """Add Referer header to requests.

    This only makes sense if you use each RefererProcessor for a single
    chain of requests only (so, for example, if you use a single
    HTTPRefererProcessor to fetch a series of URLs extracted from a single
    page, this will break).

    There's a proper implementation of this in module mechanize.

    """
    def __init__(self):
        self.referer = None

    def http_request(self, request):
        # Send the previous response's URL as Referer unless one is already set
        if ((self.referer is not None) and
            not request.has_header("Referer")):
            request.add_unredirected_header("Referer", self.referer)
        return request

    def http_response(self, request, response):
        # Remember where we ended up, for the next request
        self.referer = response.geturl()
        return response

    https_request = http_request
    https_response = http_response

def main():
    cj = CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPCookieProcessor(cj),
        HTTPRefererProcessor(),
        )
    # From here on, plain urllib2.urlopen() goes through this opener
    urllib2.install_opener(opener)

    # url1 and url2 stand for the pages to fetch, in order
    urllib2.urlopen(url1)
    urllib2.urlopen(url2)

if "__main__" == __name__:
    main()

And it's working great!

Once again, thanks everyone!

7stud --

2/24/2008 9:46:00 PM

On Feb 24, 4:41 am, est <electronix...@gmail.com> wrote:
> And it's working great!
>
> Once again, thanks everyone!

How does the class HTTPRefererProcessor do anything useful for you?

est

2/25/2008 4:23:00 AM

On Feb 25, 5:46 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> How does the class HTTPRefererProcessor do anything useful for you?

Well, it's more browser-like. Maybe I should have snipped the
HTTPRefererProcessor code for this discussion.
