
comp.lang.ruby

Fetching a URL using cookies

user@domain.invalid

9/12/2006 9:48:00 AM

I'm trying to fetch a URL that needs several cookies to be set in
order to return a proper result.

I've found a page on the website from which I can get the session
cookies (instead of sending cookies I set myself, I prefer to use the
ones coming from the server).

So,

def http_get(url, url_before = nil)
  headers = Hash.new
  headers['User-agent'] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  unless url_before.nil?
    # First request: fetch the page that sets the session cookies
    response = @http.get(url_before)
    cookies = response.response['set-cookie']
    headers['Cookie'] = cookies
  end
  # Second request: send those cookies back with the real request
  response = @http.get(url, headers)
  raise "url #{url} not accessible on host #{@host}:#{@port} - code #{response.code}" unless ['200', '302'].include?(response.code)
  response.body
end


The problem is that I'm not sure whether the way I resend the cookies is
right. The cookies retrieved in the unless block ARE OK, but when
the second @http.get occurs, the remote web server ignores them and sends
a redirect to a default page.

So I need to be sure that I send the cookies properly in the GET request
before investigating the cookies' content.
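One likely culprit in the code above, offered as a hypothesis rather than a diagnosis: a Set-Cookie header carries attributes (path, expires, domain) after the name=value pair, and Net::HTTP's `response['set-cookie']` joins several Set-Cookie headers into one comma-separated string. A browser sends back only the name=value pairs, separated by "; ". The sketch below assumes simple cookies; the helper name `cookie_header_from` is hypothetical, while `get_fields('set-cookie')` is the Net::HTTP call that returns each Set-Cookie value separately.

```ruby
# Hypothetical helper: turn the Set-Cookie values of a response into the
# value of a Cookie request header. Only the leading "name=value" pair of
# each Set-Cookie line is kept; attributes such as path, expires or domain
# are dropped, and the pairs are joined with "; " as a browser would do.
def cookie_header_from(set_cookie_values)
  return nil if set_cookie_values.nil? || set_cookie_values.empty?

  set_cookie_values
    .map { |line| line.split(';', 2).first.strip }
    .join('; ')
end

# Usage with a real Net::HTTP response (not executed here):
#   first = @http.get(url_before)
#   headers['Cookie'] = cookie_header_from(first.get_fields('set-cookie'))

cookies = cookie_header_from([
  'PHPSESSID=abc123; path=/',
  'lang=fr; expires=Wed, 13-Sep-2006 10:00:00 GMT; path=/'
])
puts cookies # prints "PHPSESSID=abc123; lang=fr"
```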


Thanks for your help



Note: while reviewing this post, I think I should write some code to keep
the cookies' content between two calls (as a browser does) instead of
handling things the way I do. But it's just a side note.
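The browser-like approach mentioned in the note can be sketched as a small cookie jar that remembers every name=value pair the server sends and resends them all. `SimpleCookieJar` is a hypothetical class, not a library API; it deliberately ignores domain, path and expiry, so it only suits scraping a single site.

```ruby
# Hypothetical minimal cookie jar: remembers name=value pairs from every
# response and sends all of them back, the way a browser keeps cookies
# between requests. Domain, path and expiry are ignored on purpose.
class SimpleCookieJar
  def initialize
    @cookies = {}
  end

  # Record each Set-Cookie value (e.g. from response.get_fields('set-cookie')).
  # Accepts nil when a response sets no cookies.
  def update(set_cookie_values)
    Array(set_cookie_values).each do |line|
      name, value = line.split(';', 2).first.split('=', 2)
      @cookies[name.strip] = value.to_s.strip
    end
  end

  # Value for the Cookie request header, or nil when nothing is stored yet.
  def header
    return nil if @cookies.empty?

    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

jar = SimpleCookieJar.new
jar.update(['session=abc; path=/'])
jar.update(['session=xyz; path=/', 'lang=fr'])
puts jar.header # prints "session=xyz; lang=fr"
```

In `http_get`, the jar would be updated with `jar.update(response.get_fields('set-cookie'))` after each request, and `jar.header` sent as the Cookie header when it is not nil. Later values overwrite earlier ones with the same name, which is what a browser does.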

William Crawford

9/12/2006 11:10:00 AM


Zouplaz wrote:
> I'm trying to fetch a URL that needs several cookies to be set in
> order to return a proper result.
<snip/>
> The problem is that I'm not sure whether the way I resend the cookies is
> right. The cookies retrieved in the unless block ARE OK, but when
> the second @http.get occurs, the remote web server ignores them and sends
> a redirect to a default page.

Why not -try- to manufacture them yourself and see if it works? If it
does, you know how to send them and can just make sure the
newly-obtained cookies are sent the same way. If it doesn't, massage it
until it does work.
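That experiment could look like this with Net::HTTP: the Cookie header is built by hand from values copied out of a browser (the names, values, path and host below are placeholders), and the request's headers are printed before anything is sent, so you can see exactly what goes out.

```ruby
require 'net/http'

# Placeholder values: copy the real ones from a working browser session.
manual_cookies = { 'PHPSESSID' => 'abc123', 'lang' => 'fr' }

request = Net::HTTP::Get.new('/page.php')
request['User-Agent'] = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
# Cookie header format: name=value pairs separated by "; "
request['Cookie'] = manual_cookies.map { |name, value| "#{name}=#{value}" }.join('; ')

# Print the headers Net::HTTP will send with this request (names are lowercased).
request.each_header { |name, value| puts "#{name}: #{value}" }

# To actually send it (needs network access; the host is a placeholder):
#   response = Net::HTTP.start('www.example.com', 80) { |http| http.request(request) }
```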

> So I need to be sure that I send the cookies properly in the GET request
> before investigating the cookies' content.

Right.

> Note: while reviewing this post, I think I should write some code to keep
> the cookies' content between two calls (as a browser does) instead of
> handling things the way I do. But it's just a side note.

Yes, good idea.

Another thought, however. Perhaps the page has additional requirements
that you haven't met: cookies that don't exist on the other page but
were set at login or somewhere else, headers that you aren't sending
and it expects, a specific referring page (or something else I've
momentarily forgotten).
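If the site checks where the request comes from, the Referer header (that is the spelling HTTP uses) can be set explicitly alongside the cookies. A sketch with a hypothetical helper and placeholder URLs; which headers a given site really checks can only be found by experiment.

```ruby
require 'net/http'

# Hypothetical helper: build a GET request that looks like one from a
# browser that followed a link on the `referer` page.
def browser_like_get(path, referer:, cookie: nil)
  request = Net::HTTP::Get.new(path)
  request['User-Agent'] = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'
  request['Referer'] = referer         # the page the link was followed from
  request['Cookie'] = cookie if cookie # only when we actually have cookies
  request
end

req = browser_like_get('/results.php', referer: 'http://www.example.com/search.php')
puts req['Referer'] # prints "http://www.example.com/search.php"
```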

--
Posted via http://www.ruby-....

user@domain.invalid

9/12/2006 11:50:00 AM


On 12/09/2006 13:09, William Crawford wrote:
>
> Another thought, however. Perhaps the page has additional requirements
> that you haven't met: cookies that don't exist on the other page but
> were set at login or somewhere else, headers that you aren't sending
> and it expects, a specific referring page (or something else I've
> momentarily forgotten).
>

I don't know why, but suddenly it worked... I presume I'd missed something
somewhere.

Now I've rewritten the code and I use a "write-once" cookie mechanism,
which is generic for every "scraping" class that I use. It's
sufficient for now.

def http_get(url)
  headers = Hash.new
  headers['User-agent'] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  # Resend the cookies captured earlier, if any
  headers['Cookie'] = @cookies unless @cookies.nil?
  response = @http.get(url, headers)
  raise "url #{url} no access on host #{@host}:#{@port} - code #{response.code}" unless ['200', '302'].include?(response.code)
  # "Write-once": store the cookies only the first time the server sends them
  @cookies = response.response['set-cookie'] if @cookies.nil?
  response.body
end

Just for my own education, could this code be rewritten in a more elegant
way?
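One possible tidier version, as a sketch only: the constants, the `Scraper` class wrapper and its constructor are assumptions added to make it self-contained. It keeps the write-once behaviour but stores only the name=value part of each Set-Cookie line, which differs from the code above.

```ruby
# Sketch: a tidier http_get with the same "write-once" cookie behaviour.
# Assumes `http` is a started Net::HTTP object (anything with a compatible
# #get(path, headers) method works); host and port only appear in errors.
class Scraper
  USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'.freeze
  OK_CODES = %w[200 302].freeze

  def initialize(http, host, port)
    @http = http
    @host = host
    @port = port
    @cookies = nil
  end

  def http_get(url)
    headers = { 'User-Agent' => USER_AGENT }
    headers['Cookie'] = @cookies if @cookies

    response = @http.get(url, headers)
    unless OK_CODES.include?(response.code)
      raise "url #{url} not accessible on host #{@host}:#{@port} - code #{response.code}"
    end

    # Write-once: remember cookies from the first response that sets any,
    # keeping only the name=value part of each Set-Cookie line.
    if @cookies.nil?
      set_cookies = response.get_fields('set-cookie')
      @cookies = set_cookies.map { |line| line.split(';', 2).first }.join('; ') if set_cookies
    end

    response.body
  end
end
```

With a real connection it would be used as `Scraper.new(Net::HTTP.start(host, port), host, port).http_get('/page')`, after `require 'net/http'`.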


Thanks

William Crawford

9/12/2006 3:11:00 PM


Zouplaz wrote:
> On 12/09/2006 13:09, William Crawford wrote:
> I don't know why, but suddenly it worked... I presume I'd missed something
> somewhere.

Experience tells me it'll suddenly stop again, don't fret ;) When I
have something that stops and starts, I usually stop and gather the exact
information being sent, byte for byte, from a success and a failure and
compare them. Using LiveHttpHeaders (for Firefox; there's an IE version
with a similar name) you can grab the exact headers, cookies
(in the headers) and POST data sent.

Personally, I would take the time to set up a test I know works, and if
it ever fails you again, you can run that test again and see what's
different now. I'd even go as far as to record the headers for the test
now, while it works, and save it for when it doesn't. (I'm not usually
so proactive, but this could be a serious bear to debug without
known-good headers/etc, and I'm lazy.)
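Recording the headers of a known-good request could be sketched like this: snapshot a request's headers into a Hash, save it (as JSON here), and later diff a failing request against it. The helper names are hypothetical; Net::HTTP's `set_debug_output` is another way to see the raw bytes actually sent over the connection.

```ruby
require 'net/http'
require 'json'

# Hypothetical helper: collect a request's headers into a plain Hash.
# Net::HTTP yields header names in lowercase.
def header_snapshot(request)
  snapshot = {}
  request.each_header { |name, value| snapshot[name] = value }
  snapshot
end

# List every header that was added, removed or changed between a
# known-good snapshot and the current one.
def header_diff(good, current)
  (good.keys | current.keys).sort
    .reject { |name| good[name] == current[name] }
    .map { |name| "#{name}: #{good[name].inspect} -> #{current[name].inspect}" }
end

working = Net::HTTP::Get.new('/page.php')
working['Cookie'] = 'PHPSESSID=abc123'
# In practice, write this string to a file while everything still works.
saved_json = JSON.generate(header_snapshot(working))

failing = Net::HTTP::Get.new('/page.php')
failing['Cookie'] = 'PHPSESSID=abc123; path=/'
puts header_diff(JSON.parse(saved_json), header_snapshot(failing))
# prints: cookie: "PHPSESSID=abc123" -> "PHPSESSID=abc123; path=/"
```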

> Just for my own education, could this code be rewritten in a more elegant
> way?

I'm not one to talk to about 'elegant' code... I'm more in the 'Hey, it
works, right?' category. Hehe.

>
> Thanks

Glad it works! Enjoy.


Aaron Patterson

9/12/2006 4:38:00 PM


On Tue, Sep 12, 2006 at 06:51:40PM +0900, Zouplaz wrote:
> I'm trying to fetch an url which needs several cookies to be set in
> order to properly return a result.
>
> The problem is that I'm not sure whether the way I resend the cookies is
> right. The cookies retrieved in the unless block ARE OK, but when
> the second @http.get occurs, the remote web server ignores them and sends
> a redirect to a default page.
[snip]

Why write all this yourself? WWW::Mechanize will handle storing and
sending cookies for you. Then you can concentrate on getting the data
from the web page.

http://mechanize.ruby...

You can even set a custom user agent string! Hope that helps.
--
Aaron Patterson
http://tenderlovem...

user@domain.invalid

9/12/2006 9:04:00 PM


On 12/09/2006 18:38, Aaron Patterson wrote:

> Why write all this yourself? WWW::Mechanize will handle storing and
> sending cookies for you. Then you can concentrate on getting the data
> from the web page.
>
> http://mechanize.ruby...
>

Thanks for the link!