comp.lang.ruby

Stuck in a Redirect Loop While Crawling

Matt White

6/12/2007 6:38:00 PM

Hello,

I am writing a crawler in Ruby. One of the sites I crawl is very picky
about headers, so I am mimicking my Firefox browser as closely as
possible. One of the GET requests I make to this site results in a
redirect response. I take the 'location' field from the redirect
response and request that URL. When Firefox sends its GET to this
location, it gets a 200 OK response. My crawler, however, gets
redirected every time.
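
For reference, the redirect-following step described above could be sketched roughly as follows. This is a minimal sketch, not the crawler's actual code: the helper name `fetch_following_redirects`, the `RedirectLoopError` class, and the hop limit of 5 are all hypothetical. It also assumes that cookies set on a redirect response should be sent on the next request, which may or may not be what this particular site expects. Replacing the whole Cookie header is a simplification.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error raised when the server keeps sending us in circles.
class RedirectLoopError < StandardError; end

# Hypothetical helper: GETs `url`, follows redirects for at most `limit` hops,
# and raises if the server redirects to a URL already tried with the same cookie.
# A block may be passed to replace the real network call (useful offline).
def fetch_following_redirects(url, headers = {}, limit = 5, &fetcher)
  fetcher ||= lambda do |uri, hdrs|
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == 'https')
    http.get(uri.request_uri, hdrs)
  end

  uri = URI.parse(url)
  headers = headers.dup
  seen = {}

  limit.times do
    response = fetcher.call(uri, headers)
    return response unless response.is_a?(Net::HTTPRedirection)

    location = response['location']
    return response if location.nil? # e.g. 304 Not Modified carries no Location

    # Assumption: cookies set on the redirect should accompany the next request.
    cookies = (response.get_fields('set-cookie') || []).map { |c| c.split(';').first }
    headers['Cookie'] = cookies.join('; ') unless cookies.empty?

    target = uri.merge(location) # resolves relative Location values
    key = [target.to_s, headers['Cookie']]
    raise RedirectLoopError, "redirected again to #{target}" if seen[key]

    seen[key] = true
    uri = target
  end

  raise RedirectLoopError, "gave up after #{limit} redirects"
end
```

Calling `fetch_following_redirects(start_url, headers)` with the crawler's header hash would then either return the final response or raise as soon as the same URL comes back with an unchanged cookie, which at least makes the loop visible instead of silent.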

Here is what Firefox is sending:

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Keep-Alive: 300
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
Cookie: sessionid=6d7dd6277ec64983bf642760d7d77d6a
Connection: keep-alive
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Host: <hostname here>

And here is how the server responds to Firefox:

HTTP/1.x 200 OK
Date: Tue, 12 Jun 2007 17:30:20 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Cache-Control: private
Expires: Tue, 12 Jun 2007 17:29:18 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 81118

I am sending these exact same headers using Ruby's Net::HTTP#get method:

server = Net::HTTP.new(uri.host, uri.port)
response,data = server.get(uri.request_uri, headers)

where headers is a hash with the exact same keys and values as the
Firefox headers above (the cookie value differs, of course, as it is
retrieved and stored dynamically). But I always get redirected to the
exact same URL that I just requested. This is the response I get:

RESPONSE: #<Net::HTTPFound:0x300c604>
Printing Response:

cache-control: private
expires: Tue, 12 Jun 2007 18:17:26 GMT
x-aspnet-version: 1.1.4322
content-type: text/html; charset=utf-8
x-powered-by: ASP.NET
date: Tue, 12 Jun 2007 18:18:26 GMT
microsoftofficewebserver: 5.0_Pub
server: Microsoft-IIS/6.0
content-length: 200
location: <exact same URL I just GETed>
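
One way to check whether the two requests really carry the same headers is to compare the captured browser headers against the request object Ruby builds. The sketch below is only an illustration: `header_differences` is a hypothetical helper, and the comparison looks at header names and values only, not at header order or anything the server might infer from the connection itself.

```ruby
require 'net/http'

# Hypothetical helper: returns { name => [browser_value, ruby_value] } for each
# header whose value differs between the captured browser headers and the
# Net::HTTP request object. Names are compared case-insensitively.
def header_differences(browser_headers, request)
  expected = {}
  browser_headers.each { |name, value| expected[name.downcase] = value }

  actual = {}
  # each_header yields lowercased names and comma-joined values.
  request.each_header { |name, value| actual[name] = value }

  diffs = {}
  (expected.keys | actual.keys).sort.each do |name|
    diffs[name] = [expected[name], actual[name]] unless expected[name] == actual[name]
  end
  diffs
end
```

Building the request with `Net::HTTP::Get.new(uri.request_uri, headers)` and passing it to this helper along with the Firefox headers lists every mismatch, including any default headers Net::HTTP adds on its own. To see the bytes actually written to the socket, `server.set_debug_output($stderr)` can be called before the request; it prints the full exchange, cookies included, so it is best kept to debugging sessions.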

Can anyone tell me what I am doing differently that makes the site
redirect me to the same place? I can't tell whether it's something I'm
doing wrong or something Ruby does differently from Firefox. Thanks.