comp.lang.ruby

Using an existing session to grab a page

Chad Layton

12/25/2005 10:47:00 PM

I'm rather new to both web programming and Ruby, so forgive me if my
question is ill-formed.

I'm trying to do some screen scraping on a website that requires a login.
What I would like to have happen is for the user to log in to the
website normally, then run my script, which uses the existing login
session to grab the page and do whatever to it.

To illustrate my problem: If I use
Net::HTTP.get_response(URI.parse("http://foo.bar/baz...)).body, then
it serves up the index asking for a login. How do I get contents of
baz.php?
4 Answers

james_b

12/25/2005 11:37:00 PM

Chad Layton wrote:
> To illustrate my problem: If I use
> Net::HTTP.get_response(URI.parse("http://foo.bar/baz...)).body, then
> it serves up the index asking for a login. How do I get contents of
> baz.php?

I suspect that the user agent (i.e., the code, as opposed to a browser)
needs to include site cookies in the request headers.

After you sign in using a browser, you'll need to find the cookie left
by the site, or inspect a session cookie if the browser is not writing
it to disk. Most browsers have a way to show cookies sent by a site.
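
As a rough illustration of what "including site cookies in the request headers" means: the cookies go into a single `Cookie` header as `name=value` pairs separated by `; `. The sketch below builds that string from values copied by hand out of the browser. The cookie names and values, and the `cookie_header` helper, are made up for illustration and are not taken from any real site.

```ruby
# Hypothetical cookie names and values, copied by hand from the
# browser's cookie viewer after logging in.
cookies = {
  "session_id" => "abc123",
  "user"       => "chad"
}

# Join name/value pairs into the format used by the Cookie request header.
def cookie_header(cookies)
  cookies.map { |name, value| "#{name}=#{value}" }.join("; ")
end

puts cookie_header(cookies)
# prints: session_id=abc123; user=chad
```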



James


--

http://www.ru... - Ruby Help & Documentation
http://www.artima.c... - Ruby Code & Style: Writers wanted
http://www.rub... - The Ruby Store for Ruby Stuff
http://www.jame... - Playing with Better Toys
http://www.30seco... - Building Better Tools


Jim

12/26/2005 4:39:00 AM

Don't know if this'll help, but www-mechanize is able to log in to a site
that uses an HTML form for login.

You don't need to use a browser at all.

http://www.ntecs.de/blog/Blog/WWW-Mech...
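
For comparison, here is a sketch of the same idea (logging in from the script rather than the browser) using only the standard library instead of www-mechanize. It assumes the site accepts a plain form POST and sets its session through `Set-Cookie` headers. The helper names `cookies_from` and `login`, and the form field names in the usage comment, are hypothetical.

```ruby
require "net/http"
require "uri"

# Reduce a response's Set-Cookie headers to a "name=value; name=value"
# string suitable for a Cookie request header. Attributes such as
# Path or Expires are dropped.
def cookies_from(response)
  set_cookies = response.get_fields("Set-Cookie") || []
  set_cookies.map { |line| line.split(";").first.strip }.join("; ")
end

# POST the login form and return the cookies the site handed back.
# post_form does not follow redirects, so cookies set on a redirect
# response are still visible here.
def login(url, fields)
  response = Net::HTTP.post_form(URI.parse(url), fields)
  cookies_from(response)
end

# Usage (placeholder URL and field names, not a real site):
#   cookie_string = login("http://example.com/login.php",
#                         "username" => "chad", "password" => "secret")
```

A library like www-mechanize handles this bookkeeping, plus redirects and cookie expiry, for you; the sketch only shows the moving parts.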

Chad Layton

12/26/2005 9:21:00 AM

James Britt wrote:
> I suspect that the user agent (i.e., the code, as opposed to a browser)
> needs to include site cookies in the request headers.
>
> After you sign in using a browser, you'll need to find the cookie left
> by the site, or inspect a session cookie if the browser is not writing
> it to disk. Most browsers have a way to show cookies sent by a site.

Thank you, James. I see that when I log in to the site, 4 cookies are set.
How would I include them in the request headers?

james_b

12/28/2005 11:28:00 PM

Chad Layton wrote:
> Thank you, James. I see that when I login to the site 4 cookies are set,
> how would I include them in the request headers?

I *think* you pass a hash into the Net::HTTP initializer, or perhaps as
a parameter to 'get' but I can't find docs or examples to prove this.
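
A minimal sketch of one way to do this with the standard library: build a `Net::HTTP::Get` request, set its `Cookie` header, and send it. The helper names `build_request` and `fetch_with_cookies`, the URL, and the cookie values are placeholders, and the cookie string is assumed to be in the usual `name=value; name2=value2` form.

```ruby
require "net/http"
require "uri"

# Build a GET request for the given URI that carries the cookie string
# in its Cookie header.
def build_request(uri, cookie_string)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["Cookie"] = cookie_string
  request
end

# Send the request and return the response body.
def fetch_with_cookies(url, cookie_string)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(uri, cookie_string)).body
  end
end

# Usage (placeholder URL and cookie values):
#   puts fetch_with_cookies("http://example.com/baz.php",
#                           "session_id=abc123; user=chad")
```

The instance method `get` also takes a header hash as its second argument, for example `http.get(uri.request_uri, "Cookie" => cookie_string)`, so the guess about passing a hash to `get` is on the right track.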



James