comp.lang.ruby

Trying to GET google with socket....problem

Hey You

4/7/2007 8:28:00 PM

Well I don't know why the socket can't connect to Google. Here is my
source code:

require 'socket'
h = TCPSocket.new('www.google.ca',80)
h.print "GET /index.html HTTP/1.0\n\n"
a = h.read
puts a

I tried changing the HTTP to 1.1 but it still doesn't work.

--
Posted via http://www.ruby-....

16 Answers

Michael Gorsuch

4/7/2007 8:39:00 PM

I just ran this code in irb, and it worked without issue.

Can you provide the specific exception or unexpected results?

Michael Gorsuch

4/7/2007 8:41:00 PM

Also, can you provide the platform that you are using? I was using OS X.

Ryan Davis

4/7/2007 8:42:00 PM

On Apr 7, 2007, at 13:28 , Hey You wrote:

> Well I don't know why the socket can't connect to Google. Here is my
> source code:
>
> require 'socket'
> h = TCPSocket.new('www.google.ca',80)
> h.print "GET /index.html HTTP/1.0\n\n"
> a = h.read
> puts a

If you just want to get google (or whatever), use:

ruby -ropen-uri -e 'puts URI.parse("http://www.g...
index.html").read'

If you want to know the inner-workings of HTTP clients and servers,
use the above and trace it backwards. There is a lot of good code in
there.


Hey You

4/7/2007 8:53:00 PM

Michael Gorsuch wrote:
> I just ran this code in irb, and it worked without issue.
>
> Can you provide the specific exception or unexpected results?
Well I just ran the code and got this:

HTTP/1.0 302 Found

Location: http://www.google.ca/...

Cache-Control: private

Set-Cookie:
PREF=ID=e20f9edec5958042:TM=1175979001:LM=1175979001:S=shwmC1m6Amdg20nV;
expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com

Content-Type: text/html

Server: GWS/2.1

Content-Length: 228

Date: Sat, 07 Apr 2007 20:50:01 GMT

Connection: Keep-Alive



<HTML><HEAD><meta http-equiv="content-type"
content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.ca/...">here</A>.

</BODY></HTML>

Also I would like to stick to using sockets instead of other HTTP
clients :).

Hey You

4/7/2007 9:04:00 PM

Michael Gorsuch wrote:
> Also, can you provide the platform that you are using? I was using OS
> X.
Well I don't know what you meant right there but I'm using Windows XP.

Michael Gorsuch

4/7/2007 9:29:00 PM

OK, so you are getting a response back from the server.

I have no idea why you're getting a redirect from them, but you are getting a proper response over your socket.



Hey You

4/7/2007 9:52:00 PM

Michael Gorsuch wrote:
> OK, so you are getting a response back from the server.
>
> I have no idea why you're getting a redirect from them, but you are
> getting a proper response over your socket.
Well thank you for the answer :). The thing is that it's weird that even
when I put the host as google.ca it still redirects me to google.ca.
Well thank you to everyone that has helped me and I appreciate it but I
am wondering something else now: Why when I put HTTP/1.1 the program
loads but it just stays blank, not doing anything.

Philipp Taprogge

4/8/2007 12:09:00 AM

Hi!

The answers to both of your questions are simple... :)

Thus spake Hey You on 04/07/2007 11:51 PM:
> Well thank you for the answer :). The thing is that it's weird that even
> when I put the host as google.ca it still redirects me to google.ca.

That's because google redirects you to your localized version of
google and you did not specify the hostname in your get. You open a
socket to www.google.ca, but you only tell it to deliver some
"index.html". If that machine hosted multiple domains (which in fact
it does), it would not know whether to send you
www.google.ca/index.html or perhaps www.google.de/index.html.
So it informs you that it has an "/index.html" for you which it
figures might best suit your needs and that this page can be found
by issuing the following HTTP command:

GET http://www.google.ca/index.html HTTP/1.0\r\n\r\n

> Well thank you to everyone that has helped me and I appreciate it but I
> am wondering something else now: Why when I put HTTP/1.1 the program
> loads but it just stays blank, not doing anything.

The answer to that question is even simpler:
In HTTP/1.0, you open a socket, issue a request, get a response and
close the socket again for each and every single item you need. You
open a socket for the html-page itself, another one to request an
image specified in that page and so on. So after each request, the
socket is closed by the server.

When you specify HTTP/1.1, you get another option: persistent
connections (keep-alive). When you request a resource via HTTP/1.1,
a compliant server keeps the socket open by default after its
response, unless either side sends "Connection: close", so that you
can send another request without having to open a whole new socket.
If the server does this, it is the client's responsibility to close
the socket when it does not require any more data. That is also why
your program seems to hang: h.read waits for the end of the stream,
and the server does not close the connection right away.
Try it: open up a telnet connection to www.google.ca on port 80 and
issue your request as HTTP/1.0. The socket will close immediately
after the response from the server.
Now do the same thing again but specify HTTP/1.1. This time the
socket stays open and you can issue another request (or the same
request again, to keep things simple).
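
One way to avoid waiting on an open connection is to read only as many
bytes as the Content-Length header announces, instead of reading until
the server closes. What follows is a minimal sketch, not a full HTTP
client: read_response is a hypothetical helper, it assumes the response
carries a Content-Length header (chunked encoding is not handled), and
a StringIO with a canned response stands in for the socket so that it
runs offline.

```ruby
require 'stringio'

# Read one HTTP response from io without waiting for the peer to close.
# Hypothetical helper; only handles responses that carry Content-Length.
def read_response(io)
  status_line = io.gets("\r\n").to_s.chomp("\r\n")
  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty?               # a blank line ends the header block
    name, value = line.split(':', 2)
    headers[name.strip.downcase] = value.to_s.strip
  end
  length = headers.fetch('content-length', '0').to_i
  body = length > 0 ? io.read(length) : ''   # read exactly the announced bytes
  [status_line, headers, body]
end

# Canned response standing in for a real socket (assumption for the demo).
# The trailing "EXTRA" bytes are deliberately left unread.
canned = StringIO.new(
  "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 5\r\n\r\nhelloEXTRA"
)
status, headers, body = read_response(canned)
puts status
puts body
```

The same method should work on a TCPSocket, since it only relies on
gets and read, which TCPSocket inherits from IO.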

For further information I suggest you read RFC 1945 (HTTP/1.0) and
RFC 2616 (HTTP/1.1), respectively.

HTH, HAND,

Phil

Hey You

4/8/2007 3:54:00 AM

Thank you a lot Phil! I have learned a lot from you like how to POST
data (Yup, I learned) and much more and I am very grateful for all the
help you have given me. It makes sense why it didn't connect to
google.ca and I learned how to fix it right after my last post but I had
to go offline. I have also read RFC2616 but only bits and pieces of what
I have read are stuck in my head so I will keep re-reading it to learn
more. I will also read RFC1945 and I'm sorry for my newbish posts. It's
not that I'm lazy, because I really am a hard worker, but it's just that I
needed someone to point me in the right direction and that is what you
did :).

Brian Candler

4/8/2007 2:43:00 PM

On Sun, Apr 08, 2007 at 05:28:07AM +0900, Hey You wrote:
> Well I don't know why the socket can't connect to Google. Here is my
> source code:
>
> require 'socket'
> h = TCPSocket.new('www.google.ca',80)
> h.print "GET /index.html HTTP/1.0\n\n"
> a = h.read
> puts a
>
> I tried changing the HTTP to 1.1 but it still doesn't work.

Two problems:
(1) Line terminator for HTTP is \r\n not \n
(2) You have not supplied a Host: header

h.print "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"

I say again: you must read and understand RFC 2616.

This documents HTTP/1.1, which has gained a lot of features. You could try
reading the earlier RFCs for HTTP/1.0 or HTTP/0.9 for a simplified protocol.
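
Putting the two fixes together, the request could be built along these
lines. This is only a sketch: build_get_request and fetch are
hypothetical names, and the "Connection: close" header is an addition
that is not in the one-liner above, included so that a plain read
returns once the server closes the socket.

```ruby
require 'socket'

# Build a minimal GET request with CRLF line endings and a Host header.
# Hypothetical helper, not part of any library.
def build_get_request(host, path = '/', version: '1.0')
  crlf = "\r\n"
  lines = [
    "GET #{path} HTTP/#{version}",
    "Host: #{host}",
    'Connection: close'    # ask the server to close, so read sees end-of-file
  ]
  lines.join(crlf) + crlf + crlf       # an empty line ends the header block
end

# Open a socket, send the request and return the raw response text.
def fetch(host, path = '/', port: 80)
  socket = TCPSocket.new(host, port)
  socket.print build_get_request(host, path)
  socket.read
ensure
  socket&.close
end

request = build_get_request('www.google.ca', '/index.html')
puts request.inspect

# To try it against a live server (needs network access):
#   puts fetch('www.google.ca', '/index.html')
```

The live call is left commented out so the sketch runs without a
network connection.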

B.