comp.lang.python

[python] How to detect a remote webpage is accessible? (in HTTP

ShenLei

1/18/2008 5:23:00 AM

Howdy, all,
I want to use Python to check whether a website is accessible.
Currently, I use urllib to fetch the remote webpage and see whether the
request fails. The problem is that the webpage may be very large, so this
takes too long. Clearly, there is no need to download the entire page.
Could you suggest a good, fast solution?
Thank you.
--
ShenLei
3 Answers

Jarek Zgoda

1/18/2008 10:04:00 AM

Issue an HTTP HEAD request.
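
A minimal sketch of what a HEAD request could look like in current Python 3, using the standard `http.client` module. The function name `head_status` and the 5-second default timeout are assumptions made for illustration; they are not from the thread. A HEAD request asks the server for the status line and headers only, so no body is transferred.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=5.0):
    """Send a HEAD request and return the HTTP status code.

    Returns None if the host cannot be reached or does not answer
    with a valid HTTP response. (Hypothetical helper, not from the thread.)
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target; default to "/" for a bare host URL.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed reply.
        return None
    finally:
        conn.close()


# Example call (needs network access):
# head_status("http://www.python.org/")
```

Some servers reject or mishandle HEAD, so a status such as 405 may still mean the site is up. Redirects are not followed here; a 3xx status is returned as-is.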

G F

1/18/2008 12:33:00 PM

http://groups.google.com/group/comp.lang.python/browse_frm/thread/bbac82df3d64d48e/da75e4...

John Nagle

1/18/2008 6:03:00 PM

If "urlopen" returns successfully, you've already received the HTTP headers.
Just open the URL, then call "info()" on the returned file-like object to get
the header info. Don't read the content at all.
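
As a rough illustration of the approach above in current Python 3 (where `urlopen` lives in `urllib.request`), the sketch below opens the URL, takes the status and headers, and closes the response without ever calling `read()`. The helper name `fetch_headers` and the default timeout are assumptions for the example.

```python
import urllib.error
import urllib.request


def fetch_headers(url, timeout=5.0):
    """Open the URL and return (status, headers) without reading the body.

    Returns None if the server cannot be reached at all.
    (Hypothetical helper, not from the thread.)
    """
    try:
        # By the time urlopen() returns, the status line and headers
        # have been received; the body has not been read yet.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.info()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, err.headers
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return None


# Example call (needs network access):
# result = fetch_headers("http://www.python.org/")
# if result is not None:
#     status, headers = result
#     print(status, headers.get("Content-Type"))
```

Unlike a HEAD request, this still sends a GET, so the server starts transmitting the body; closing the response early simply stops the client from downloading the rest.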

Setting the socket timeout will shorten the wait when the requested
domain doesn't respond at all. But if the remote host accepts the HTTP
connection and then sends nothing, the socket timeout is ineffective and you
wait for a while. This is rare, but it happens.
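
One way the socket timeout might be set, sketched for current Python 3 with `socket.setdefaulttimeout`. The name `is_reachable` and the 5-second default are assumptions for illustration. Note that the timeout bounds each blocking socket operation separately, not the request as a whole, and how a stalled server behaves may differ between Python versions.

```python
import socket
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if the server answers with any HTTP response.

    (Hypothetical helper, not from the thread.)
    """
    previous = socket.getdefaulttimeout()
    # Applies to sockets created from now on, including those
    # that urlopen() creates internally.
    socket.setdefaulttimeout(timeout)
    try:
        urllib.request.urlopen(url).close()
        return True
    except urllib.error.HTTPError as err:
        # An error status (404, 500, ...) still proves the host responded.
        err.close()
        return True
    except (urllib.error.URLError, OSError):
        # No response: DNS failure, refused connection, or timeout.
        return False
    finally:
        # Restore the process-wide default so other code is unaffected.
        socket.setdefaulttimeout(previous)


# Example call (needs network access):
# is_reachable("http://www.python.org/", timeout=3.0)
```

Because `setdefaulttimeout` changes a process-wide setting, passing `timeout=` directly to `urlopen` is usually the less intrusive choice when only one call needs it.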

John Nagle