comp.lang.ruby

open-uri fetches outdated content vs. curl

Daniel Choi

9/18/2008 12:05:00 AM

Try running the following program:

================
require 'open-uri'

feed_url = "http://www.slate.com..."   # URL truncated in the archive

# Fetch the feed with open-uri and save it.
result1 = open(feed_url).read
puts "Saving result1.xml:"
File.open("result1.xml", "w") {|f| f.write(result1)}

# Fetch the same feed with curl (following redirects) and save it.
result2 = `curl -L #{feed_url}`
puts "Saving result2.xml:"
File.open("result2.xml", "w") {|f| f.write(result2)}

# system returns true if diff exits 0, i.e. if the two files are identical.
command = "diff result1.xml result2.xml"
puts system(command)
================

result1 should be identical to result2, but it turns out that the feed
that open-uri fetches is over a month out of date, while the feed that
curl fetches is current. Can anyone please explain what is going on?

Thanks!

Robert Klemme

9/18/2008 6:27:00 AM


On 18.09.2008 02:05, Daniel Choi wrote:
> Try running the following program:
> [...]
> result1 should be identical to result2, but it turns out that the feed
> that open-uri fetches is over a month out of date, while the feed that
> curl fetches is current. Can anyone please explain what is going on?

Reasons I can think of:

i) Both approaches use different paths to the server, namely a different
(or no) proxy.

ii) There is something in the request that makes the server send
different data.

Can you try to obtain HTTP headers from both approaches? That might
clear up a few things. Also, on Unix type systems check for environment
variables and ~/.xyzrc files which might affect proxy settings.
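
Something along these lines might do it (just a sketch; the URL stays
truncated as in your post, and the env-variable list is only the usual
suspects):

================
require 'open-uri'

feed_url = "http://www.slate.com..."   # truncated in the original post

# open-uri exposes the response status and headers via status and meta:
open(feed_url) do |f|
  puts f.status.inspect                  # e.g. ["200", "OK"]
  f.meta.each {|k, v| puts "#{k}: #{v}"}
end

# curl prints response headers to stderr with -v; "< " marks them:
puts `curl -s -v -L #{feed_url} -o /dev/null 2>&1`.lines.grep(/^< /)

# Proxy-related environment variables both tools may consult:
%w[http_proxy HTTP_PROXY no_proxy].each {|v| puts "#{v}=#{ENV[v].inspect}"}
================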

Another good idea might be to try a different tool, e.g. a web browser,
to see what that turns up.

Kind regards

robert


Daniel Choi

9/18/2008 1:19:00 PM


On Sep 18, 2:26 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> [...]
> Can you try to obtain HTTP headers from both approaches? That might
> clear up a few things. Also, on Unix type systems check for environment
> variables and ~/.xyzrc files which might affect proxy settings.
> [...]


Thanks for these suggestions. The problem actually just cleared itself
up, after several days during which the open-uri fetch was returning
outdated content. I think it was a problem with upstream proxies. I'll
try to look at the headers out of curiosity.

Daniel Choi

9/24/2008 12:12:00 AM



I used net/http to do the same thing, but this time I printed out the
redirect locations. The result is very interesting. If I don't set
the "User-Agent" header, I get redirected to one proxy -- the one
with outdated content. If I set the "User-Agent" header to "Mozilla/
5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/XX (KHTML, like
Gecko) Safari/YY" (faking Apple Safari), I get redirected to another
proxy, with the up-to-date content.
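
Roughly what I ran (a sketch from memory; the feed URL and the Safari
User-Agent string are abbreviated as above, and it assumes the Location
headers are absolute URLs):

================
require 'net/http'
require 'uri'

# Follow redirects by hand, printing each Location header on the way.
def trace_redirects(url, headers = {}, limit = 5)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.get(uri.request_uri, headers)
  end
  if response.is_a?(Net::HTTPRedirection) && limit > 0
    puts "redirected to: #{response['location']}"
    trace_redirects(response['location'], headers, limit - 1)
  else
    response
  end
end

feed_url = "http://www.slate.com..."

trace_redirects(feed_url)   # default Ruby User-Agent
trace_redirects(feed_url,
  "User-Agent" => "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) " \
                  "AppleWebKit/XX (KHTML, like Gecko) Safari/YY")
================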

I didn't know that servers redirected requests to bad or good proxies
depending on what the User-Agent header is. But this seems to be the
case here.
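
So the workaround, in case anyone else hits this: open-uri passes
string-keyed options through as request headers, so you can send a
browser User-Agent directly (same abbreviated URL and UA string as
above):

================
require 'open-uri'

feed_url = "http://www.slate.com..."

# String keys in the options hash become request headers:
result = open(feed_url,
  "User-Agent" => "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) " \
                  "AppleWebKit/XX (KHTML, like Gecko) Safari/YY").read
================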

Robert Klemme

9/24/2008 6:21:00 AM


On 24.09.2008 02:11, Daniel Choi wrote:
> [...]
> I didn't know that servers redirected requests to bad or good proxies
> depending on what the User-Agent header is. But this seems to be the
> case here.

Daniel, thanks for the update! This is interesting stuff. The
distinction is probably not so much between "bad" and "good" proxies as
between proxies tailored to a particular browser version. Maybe it's a
bug and you should show this to your IT department. Could be that they
changed firewall rules in the past and the "bad" proxy never gets
updated for lack of connectivity. :-)

Cheers

robert