Asp Forum - Question: Downloading files with open(-uri)?

Mariano Kamp

12/23/2006 12:36:00 PM

Hi,

I could need a quick hand here.

I want to watch the RailsConf 2006 videos and want to download
them with a script.

Unfortunately, open("http://xx") never comes back. Any idea what I
am doing wrong here?

I tested it with a URL that returns plain HTML and that worked
fine. See the first line, ibm.com.

require 'open-uri'

urls = %w{
  http...
  http://downloads.scribemedia.net/rails2006/03_martin_fowle...
  http://downloads.scribemedia.net/rails2006/02_dave_thoma...
  http://downloads.scribemedia.net/rails2006/01_dh_h...
  http://downloads.scribemedia.net/rails2006/04_paul_graha...
  http://downloads.scribemedia.net/rails2006/06_railsCorePane...
  http://downloads.scribemedia.net/rails2006/07_why_lucky...
}
BUFFER_SIZE = 1_024*1_024*1

urls.each do |url|
  puts "downloading #{url}"
  open(url) do |input|
    puts "opened connection."
    output = open(url.split(/\//).last, "wb")
    while (buffer = input.read(BUFFER_SIZE))
      print "."
      $stdout.flush
      output.write(buffer)
    end
    output.close
  end
  puts "done."
end
puts "All downloads done."

Cheers,
Mariano

11 Answers

William James

12/23/2006 2:18:00 PM

There's nothing wrong with your program; I tested it by
downloading a picture. If you have a dial-up connection, maybe
the transfer is progressing very slowly.

Mariano Kamp

12/23/2006 2:41:00 PM

On Dec 23, 2006, at 3:20 PM, William James wrote:

> There's nothing wrong with your program; I tested it by
> downloading a picture. If you have a dial-up connection, maybe
> the transfer is progressing very slowly.

Hey Bill,

Hmm, not sure. If I change the BUFFER_SIZE to 1KB I still don't
see anything, and the "puts 'opened connection'" should at least be
visible, shouldn't it?

Anyway, I have a 6 MBit/s downstream, so even a 1MB buffer
shouldn't be a problem.

I also suspected that the server checks for deep links and
evaluates the Referer header in the process, but when I enter one of
the URLs directly into my browser it works.

Very strange.

Cheers,
Mariano

Edwin Fine

12/23/2006 3:23:00 PM

William James wrote:
> There's nothing wrong with your program; I tested it by
> downloading a picture. If you have a dial-up connection, maybe
> the transfer is progressing very slowly.

Actually, I think the site is slow or overloaded. The movies are 250MB to
500MB in size, and the download speed I am getting is around 52
KBytes/second (and I have a broadband connection). This code works
better at showing progress:

require 'open-uri'

urls = %w{
  http...
  http://downloads.scribemedia.net/rails2006/03_martin_fowle...
  http://downloads.scribemedia.net/rails2006/02_dave_thoma...
  http://downloads.scribemedia.net/rails2006/01_dh_h...
  http://downloads.scribemedia.net/rails2006/04_paul_graha...
  http://downloads.scribemedia.net/rails2006/06_railsCorePane...
  http://downloads.scribemedia.net/rails2006/07_why_lucky...
}

BUFFER_SIZE = 8 * 1_024

urls.each do |url|
  puts "downloading #{url}"
  out_file = url.split(/\//).last
  puts "Writing to #{out_file}"

  open(url, "r",
       :content_length_proc => lambda { |content_length|
         puts "Content length: #{content_length} bytes" },
       :progress_proc => lambda { |size|
         printf("Read %010d bytes\r", size.to_i) }) do |input|
    open(out_file, "wb") do |output|
      while (buffer = input.read(BUFFER_SIZE))
        output.write(buffer)
      end
    end
  end
  puts "\ndone."
end
puts "All downloads done."

--
Posted via http://www.ruby-....
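
The two hooks used above can also be combined to print a percentage. The following is only a sketch: progress_line and download_with_progress are hypothetical helper names, not part of open-uri or of the thread, and it assumes a recent Ruby, where Kernel#open no longer opens URLs and URI.open has to be called instead. The :content_length_proc and :progress_proc options themselves are the ones from the post above.

```ruby
require 'open-uri'

# Hypothetical helper (not from the thread): formats one progress line.
# `total` may be nil when the server sends no Content-Length header.
def progress_line(done, total)
  if total && total > 0
    format("Read %010d of %d bytes (%5.1f%%)", done, total, 100.0 * done / total)
  else
    format("Read %010d bytes", done)
  end
end

# Hypothetical helper: downloads `url` to `out_file` and reports progress
# through open-uri's hooks while the transfer is still running.
def download_with_progress(url, out_file, io = $stdout)
  total = nil
  URI.open(url,
           content_length_proc: ->(length) { total = length },
           progress_proc: ->(done) { io.print("\r", progress_line(done, total)) }) do |input|
    # By the time this block runs, open-uri has already fetched the body.
    File.open(out_file, "wb") { |output| IO.copy_stream(input, output) }
  end
  io.puts
end
```

The progress hook fires while open-uri is still receiving data, which is why it can report something long before the block itself starts.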

Robert Klemme

12/23/2006 3:30:00 PM

On 23.12.2006 15:40, Mariano Kamp wrote:
> Hey Bill,
>
> hmm, not sure. If I change the BUFFER_SIZE to 1KB I still don't see
> anything and the "puts 'opened connection'" should at least be visible,
> shouldn't it?
>
> Anyways I have a 6 MBit/s downstream so even a 1MB buffer shouldn't be
> a problem.
>
> I also suspected that the server is checking for deep links and would
> evaluate the referer in the process, but when I enter one of the urls
> directly into my browser it works.
>
> Very strange.

I observe the same behavior that you see. I have no knowledge of
open-uri internals, but here's my theory: the page is probably loaded
completely before open returns. This would explain why you see the dots
from ibm.com in one go. I would test the same with net/http and see
whether there is any difference. Make sure to use the stream form.

Kind regards

robert
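
This theory can be checked without touching the conference server. What follows is a minimal sketch under stated assumptions: start_slow_server, monotonic_now and the two delay_* helpers are hypothetical names made up for this illustration, the server is a throwaway loopback socket that drips the body out in pieces, and URI.open stands in for the Kernel#open call used in the thread (needed on recent Ruby versions).

```ruby
require 'open-uri'
require 'net/http'
require 'socket'

# Hypothetical throwaway server: answers `connections` requests, sending the
# body in `chunks` pieces of `size` bytes with a `delay` before each piece.
# Returns the port it listens on.
def start_slow_server(connections: 2, chunks: 3, size: 1024, delay: 0.2)
  server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a free port
  Thread.new do
    connections.times do
      client = server.accept
      # Skip the request line and headers (everything up to the empty line).
      loop { line = client.gets; break if line.nil? || line == "\r\n" }
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Length: #{chunks * size}\r\n" \
                   "Connection: close\r\n\r\n")
      chunks.times do
        sleep delay
        client.write("x" * size)
      end
      client.close
    end
    server.close
  end
  server.addr[1]
end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Seconds that pass before open-uri hands control to the block.
def delay_with_open_uri(url)
  start = monotonic_now
  elapsed = nil
  URI.open(url) { |_io| elapsed = monotonic_now - start }
  elapsed
end

# Seconds that pass before Net::HTTP yields the first non-empty body chunk.
def delay_with_net_http(url)
  start = monotonic_now
  first = nil
  Net::HTTP.get_response(URI.parse(url)) do |res|
    res.read_body { |chunk| first ||= (monotonic_now - start) unless chunk.empty? }
  end
  first
end
```

Calling start_slow_server and then both helpers against "http://127.0.0.1:#{port}/" should show open-uri waiting for the whole transfer (at least three delays with these defaults), while the streaming form sees data as soon as the first piece arrives.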

Ross Bamford

12/23/2006 3:46:00 PM

On Sat, 23 Dec 2006 12:35:40 -0000, Mariano Kamp <mariano.kamp@acm.org>
wrote:

> Hi,
>
> I could need a quick hand here.
>
> I want to watch the RailsConf 2006 videos and want to download them
> with a script.
>

If you have libcurl and are willing to install an extension, the
recently released (;)) Curb 0.1 makes this as easy as:

#!/usr/bin/env ruby
require 'curb'   # the Curb extension has to be loaded first

urls = %w{
  http://downloads.scribemedia.net/rails2006/03_martin_fowle...
  http://downloads.scribemedia.net/rails2006/02_dave_thoma...
  http://downloads.scribemedia.net/rails2006/01_dh_h...
  http://downloads.scribemedia.net/rails2006/04_paul_graha...
  http://downloads.scribemedia.net/rails2006/06_railsCorePane...
  http://downloads.scribemedia.net/rails2006/07_why_lucky...
}

urls.each { |url| Curl::Easy.download(url) }

__END__

It's at http://curb.ruby...

--
Ross Bamford - rosco@roscopeco.remove.co.uk

Mariano Kamp

12/23/2006 4:05:00 PM

On Dec 23, 2006, at 4:55 PM, Ross Bamford wrote:

> If you have libcurl and are willing to install an extension, the
> recently released (;)) Curb 0.1 makes this as easy as:

Thanks for the tip, Ross.

I tried gem install curb ;-) but that didn't work. And as the other
version is already downloading the files and I just wanted this
program to do this single job I will try out curb the next time ;-)

You've implemented it in C, so you probably can't answer my question
about how you dealt with the buffer size either, can you?

Cheers,
Mariano

Robert Klemme

12/23/2006 4:07:00 PM

On 23.12.2006 16:29, Robert Klemme wrote:
> I observe the same behavior that you see. I have no knowledge of
> openuri internals but here's my theory: the page is probably loaded
> completely before open returns. This would explain why you see the dots
> from ibm.com in one go. I would test the same with net/http and see
> whether there is any difference. Make sure to use the stream form.

Try this (note, this will not follow redirects):

robert

require 'net/http'
require 'uri'

urls = %w{
  http...
  http://downloads.scribemedia.net/rails2006/03_martin_fowle...
  http://downloads.scribemedia.net/rails2006/02_dave_thoma...
}

$stdout.sync=true

urls.each do |url|
  puts "downloading #{url}"

  Net::HTTP.get_response(URI.parse(url)) do |res|
    puts "opened connection."
    target = url.split(/\//).last
    puts "writing to #{target}"

    File.open(target, "wb") do |output|
      # next line will read in chunks but not provide option for dots...
      # res.read_body(output)
      res.read_body do |chunk|
        output.write(chunk)
        print "."
      end
    end
  end

  puts "done."
end

puts "All downloads done."
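
Since the version above will not follow redirects, here is one possible way to add that while keeping the streaming read_body form. It is a sketch only: stream_to_file, its redirect_limit parameter and the optional progress block are hypothetical names introduced for illustration, and the limit of five hops is an arbitrary assumption.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: streams `url` into the file `target`, following up to
# `redirect_limit` redirects. An optional block receives the size of each
# chunk as it is written, e.g. to print dots.
def stream_to_file(url, target, redirect_limit = 5, &on_chunk)
  raise ArgumentError, "too many redirects" if redirect_limit.zero?

  uri = URI.parse(url)
  redirect = nil
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_get(uri.request_uri) do |res|
      case res
      when Net::HTTPSuccess
        File.open(target, "wb") do |output|
          # The block form of read_body yields the body piece by piece.
          res.read_body do |chunk|
            output.write(chunk)
            on_chunk&.call(chunk.bytesize)
          end
        end
      when Net::HTTPRedirection
        # Location may be relative, so resolve it against the current URL.
        redirect = URI.join(url, res["location"]).to_s
      else
        res.value   # raises an HTTP error for any other non-2xx response
      end
    end
  end

  redirect ? stream_to_file(redirect, target, redirect_limit - 1, &on_chunk) : target
end
```

A call such as stream_to_file(url, url.split(/\//).last) { print "." } would then take the place of the get_response block in the loop above.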

Edwin Fine

12/23/2006 4:36:00 PM

Mariano Kamp wrote:
> Wow. Cool. How did you know about the content_length and progress
> hooks? I don't see them in the docs.
>
> Anyway ... That looks nice, but I still don't see the progress on the
> console, other than for ibm.com. Do you?
>
> I can see that I am downloading at 50KBytes/s using a network traffic
> monitor, but not on the console. And if I read this right it should
> yield a progress update roughly every kilobyte , right?
>
> This is what I see after ... say ... 5 minutes after launching the
> program.
>
> downloading http...
> Writing to ibm.com
> Content
> length: 25348 bytes
> Read 0000000822 bytes Read 0000001158 bytes Read 0000002182 bytes
> Read 0000002518 bytes Read 0000003542 bytes Read 0000003878 bytes
> Read 0000004902 bytes Read 0000005238 bytes Read 0000006262 bytes
> Read 0000006598 bytes Read 0000007622 bytes Read 0000007958 bytes
> Read 0000008982 bytes Read 0000009318 bytes Read 0000010342 bytes
> Read 0000011366 bytes Read 0000012390 bytes Read 0000013398 bytes
> Read 0000014422 bytes Read 0000014758 bytes Read 0000015782 bytes
> Read 0000016118 bytes Read 0000017142 bytes Read 0000017478 bytes
> Read 0000018502 bytes Read 0000018838 bytes Read 0000019862 bytes
> Read 0000020198 bytes Read 0000021222 bytes Read 0000021558 bytes
> Read 0000022582 bytes Read 0000022918 bytes Read 0000023942 bytes
> Read 0000024278 bytes Read 0000025302 bytes Read 0000025348 bytes
> done.
> downloading http://downloads.scribe...
> rails2006/03_martin_fowler_full.m4v
> Writing to 03_martin_fowler_full.m4v
> Content
> length: 413031533 bytes
>
>
> Cheers,
> Mariano

It's documented here:
http://www.ruby-doc.org/stdlib/libdoc/open...

This is what I am seeing:
downloading http...
Writing to ibm.com
Content length: 25348 bytes
Read 0000025348 bytes
done.
downloading http://downloads.scribe...rails2006/03_martin_fowle...
Writing to 03_martin_fowler_full.m4v
Content length: 413031533 bytes
Read 0131826472 bytes

It seems to update around every second, based on informal observation. I
don't know why your output looks different; did you redirect or tee it
to a file? I'm using an old 'C' trick of printing a CR (\r) after each
update, which should keep the output on the same line and just overwrite
what was there before.

I'm running this using Ruby 1.8.5 on Ubuntu Edgy x86_64. Perhaps your OS
is different and has some other behavior.

I tried everything I could think of to disable or bypass buffering,
including $stdout.sync = true, using $stderr, calling $stdout.flush,
using syswrite, and so on, to get the output to appear periodically,
without success. I think the output is buffered at the OS level, or
something like that, so that even calling flush won't always work. The
only thing that works for me is the progress hook.

--
Posted via http://www.ruby-....
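
The carriage-return trick only overwrites the line when the output really is a terminal. If it is redirected or tee'd, every update is simply appended. A small sketch of one way to handle both cases follows. progress_update is a hypothetical helper name, and falling back to one line per update is an assumption about what is wanted in a log file.

```ruby
# Hypothetical helper: writes one progress update. On a terminal the line is
# overwritten in place with "\r"; anywhere else each update gets its own line.
def progress_update(bytes, io = $stdout)
  line = format("Read %010d bytes", bytes)
  if io.tty?
    io.print("\r", line)
  else
    io.puts(line)
  end
  io.flush
end
```

Passed as :progress_proc => lambda { |size| progress_update(size) }, this keeps the same "Read %010d bytes" format as the script earlier in the thread.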

Mariano Kamp

12/23/2006 5:05:00 PM

Edwin Fine wrote:
> Mariano Kamp wrote:
>> Edwin Fine wrote:
>>
>> Wow. Cool. How did you know about the content_length and progress
>> hooks? I don't see them in the docs.
> It's documented here:
> http://www.ruby-doc.org/stdlib/libdoc/open...
Grmpfh. I looked there, but probably too properly.

> downloading http...
[..]
> Read 0131826472 bytes
Thanks for trying that out.

Well, it seems that open had already read all the bytes before it
returned. Changing the implementation the way Robert suggested fixed that.

So it was not really a problem with the buffering, as I suspected,
but with improper use of the API.

Cheers,
Mariano

Ross Bamford

12/23/2006 9:31:00 PM

On Sat, 23 Dec 2006 16:04:33 -0000, Mariano Kamp <mariano.kamp@acm.org>
wrote:

>
> On Dec 23, 2006, at 4:55 PM, Ross Bamford wrote:
>
>>
>> If you have libcurl and are willing to install an extension, the
>> recently released (;)) Curb 0.1 makes this as easy as:
> Thanks for the tip Ross.
>

Sure :)

> I tried gem install curb ;-) but that didn't work. And as the other
> version is already downloading the files and I just wanted this program
> to do this single job I will try out curb the next time ;-)
>

I hear you on the rubygem thing. In preparation for next time, you might
try that gem install again - it should work now ;)

> You've implemented it in C, so you probably can't answer my question how
> you dealt with the buffer size too, can you?

I just left that to the experts - although libcurl does provide some
opportunity for fiddling with its buffers, it generally seems to do
pretty well with its defaults so none of that's exposed in Ruby yet.

Cheers,
--
Ross Bamford - rosco@roscopeco.remove.co.uk

comp.lang.ruby
