comp.lang.ruby

Best way to download >1GB files

fedzor

12/31/2007 10:04:00 PM

What is the best way to download files from the internet (HTTP) that
are greater than 1GB?

Here's the whole story...
I was trying to use Ruby Net::HTTP to manage a download from
Wikipedia... specifically, all current versions of the English one...
But anyway, as I was downloading it, I got a memory error as I ran
out of RAM.

My current code:
open(@opts[:out], "w") do |f|
  http = Net::HTTP.new(@url.host, @url.port)
  c = http.start do |http|
    a = Net::HTTP::Get.new(@url.page)
    http.request(a)
  end
  f.write(c.body)
end

I was hoping there'd be some method that I can attach a block to, so
that for each byte it will call the block.

Is there some way to write the bytes to the file as they come in, not
at the end?
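
Poking at the docs, Net::HTTPResponse#read_body looks like it might do
this. A rough, untested sketch; the URL and output name here are just
placeholders:

require 'net/http'
require 'uri'

url = URI.parse("http://example.com/big.file")  # placeholder URL
Net::HTTP.start(url.host, url.port) do |http|
  req = Net::HTTP::Get.new(url.request_uri)
  # Passing a block to request streams the response; read_body then
  # yields the body in chunks as they arrive instead of buffering it all.
  http.request(req) do |response|
    File.open("big.file", "wb") do |f|
      response.read_body { |chunk| f.write(chunk) }
    end
  end
end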

Thanks,
---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est
man alive




23 Answers

Tim Hunter

12/31/2007 10:15:00 PM


thefed wrote:
> What is the best way to download files from the internet (HTTP) that are
> greater than 1GB?
>
> Here's the whole story...
> I was trying to use Ruby Net::HTTP to manage a download from
> Wikipedia... specifically, all current versions of the English one... But
> anyway, as I was downloading it, I got a memory error as I ran out of RAM.
>
> My current code:
> open(@opts[:out], "w") do |f|
>   http = Net::HTTP.new(@url.host, @url.port)
>   c = http.start do |http|
>     a = Net::HTTP::Get.new(@url.page)
>     http.request(a)
>   end
>   f.write(c.body)
> end
>
> I was hoping there'd be some method that I can attach a block to, so
> that for each byte it will call the block.
>
> Is there some way to write the bytes to the file as they come in, not at
> the end?
>

Not precisely what you asked for, but this is how ara t. howard told me
to download large files, using open-uri. This reads one 8KB chunk
at a time:

open(uri) do |fin|
  open(File.basename(uri), "w") do |fout|
    while (buf = fin.read(8192))
      fout.write buf
    end
  end
end
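
There's nothing magic about 8192, by the way; it's just a read-buffer
size, and it only bounds how much of the file sits in memory at once.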




--
RMagick: http://rmagick.ruby...
RMagick 2: http://rmagick.ruby...rmagick2.html

fedzor

12/31/2007 11:57:00 PM



On Dec 31, 2007, at 5:15 PM, Tim Hunter wrote:

> Not precisely what you asked for, but this is how ara t. howard
> told me to download large files, using open-uri. This reads one
> 8KB chunk at a time:
>
> open(uri) do |fin|
>   open(File.basename(uri), "w") do |fout|
>     while (buf = fin.read(8192))
>       fout.write buf
>     end
>   end
> end

But doesn't open-uri download the whole thing to your compy? I was
about to use it, but then I ran it in irb and saw it returned a file
object.

-------------------------------------------------------|
~ Ari
seydar: it's like a crazy love triangle of Kernel commands and C code



Tim Hunter

1/1/2008 12:20:00 AM


thefed wrote:
> But doesn't open-uri download the whole thing to your compy? I was about
> to use it, but then I ran it in irb and saw it returned a file object.

Isn't that what you want to happen? I thought your question was about
how to download it in small chunks so it's not all in memory at the same
time. This code downloads the whole file, but 8KB at a time.

--
RMagick: http://rmagick.ruby...
RMagick 2: http://rmagick.ruby...rmagick2.html

Bryan Duxbury

1/1/2008 12:23:00 AM


Is there some reason to not use wget or curl? Those are both written
already. What are you hoping to do with the files you download?

-Bryan

On Dec 31, 2007, at 2:04 PM, thefed wrote:

> What is the best way to download files from the internet (HTTP)
> that are greater than 1GB?
>
> Here's the whole story...
> I was trying to use Ruby Net::HTTP to manage a download from
> Wikipedia... specifically, all current versions of the English
> one... But anyway, as I was downloading it, I got a memory error
> as I ran out of RAM.
>
> My current code:
> open(@opts[:out], "w") do |f|
>   http = Net::HTTP.new(@url.host, @url.port)
>   c = http.start do |http|
>     a = Net::HTTP::Get.new(@url.page)
>     http.request(a)
>   end
>   f.write(c.body)
> end
>
> I was hoping there'd be some method that I can attach a block to,
> so that for each byte it will call the block.
>
> Is there some way to write the bytes to the file as they come in,
> not at the end?


fedzor

1/1/2008 6:18:00 PM



On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:

> thefed wrote:
>> But doesn't open-uri download the whole thing to your compy? I was
>> about to use it, but then I ran it in irb and saw it returned a
>> file object.
>
> Isn't that what you want to happen? I thought your question was
> about how to download it in small chunks so it's not all in memory
> at the same time. This code downloads the whole file, but 8KB at a
> time.

No, I thought when you use Kernel#open with open-uri, it FIRST
downloads the entire 1GB file to your temp folder, and THEN runs your
block on that file in temp.
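
A quick way to check from irb (placeholder URL; anything over a few
hundred KB should do):

require 'open-uri'

open("http://example.com/some-large.file") do |f|  # placeholder URL
  puts f.class  # Tempfile here would mean the download hit disk before the block ran
end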

fedzor

1/1/2008 6:19:00 PM



On Dec 31, 2007, at 7:23 PM, Bryan Duxbury wrote:

> Is there some reason to not use wget or curl? Those are both
> written already. What are you hoping to do with the files you
> download?

I'm trying to write wget/axel in Ruby. Plus add torrent support!

Michal Suchanek

1/1/2008 6:39:00 PM


On 01/01/2008, thefed <fedzor@gmail.com> wrote:
>
> On Dec 31, 2007, at 7:23 PM, Bryan Duxbury wrote:
>
> > Is there some reason to not use wget or curl? Those are both
> > written already. What are you hoping to do with the files you
> > download?
>
> I'm trying to write wget/axel in Ruby. Plus add torrent support!
>

Is there some particular reason not to use Aria2? It's already written ;-)

Yes, the UI sucks, and it cannot download multi-file torrents from the
web, but to compete with it you would have to make something
really good :)

Thanks

Michal

Tim Hunter

1/1/2008 6:56:00 PM


thefed wrote:
>
> On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
>
>> thefed wrote:
>>> But doesn't open-uri download the whole thing to your compy? I was
>>> about to use it, but then I ran it in irb and saw it returned a file
>>> object.
>>
>> Isn't that what you want to happen? I thought your question was about
>> how to download it in small chunks so it's not all in memory at the
>> same time. This code downloads the whole file, but 8kb at a time.
>
> No, I thought when you use Kernel#open with open-uri, it FIRST downloads
> the entire 1GB file to your temp folder, and THEN runs your block on
> that file in temp
>

Interesting. I just tried downloading a 6.1MB file with open-uri and
didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.


--
RMagick: http://rmagick.ruby...
RMagick 2: http://rmagick.ruby...rmagick2.html

fedzor

1/1/2008 7:02:00 PM



On Jan 1, 2008, at 1:38 PM, Michal Suchanek wrote:

> Is there some particular reason not to use Aria2? It's already
> written ;-)
>
> Yes, the UI sucks, and it cannot download multi-file torrents from the
> web, but to compete with it you would have to make something
> really good :)

Well then I have a competitor!

I'm really writing this just for practice, but also because I think
the world needs a Ruby downloader.

Maybe to give myself a fighting chance against aria2, I'll lower the
version numbers instead of raising them.

- Ari

fedzor

1/1/2008 7:10:00 PM



On Jan 1, 2008, at 1:56 PM, Tim Hunter wrote:

> thefed wrote:
>> On Dec 31, 2007, at 7:20 PM, Tim Hunter wrote:
>>> thefed wrote:
>>>> But doesn't open-uri download the whole thing to your compy? I
>>>> was about to use it, but then I ran it in irb and saw it
>>>> returned a file object.
>>>
>>> Isn't that what you want to happen? I thought your question was
>>> about how to download it in small chunks so it's not all in
>>> memory at the same time. This code downloads the whole file, but
>>> 8KB at a time.
>> No, I thought when you use Kernel#open with open-uri, it FIRST
>> downloads the entire 1GB file to your temp folder, and THEN runs
>> your block on that file in temp.
>
> Interesting. I just tried downloading a 6.1MB file with open-uri
> and didn't see that behavior. I'm using Ruby 1.8.6 on OS X 10.5.

That's good then! I'll test it out myself juuuust to make sure. I
don't want to waste 4GB of space when I only need 2GB.

open-uri uses Net::HTTP, of course. Am I correct?

Net::HTTP wraps connections in a Timeout, which is REALLY screwing
with me when downloading large files.

Will probably get some monkeys to patch that for me.
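
Or maybe no monkeys needed: if it's the per-read timeout that's biting,
it looks like it can just be raised. Untested sketch:

require 'net/http'

http = Net::HTTP.new(@url.host, @url.port)
http.open_timeout = 30   # seconds allowed for the TCP connect
http.read_timeout = 600  # seconds allowed per read, not for the whole download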

- Ari