comp.lang.ruby

Net::HTTP performance downloading large files

Chad Burt

11/2/2006 12:44:00 AM

Hi folks,
I'm working on my first ruby/rails project and have run into my first
major problem.
My application uses a web API for a scientific database, and I am trying
to download a very large file (150 MB) using Net::HTTP.post_form. This
operation is turning out to be very slow.

Net::HTTP is at least 10x slower than using my browser to download this
file on my local network. It also pegs one of my processors while doing
it.

It seems to have something to do with a "green thread" issue explained
here :
http://headius.blogspot.com/2006_06_01_archive.html#11499604...

Are there any alternatives to using Net::HTTP to download files off the
web with Ruby?


---------- Code in question -----------
require 'net/http'
require 'uri'
require 'rexml/document'

def Metacat.read(docid)
  # load the URI set in environment.rb
  uri = URI.parse(Path_to_metacat)
  # uri.query = "action=read&qformat=xml&docid=#{docid}"
  response = Net::HTTP.post_form(uri, {
    'action'  => 'read',
    'qformat' => 'xml',
    'docid'   => docid
  })
  # this line will raise an exception if the POST failed
  response.value
  if response.content_type == "text/xml"
    doc = REXML::Document.new(response.body)
    # check to see if Metacat is sending an error message or EML
    if doc.root.name == 'error'
      nil
    else
      Eml.new(response.body)
    end
  elsif response.content_type == "text/plain"
    DataTable.new(docid, response.body)
  end
end
--------------------------------------

File I'm trying to download :
http://data.piscoweb.org/catalog/metacat?action=read&qformat=xml&docid=HMS001_020ADCP019R00_200...

--
Posted via http://www.ruby-....

8 Answers

Craig Beck

11/2/2006 5:21:00 AM

How about just calling out to curl?
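
A minimal sketch of what calling out to curl could look like from Ruby. The helper names `curl_command` and `download_with_curl`, the example URL, and the output path are assumptions for illustration; only the curl options shown (`--silent`, `--show-error`, `--fail`, `--data`, `--output`) are standard curl flags. Passing the arguments as an array avoids shell quoting problems.

```ruby
require 'uri'

# Build the argument list for a curl POST that writes the response to a file.
# (Hypothetical helper, not part of any library.)
def curl_command(url, form, output_path)
  ['curl', '--silent', '--show-error', '--fail',
   '--data', URI.encode_www_form(form), # form-encoded POST body
   '--output', output_path,
   url]
end

# Run curl and raise if it cannot be started or exits with a non-zero status.
def download_with_curl(url, form, output_path)
  system(*curl_command(url, form, output_path)) or
    raise "curl failed with status #{$?.exitstatus}"
  output_path
end

# Example argument list for a placeholder URL and docid:
cmd = curl_command('http://example.org/metacat',
                   { 'action' => 'read', 'qformat' => 'xml', 'docid' => 'abc.1' },
                   '/tmp/abc.1.xml')
puts cmd.join(' ')
```

The downloaded file can then be read or parsed from disk instead of being held in memory as one string.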

--
Craig Beck

AIM: kreiggers



Louis J Scoras

11/2/2006 1:48:00 PM

Chad;

This might be a stupid question, but it's always worth asking just in
case =). Are you sure that the performance problem in the code above
is in fetching the document? A 150 MB XML file can take a long time to
parse into a REXML document.

Actually, in a quick read of this:

>   if(response.content_type == "text/xml")
>     doc = REXML::Document.new(response.body)
>     #check to see if Metacat is sending an error message or EML
>     if(doc.root.name == 'error')
>       nil
>     else
>       Eml.new(response.body)
>     end
>   elsif(response.content_type == "text/plain")
>     DataTable.new(docid, response.body)
>   end
> end

It doesn't look like you're using 'doc' to do anything except check the
root node. Meanwhile, REXML has to parse the entire document because it
is a tree parser. You might want to give one of the streaming parsers a shot.
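
As a rough sketch of that idea, REXML's own stream parser can report the root element name and stop as soon as the first start tag is seen. `RootNameListener` and `root_element_name` are hypothetical names made up for this example; `REXML::Document.parse_stream` and `REXML::StreamListener` are part of REXML. The `throw`/`catch` pair is just one way to abandon parsing early.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Listener that bails out of parsing at the first start tag, which is the root.
class RootNameListener
  include REXML::StreamListener

  def tag_start(name, _attrs)
    throw :root_found, name
  end
end

# Return the name of the root element without building a document tree.
# Returns nil if no element is found.
def root_element_name(xml)
  catch(:root_found) do
    REXML::Document.parse_stream(xml, RootNameListener.new)
    nil
  end
end

puts root_element_name('<error>document not found</error>')
```

With a check like this, the "is it an error document?" test no longer depends on parsing all 150 MB into a tree.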


--
Lou.

Aaron Patterson

11/2/2006 11:16:00 PM

On Thu, Nov 02, 2006 at 09:43:30AM +0900, Chad Burt wrote:
> Hi folks,
> I'm working on my first ruby/rails project and have run into my first
> major problem.
> My application uses a web API for a scientific database, and I am trying
> to download a very large file (150 MB) using Net::HTTP.post_form. This
> operation is turning out to be very slow.
>
> Net::HTTP is at least 10x slower than using my browser to download this
> file on my local network. It also pegs one of my processors while doing
> it.
>
> It seems to have something to do with a "green thread" issue explained
> here :
> http://headius.blogspot.com/2006_06_01_archive.html#11499604...

It probably has more to do with the buffer size used in Net::HTTP.
Check out this thread:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...

Hope that helps!
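
Separately from the read buffer size, `response.body` in the code in question keeps the whole 150 MB in memory before anything is parsed. A minimal sketch of an alternative, assuming the file can be written to disk first: Net::HTTP can yield the body in chunks through `read_body` when the request is given a block. `copy_body` and `download_to_file` are hypothetical names for this example, not part of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Copy a response body to +io+ chunk by chunk and return the byte count.
# Works with any object whose read_body yields strings.
def copy_body(response, io)
  bytes = 0
  response.read_body do |chunk|
    io.write(chunk)
    bytes += chunk.bytesize
  end
  bytes
end

# POST the form and stream the reply straight into a file.
# (Hypothetical helper; uri is a URI object, form a Hash of parameters.)
def download_to_file(uri, form, path)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(form)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request) do |response|
      response.value # raises unless the status is 2xx
      File.open(path, 'wb') { |file| copy_body(response, file) }
    end
  end
end
```

The body must be consumed inside the block passed to `http.request`; once the block returns, the connection has moved on.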

--
Aaron Patterson
http://tenderlovem...

Chad Burt

11/14/2006 10:34:00 PM


> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...

Thanks. Found this file:


and changed

def rbuf_fill
  timeout(@read_timeout) {
    #changed by cburt
    @rbuf << @io.sysread(1024)
  }
end

to

def rbuf_fill
  timeout(@read_timeout) {
    #changed by cburt
    @rbuf << @io.sysread(16384)
  }
end

Now downloading a 150 MB file takes 25 seconds, compared to 21 seconds for
straight curl and 40-some seconds for curl using popen.
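
One way to see why the larger buffer helps is to count read calls, since every refill also pays the fixed cost of the surrounding `timeout` block. The sketch below is only an illustration, not the Net::HTTP code: `count_reads` is a hypothetical helper, and a StringIO stands in for the socket. A real socket may return fewer bytes per `sysread` than requested, so these counts are a lower bound.

```ruby
require 'stringio'

# Count how many sysread calls it takes to drain +io+ at a given chunk size.
def count_reads(io, chunk_size)
  reads = 0
  loop do
    io.sysread(chunk_size) # raises EOFError once the data is exhausted
    reads += 1
  end
rescue EOFError
  reads
end

payload = 'x' * (1024 * 1024) # 1 MiB of dummy data
small = count_reads(StringIO.new(payload), 1024)
large = count_reads(StringIO.new(payload), 16_384)
puts "1024-byte reads:  #{small}"
puts "16384-byte reads: #{large}"
```

Scaled to a 150 MB body, that is roughly 153,600 refills with the 1024-byte buffer against 9,600 with the 16384-byte one.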

The problem now is that I have a web API client that I was going to
package into a ruby-gem that would have been easy to install. Now I have
to tell people to start hacking the standard library if they want to use
it. Uhhg!


Chad Burt

11/14/2006 10:36:00 PM

Sorry, I meant to say I modified this file:
/usr/local/lib/ruby/1.8/net/protocol.rb


Keith Fahlgren

11/14/2006 10:48:00 PM

On 11/14/06, Chad Burt <chad@underbluewaters.net> wrote:
> Thanks. Found this file: /usr/local/lib/ruby/1.8/net/protocol.rb
> and changed
> ...
> The problem now is that I have a web API client that I was going to
> package into a ruby-gem that would have been easy to install. Now I have
> to tell people to start hacking the standard library if they want to use
> it. Uhhg!

With Ruby's open classes, you shouldn't have to. At the top of your
file/library/program, just open the class you'd like to modify, in
this case the module Net and the class BufferedIO, and do what you
want.

Something like:
module Net
  class BufferedIO
    def rbuf_fill
      timeout(@read_timeout) {
        #changed by cburt to a much larger buffer for speed
        @rbuf << @io.sysread(16384)
      }
    end
  end
end

Note: Modifying the standard library is usually considered bad form,
but if you know what you're doing and are explicit about it, it's
usually OK.


HTH,
Keith

Chad Burt

11/14/2006 11:24:00 PM

That worked great Keith, thanks.


Comfort Eagle

11/24/2006 7:05:00 PM

Works great for me too!
