Asp Forum - dRuby file transfer performance issue

Eivind

2/15/2007 3:22:00 PM

Hi,

I'm a Ruby newbie from Norway (say that many times fast :)

Currently I'm trying to send files from one application to another
using distributed Ruby (dRuby).

The files are sent, but it takes "forever".
I tried to send a Word document (about 600 kB), and it took more than
two minutes with both applications running locally on the same machine.

Do I have to do something special if I'm working with files other than
ordinary text?

This is the code I'm using:
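###

# Server side: yield the file to the caller in 4096-byte chunks.
def fetch(fname)
  # 'rb' = binary mode, so non-text files (such as Word documents) are read unaltered
  File.open(fname, 'rb') do |fp|
    while buf = fp.read(4096)
      yield(buf)
    end
  end
  return nil
end

# Client side: copy fname from the remote object `there`, reporting
# [bytes_written, total_size] to an optional progress block.
def store_from(fname, there)
  puts
  size = there.size(fname)
  wrote = 0

  File.rename(fname, fname + '.bak') if File.exist?(fname)
  File.open(fname, 'wb') do |fp|
    yield([wrote, size]) if block_given?
    there.fetch(fname) do |buf|
      wrote += fp.write(buf)
      yield([wrote, size]) if block_given?
      nil
    end
    # no explicit fp.close needed: File.open closes the file when the block ends
  end

  return wrote
end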

2 Answers

Eleanor McHugh

2/16/2007 11:08:00 AM

Your slowdown is an artefact of breaking the file read and transmit
operations down into chunks of 4096 bytes. This will cause your 600 kB
Word document to be sent as about 150 discrete messages across the
network, each one incurring the cost of a disk seek and probably the
cost of network congestion. The fact that you're running both pieces of
code on the same machine will also add about 150 additional disk seeks
into the equation for the write process. These all incur
non-deterministic costs based on the actual layout of the file system,
task switching by the OS between disk operations, the particular OS's
disk caching mechanisms, etc.
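
To make the message count concrete, here is a minimal sketch of the arithmetic. It assumes "about 600 kB" means 600 * 1024 bytes; chunk_count is a hypothetical helper, not part of dRuby.

```ruby
# Count how many fixed-size reads (and hence messages) a transfer
# of `size` bytes needs for a given chunk size.
def chunk_count(size, chunk_size)
  (size + chunk_size - 1) / chunk_size   # integer ceiling division
end

file_size = 600 * 1024   # assumption: "about 600 kB" taken as 614,400 bytes

[1536, 4096, 65_536, file_size].each do |chunk_size|
  puts "#{chunk_size} bytes per chunk: #{chunk_count(file_size, chunk_size)} messages"
end
```

With 4096-byte chunks this gives the 150 messages mentioned above; reading the file in one piece brings it down to a single message.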

If you read the entire file into memory in one chunk, that will reduce
the cost at one end; buffering the whole thing in memory at the other
end until the transfer is complete will reduce the cost there too. As
you are probably transmitting over TCP, I also wouldn't bother to break
the file up into discrete chunks, as the underlying transport will take
care of that for you (and 4096 is very rarely an optimal block size:
for Ethernet traffic try somewhere around 1536, and for disk access
it'll depend on the settings for the file system and the physical
geometry of the disk).

As a general rule of thumb, always seek to minimise the number of I/O
operations that your code is performing if you want to avoid these
kinds of problems. I/O is orders of magnitude slower than anything else.

Ellie

Eleanor McHugh
Games With Brains
----
raise ArgumentError unless @reality.responds_to? :reason

Ezra Zygmuntowicz

2/16/2007 11:02:00 PM

Hi~

Sending a file across DRb like that also incurs the cost of
marshalling and unmarshalling the file. I think you would be
better off having one of the DRb processes use net/sftp to transfer
the file to the other node and then send a DRb message with the file
path.
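
To illustrate the difference in what has to be marshalled, here is a rough sketch comparing the two payloads. DRb serialises arguments and return values with Marshal; marshalled_size is a hypothetical helper, the random bytes stand in for the Word document, and the path is made up for the example.

```ruby
# Size in bytes of an object once serialised with Marshal,
# the format DRb uses for arguments and return values.
def marshalled_size(obj)
  Marshal.dump(obj).bytesize
end

contents = Random.new(1).bytes(600 * 1024)   # stand-in for the file's bytes
path     = '/tmp/report.doc'                 # hypothetical path, sent instead of the bytes

puts "file contents as a Marshal payload: #{marshalled_size(contents)} bytes"
puts "file path as a Marshal payload:     #{marshalled_size(path)} bytes"
```

Sending only the path keeps the DRb message tiny; the bulk data then travels over a channel built for file transfer.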

Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez@engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)

comp.lang.ruby
