Ezra Zygmuntowicz
2/16/2007 11:02:00 PM
Hi~
On Feb 16, 2007, at 3:08 AM, Eleanor McHugh wrote:
> On 15 Feb 2007, at 15:25, Eivind wrote:
>> def fetch(fname)
>>   File.open(fname, 'r') do |fp|
>>     while buf = fp.read(4096)
>>       yield(buf)
>>     end
>>   end
>>   return nil
>> end
>>
>>
>> def store_from(fname, there)
>>   puts
>>   size = there.size(fname)
>>   wrote = 0
>>
>>   File.rename(fname, fname + '.bak') if File.exist? fname
>>   File.open(fname, 'w') do |fp|
>>     yield([wrote, size]) if block_given?
>>     there.fetch(fname) do |buf|
>>       wrote += fp.write(buf)
>>       yield([wrote, size]) if block_given?
>>       nil
>>     end
>>   end
>>
>>   return wrote
>> end
>
> Your slowdown is an artefact of breaking the file read and transmit
> operations down into chunks of 4096 bytes. This will cause your
> 600 KB Word document to be sent as 150 discrete messages across the
> network, each time incurring the cost of a disk seek and probably
> the cost of network congestion. The fact that you're running both
> pieces of code on the same machine will also add 150 additional
> disk seeks into the equation for the write process. These all incur
> non-deterministic costs based on the actual layout of the file
> system, task switching by the OS between disk operations,
> the particular OS's disk caching mechanisms, etc.
>
> If you read the entire file into memory in one chunk that will
> reduce the cost at one end, then by buffering the whole thing in
> memory at the other end until the transfer is complete you'll
> reduce the other cost. As you are probably transmitting over TCP I
> also wouldn't bother to break the file up into discrete chunks as
> the underlying transport will take care of that for you (and 4096
> is very rarely an optimal block size: for ethernet traffic try
> somewhere around 1536, and for disk access it'll depend on the
> settings for the file-system and the physical geometry of the disk).
>
> As a general rule of thumb, always seek to minimise the number of
> I/O operations that your code is performing if you want to avoid
> these kinds of problems. I/O is orders of magnitude slower than
> anything else.
>
> Ellie
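To make the chunked versus whole-file comparison above concrete, here is a minimal local sketch. It only copies a temporary file on one machine, so it says nothing about DRb or the network, and the helper names chunked_copy and whole_copy are made up for the illustration. The 600 KB size matches the document mentioned in the thread.

```ruby
require 'tempfile'

# Hypothetical helper: copy src to dst in chunk_size pieces and
# return how many read calls that took.
def chunked_copy(src, dst, chunk_size)
  reads = 0
  File.open(src, 'rb') do |input|
    File.open(dst, 'wb') do |output|
      while (buf = input.read(chunk_size))
        output.write(buf)
        reads += 1
      end
    end
  end
  reads
end

# Hypothetical helper: read the whole file once, write it once,
# and return the number of bytes written.
def whole_copy(src, dst)
  File.binwrite(dst, File.binread(src))
end

# Stand-in for the 600 KB Word document.
src = Tempfile.new('src')
src.binmode
src.write('x' * (600 * 1024))
src.close

dst = Tempfile.new('dst')
dst.close

reads   = chunked_copy(src.path, dst.path, 4096)
written = whole_copy(src.path, dst.path)
puts "chunked: #{reads} reads; whole file: 1 read, #{written} bytes written"
```

The whole-file version holds the entire document in memory, which is fine for a file of this size but is a trade-off to keep in mind for much larger ones.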
Sending a file across DRb like that also incurs the cost of
marshalling and unmarshalling every chunk. I think you would be
better off having one of the DRb processes use net/sftp to transfer
the file to the other node and then send a DRb message containing
only the file path.
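As a rough illustration of the marshalling cost, the sketch below pushes a 600 KB payload through Marshal.dump and Marshal.load in 4096-byte pieces, which is approximately what happens to each chunk passed as an argument over DRb. It does not start a DRb server, and the method name marshal_cost is made up for the example.

```ruby
# Hypothetical helper: split data into chunk_size pieces, round-trip each
# piece through Marshal, and return [number of chunks, marshalled bytes].
def marshal_cost(data, chunk_size)
  chunks = 0
  bytes  = 0
  offset = 0
  while offset < data.bytesize
    chunk  = data.byteslice(offset, chunk_size)
    dumped = Marshal.dump(chunk)   # serialise, as the sender must
    raise 'round trip failed' unless Marshal.load(dumped) == chunk
    chunks += 1
    bytes  += dumped.bytesize
    offset += chunk_size
  end
  [chunks, bytes]
end

data = 'x' * (600 * 1024)   # stand-in for the 600 KB document
chunks, bytes = marshal_cost(data, 4096)
puts "#{chunks} chunks, #{bytes - data.bytesize} bytes of Marshal overhead"
```

Sending only the file path as the DRb message means a short string gets marshalled instead of the file contents.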
Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez@engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)