Robert Klemme
3/19/2006 12:47:00 PM
Bill Kelly <billk@cts.com> wrote:
> From: "rtilley" <rtilley@vt.edu>
>>
>> I'm calculating md5 checksums on very large files (2 GB). This is a
>> safe way to do so, right? Also... is the file closed when the block
>> exits? I'm using 'rb' as this is used on Windows and Linux computers.
>>
>> md5 = Digest::MD5.new()
>> File.open(file, 'rb').each {|line| md5.update(line)}
>
> Hi - does the file really contain text lines? Or is it a file
> full of binary data. If it's a binary file, there may be no
> guarantee the whole thing isn't one very long "line". In that
> case I'd recommend reading it in chunks.
>
> Untested:
>
>   md5 = Digest::MD5.new()
>   File.open(file, 'rb') do |io|
>     while (buf = io.read(4096)) && buf.length > 0
>       md5.update(buf)
>     end
>   end

io.read(4096) returns nil at EOF, so your additional test for a positive
length is redundant. Also, for the sake of error handling I'd create the
digest inside the block, so that no digest is ever created if the file
cannot be opened:

  require 'digest/md5'   # needed for Digest::MD5

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    while (buf = io.read(4096))
      dig.update(buf)
    end
    dig   # the block's value becomes the value of File.open
  end

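To make the two points above concrete (the block's value is what File.open returns, and a nil from io.read ends the loop), here is a minimal sketch wrapped in a method. The name md5_hex, the chunk_size parameter and the temporary sample file are made up for illustration and are not part of the code above.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: hex MD5 of the file at +path+, read in chunks.
def md5_hex(path, chunk_size = 4096)
  File.open(path, 'rb') do |io|
    dig = Digest::MD5.new
    # io.read(chunk_size) returns nil at EOF, which ends the loop
    while (buf = io.read(chunk_size))
      dig.update(buf)
    end
    dig.hexdigest   # value of the block, hence value of File.open
  end               # the file is closed here, even if an exception is raised
end

# Throwaway sample file, only for demonstration
SAMPLE = Tempfile.new('md5-sample')
SAMPLE.binmode
SAMPLE.write('a' * 10_000)
SAMPLE.close        # flushes to disk; the file itself stays until unlinked

puts md5_hex(SAMPLE.path)
```

The result should match Digest::MD5.hexdigest applied to the same bytes in memory, which is an easy sanity check for small inputs.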
If you want to increase efficiency, you can pass a buffer to io.read. The
buffer is then refilled on each call instead of a new string being created
for every chunk:

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    buf = String.new   # mutable buffer; a "" literal may be frozen
    while io.read(4096, buf)
      dig.update(buf)
    end
    dig
  end

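The effect of the buffer argument can be observed directly. The following is only a sketch: read_chunks and the temporary file are hypothetical names for this demonstration. It collects what io.read returns with and without a buffer and counts the distinct String objects.

```ruby
require 'tempfile'

# Hypothetical helper: collects the objects returned by io.read,
# either reusing one buffer or letting io.read allocate each time.
def read_chunks(path, reuse_buffer)
  chunks = []
  File.open(path, 'rb') do |io|
    buf = String.new
    while (chunk = reuse_buffer ? io.read(4096, buf) : io.read(4096))
      chunks << chunk
    end
  end
  chunks
end

# 10_000 bytes are read as three chunks of at most 4096 bytes
BUF_DEMO = Tempfile.new('buf-demo')
BUF_DEMO.binmode
BUF_DEMO.write('x' * 10_000)
BUF_DEMO.close

FRESH  = read_chunks(BUF_DEMO.path, false)
REUSED = read_chunks(BUF_DEMO.path, true)

puts FRESH.map(&:object_id).uniq.size    # one String per chunk
puts REUSED.map(&:object_id).uniq.size   # the same String every time
```

Because the buffer is overwritten by every call, each chunk has to be consumed (here: passed to dig.update) before the next read. Keeping references to it, as this demonstration does, only keeps references to one and the same string.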
Here's another nice variant:

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    buf = String.new   # mutable buffer; a "" literal may be frozen
    dig.update(buf) while io.read(4096, buf)
    dig
  end

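A cheap way to check any of these loops is to compare the result with the digest library's own file reader. This sketch assumes a Ruby version whose digest library provides Digest::MD5.file. The method name chunked_md5 and the sample file are made up for the comparison.

```ruby
require 'digest/md5'
require 'tempfile'

# The compact variant from above, wrapped in a hypothetical method.
def chunked_md5(path)
  File.open(path, 'rb') do |io|
    dig = Digest::MD5.new
    buf = String.new
    dig.update(buf) while io.read(4096, buf)
    dig
  end
end

# Binary sample data, so that "lines" play no role
CHECK_FILE = Tempfile.new('md5-check')
CHECK_FILE.binmode
CHECK_FILE.write("\x00\xFF".b * 5_000)
CHECK_FILE.close

LOOP_HEX = chunked_md5(CHECK_FILE.path).hexdigest
FILE_HEX = Digest::MD5.file(CHECK_FILE.path).hexdigest   # assumes Digest::MD5.file is available

puts LOOP_HEX == FILE_HEX
```

If Digest::MD5.file is available, it is also the shortest way to get the checksum. The explicit loop remains useful when the chunk size needs to be controlled or when the data is needed for something else as well.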
Kind regards
robert