
comp.lang.ruby

safe way to calc md5 on very large files

Brad Tilley

3/18/2006 11:46:00 PM

I'm calculating MD5 checksums on very large files (2 GB). Is this a safe
way to do so? Also, is the file closed when the block exits?
I'm using 'rb' because this runs on both Windows and Linux computers.

md5 = Digest::MD5.new()
File.open(file, 'rb').each {|line| md5.update(line)}

Ara.T.Howard

3/19/2006 4:50:00 AM


Andrew Johnson

3/19/2006 5:07:00 AM


On Sun, 19 Mar 2006 13:49:51 +0900, ara.t.howard@noaa.gov
<ara.t.howard@noaa.gov> wrote:
> On Sun, 19 Mar 2006, Stephen Waits wrote:
>>
>> Close.. try this..
>>
>> require 'md5'
>> File.open(filename,'rb') { |f| MD5.hexdigest(f.read) }
>>
>> And yes, the file is closed with the block form of open.
>>
>> --Steve
>
> i think the OP has the right approach - note that an 'f.read' will consume
> 2GB. but the OP's code
>
> harp:~ > cat a.rb
> require 'digest/md5'
> md5 = Digest::MD5.new() and open(ARGV.shift, 'rb').each{|line| md5 << line}
> p md5.hexdigest
>
> will not.


In my reading of the OP, both the block-open and iteration are actually
desired:

require 'digest/md5'

md5 = Digest::MD5.new
File.open(file,'rb') do |ios|
  ios.each {|line| md5 << line }
end
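A quick way to check the OP's second question directly: keep a reference to the IO object and ask it `closed?` afterwards. This is a sketch only; the Tempfile setup and the variable names are illustrative assumptions, not from the thread.

```ruby
require 'digest/md5'
require 'tempfile'

# Throwaway input file (illustration only).
tmp = Tempfile.new('closedemo')
tmp.binmode
tmp.write("line one\nline two\n")
tmp.close

# Block form: File.open closes the file when the block exits,
# even if the block raises.
block_io = nil
block_md5 = Digest::MD5.new
File.open(tmp.path, 'rb') do |ios|
  block_io = ios
  ios.each { |line| block_md5 << line }
end
puts "block form closed?     #{block_io.closed?}"   # true

# Non-block form, as in the original question: the File object stays
# open until it is closed explicitly (or until it is garbage-collected).
open_io = File.open(tmp.path, 'rb')
open_md5 = Digest::MD5.new
open_io.each { |line| open_md5 << line }
was_closed = open_io.closed?
puts "non-block form closed? #{was_closed}"          # false
open_io.close

tmp.unlink
```

Both styles produce the same checksum; the difference is only whether you have to remember to close the file yourself.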

cheers,
andrew

--
Andrew L. Johnson http://www.s...
What have you done to the cat? It looks half-dead.
-- Schroedinger's wife

Bill Kelly

3/19/2006 5:46:00 AM


From: "rtilley" <rtilley@vt.edu>
>
> I'm calculating md5 checksums on very large files (2 GB). This is a safe
> way to do so, right? Also... is the file closed when the block exits?
> I'm using 'rb' as this is used on Windows and Linux computers.
>
> md5 = Digest::MD5.new()
> File.open(file, 'rb').each {|line| md5.update(line)}

Hi, does the file really contain text lines, or is it a file
full of binary data? If it's a binary file, there's no
guarantee the whole thing isn't one very long "line". In that
case I'd recommend reading it in chunks.

Untested:

require 'digest/md5'

md5 = Digest::MD5.new()
File.open(file, 'rb') do |io|
  while (buf = io.read(4096)) && buf.length > 0
    md5.update(buf)
  end
end
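Chunking doesn't change the result: MD5 is computed over the concatenation of everything passed to `update`, so where the chunk boundaries fall is irrelevant. A minimal check, assuming a modern Ruby (`Random#bytes` and the temp file only exist to produce test data; the loop omits the length test because `read` returns nil at EOF):

```ruby
require 'digest/md5'
require 'tempfile'

CHUNK_SIZE = 4096  # same chunk size as the loop above; any positive size works

# Write ~1 MB of pseudo-random binary data to a temp file (illustration only).
data = Random.new(42).bytes(1_000_000)
tmp = Tempfile.new('chunkdemo')
tmp.binmode
tmp.write(data)
tmp.close

# Incremental: feed the digest one chunk at a time, so memory use stays
# bounded by CHUNK_SIZE regardless of the file size.
chunked = Digest::MD5.new
File.open(tmp.path, 'rb') do |io|
  while (buf = io.read(CHUNK_SIZE))
    chunked.update(buf)
  end
end

# One-shot: hash the whole string at once. Fine for 1 MB, but this is
# what would pull a 2 GB file entirely into memory.
whole = Digest::MD5.hexdigest(data)

puts chunked.hexdigest == whole   # true
tmp.unlink
```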


Regards,

Bill




Robert Klemme

3/19/2006 12:37:00 PM


Andrew Johnson <ajohnson@cpan.org> wrote:
> On Sun, 19 Mar 2006 13:49:51 +0900, ara.t.howard@noaa.gov
> <ara.t.howard@noaa.gov> wrote:
>> On Sun, 19 Mar 2006, Stephen Waits wrote:
>>>
>>> Close.. try this..
>>>
>>> require 'md5'
>>> File.open(filename,'rb') { |f| MD5.hexdigest(f.read) }
>>>
>>> And yes, the file is closed with the block form of open.
>>>
>>> --Steve
>>
>> i think the OP has the right approach - note that an 'f.read' will
>> consume 2GB. but the OP's code
>>
>> harp:~ > cat a.rb
>> require 'digest/md5'
>> md5 = Digest::MD5.new() and open(ARGV.shift, 'rb').each{|line| md5 << line}
>> p md5.hexdigest
>>
>> will not.
>
>
> In my reading of the OP, both the block-open and iteration are
> actually desired:
>
> md5 = Digest::MD5.new
> File.open(file,'rb') do |ios|
> ios.each {|line| md5 << line }
> end

IMHO it's a bad idea to use line-oriented reading on a binary file, because
"lines" can be arbitrarily long (in the worst case, the whole file is one
"line"). Reading fixed-size chunks with IO#read is much better.
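To see the worst case concretely, here is a sketch that uses a StringIO in place of a file, with made-up binary content containing no newline byte at all:

```ruby
require 'stringio'

# Worst case: 1,000,000 bytes of binary data with no "\n" (0x0A) byte.
blob = [0x00, 0x01, 0xFE, 0xFF].pack('C*') * 250_000

# Line-oriented reading yields the entire blob as a single "line".
lines = StringIO.new(blob).each_line.to_a
puts lines.size              # 1
puts lines.first.bytesize    # 1000000

# Fixed-size reads cap every piece at 4096 bytes.
io = StringIO.new(blob)
sizes = []
while (buf = io.read(4096))
  sizes << buf.bytesize
end
puts sizes.max               # 4096
puts sizes.size              # 245 (244 full chunks plus a 576-byte tail)
```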

Kind regards

robert

Robert Klemme

3/19/2006 12:47:00 PM


Bill Kelly <billk@cts.com> wrote:
> From: "rtilley" <rtilley@vt.edu>
>>
>> I'm calculating md5 checksums on very large files (2 GB). This is a
>> safe way to do so, right? Also... is the file closed when the block
>> exits? I'm using 'rb' as this is used on Windows and Linux computers.
>>
>> md5 = Digest::MD5.new()
>> File.open(file, 'rb').each {|line| md5.update(line)}
>
> Hi - does the file really contain text lines? Or is it a file
> full of binary data. If it's a binary file, there may be no
> guarantee the whole thing isn't one very long "line". In that
> case I'd recommend reading it in chunks.
>
> Untested:
>
> md5 = Digest::MD5.new()
> File.open(file, 'rb') do |io|
> while (buf = io.read(4096)) && buf.length > 0
> md5.update(buf)
> end
> end

io.read returns nil at EOF, so your test for a positive length is redundant.
Also, for the sake of error handling, I'd create the digest inside the block,
so that no digest is created at all if the file cannot be opened:

require 'digest/md5'

md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  while (buf = io.read(4096))
    dig.update(buf)
  end
  dig
end
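Both points can be checked directly. A small sketch; the temp file and the `missing.bin` path are illustrative only:

```ruby
require 'digest/md5'
require 'tempfile'
require 'tmpdir'

# Part 1: IO#read(length) returns nil at EOF, never an empty string,
# so `while (buf = io.read(n))` terminates without a length check.
tmp = Tempfile.new('eofdemo')
tmp.binmode
tmp.write('abcdefghij')           # 10 bytes
tmp.close

reads = []
File.open(tmp.path, 'rb') do |io|
  4.times { reads << io.read(4) }
end
p reads                           # ["abcd", "efgh", "ij", nil]
tmp.unlink

# Part 2: when File.open raises, the block never runs, so no digest
# object is created. The missing path is built inside a fresh temp dir.
digest_created = false
Dir.mktmpdir do |dir|
  missing = File.join(dir, 'missing.bin')
  begin
    File.open(missing, 'rb') do |_io|
      digest_created = true
      Digest::MD5.new
    end
  rescue Errno::ENOENT => e
    puts "open failed: #{e.class}"   # open failed: Errno::ENOENT
  end
end
p digest_created                  # false
```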

If you want to increase efficiency, you can pass a buffer to read, which
avoids allocating a new String for every chunk:

md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  while io.read(4096, buf)
    dig.update(buf)
  end
  dig
end

Here's another nice variant:

md5 = File.open(file, 'rb') do |io|
  dig = Digest::MD5.new
  buf = ""
  dig.update(buf) while io.read(4096, buf)
  dig
end
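To see that the buffer really is reused, this sketch checks the identity of what `read` returns, with and without a buffer. A StringIO stands in for the file here, since it supports the same `read(length, buffer)` form:

```ruby
require 'digest/md5'
require 'stringio'

payload = 'x' * 10_000           # 10,000 bytes -> reads of 4096, 4096, 1808

# With an output buffer: read refills the same String in place and
# returns it, so no new buffer object is allocated per chunk.
io = StringIO.new(payload)
buf = String.new                  # mutable even under frozen_string_literal
same_object = []
dig = Digest::MD5.new
while (chunk = io.read(4096, buf))
  same_object << chunk.equal?(buf)
  dig.update(chunk)               # update consumes the bytes now, so reuse is safe
end
p same_object                     # [true, true, true]

# Without a buffer: each call returns a freshly allocated String.
io = StringIO.new(payload)
fresh = []
while (chunk = io.read(4096))
  fresh << chunk
end
p fresh.map(&:object_id).uniq.size   # 3

p dig.hexdigest == Digest::MD5.hexdigest(payload)   # true
```

Reusing the buffer is safe only because `update` copies the bytes into the digest state before the next `read` overwrites them.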

Kind regards

robert

Brad Tilley

3/19/2006 2:48:00 PM


Robert Klemme wrote:
> io.read will return nil at EOF so your test for positive length is
> basically obsolete. Also, for reasons of error checking I'd place the
> digest creation inside the block because then the digest is never
> created if the file cannot be opened:
>
> md5 = File.open(file, 'rb') do |io|
> dig = Digest::MD5.new
> while (buf = io.read(4096))
> dig.update(buf)
> end
> dig
> end

Thank you Robert, Billy and others! Your suggestions have helped me to
solve the problem.

Tanaka Akira

3/19/2006 3:21:00 PM


In article <48526dFif9i5U1@individual.net>,
"Robert Klemme" <bob.news@gmx.net> writes:

> md5 = File.open(file, 'rb') do |io|
> dig = Digest::MD5.new
> buf = ""
> while io.read(4096, buf)
> dig.update(buf)
> end
> dig
> end

Why don't we have such a method in the digest library?

I think it is useful enough to have in the library.
--
Tanaka Akira
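For readers on a current Ruby: as far as I know, later versions of the digest library do provide this as a class method, `Digest::MD5.file(path)` (and likewise for the other digest classes), which opens the file in binary mode and feeds it to the digest in chunks. A minimal comparison against the hand-rolled loop, assuming a modern Ruby; the test data is illustrative only:

```ruby
require 'digest/md5'
require 'tempfile'

data = Random.new(7).bytes(100_000)
tmp = Tempfile.new('filedemo')
tmp.binmode
tmp.write(data)
tmp.close

# Library method in later Ruby versions: hash a file given its path.
from_file = Digest::MD5.file(tmp.path).hexdigest

# Hand-rolled chunked loop from this thread, for comparison.
by_hand = File.open(tmp.path, 'rb') do |io|
  dig = Digest::MD5.new
  buf = String.new
  dig.update(buf) while io.read(4096, buf)
  dig.hexdigest
end

p from_file == by_hand                       # true
p from_file == Digest::MD5.hexdigest(data)   # true
tmp.unlink
```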


Erik Veenstra

3/19/2006 7:20:00 PM


> Why we have no such method in the digest library?

I extended the MD5 class with a class method to build an MD5
object directly from the contents of a given file.

Use it like this:

md5 = MD5.file("foo.bar")

Regards,
Erik V. - http://www.erikve...

----------------------------------------------------------------

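require "md5"  # Ruby 1.8-era library; later Rubies use require "digest/md5" and Digest::MD5

class MD5
  # Build an MD5 object from a file's contents, reading 4 KB at a time
  def self.file(file)
    File.open(file, "rb") do |f|
      res = self.new
      while (data = f.read(4096))
        res << data
      end
      res
    end
  end
end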

----------------------------------------------------------------

Brad Tilley

3/19/2006 7:35:00 PM


Erik Veenstra wrote:
>>Why we have no such method in the digest library?
>
>
> I extended the MD5 class with a class method to build an MD5
> object directly from the contents of a given file.

Should this be done for SHA1, SHA2, etc.?


> Use it like this:
>
> md5 = MD5.file("foo.bar")
>
> gegroet,
> Erik V. - http://www.erikve...
>
> ----------------------------------------------------------------
>
> require "md5"
>
> class MD5
> def self.file(file)
> File.open(file, "rb") do |f|
> res = self.new
> while (data = f.read(4096))
> res << data
> end
> res
> end
> end
> end
>
> ----------------------------------------------------------------
>
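One way to avoid repeating the patch for each algorithm is a single helper that takes the digest class as a parameter, since MD5, SHA1 and SHA256 share the same `new` / `update` / `hexdigest` interface. A sketch only; `digest_file` is a hypothetical name, not part of any library:

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: hash a file in fixed-size chunks with any
# Digest class, reusing one buffer for all reads.
def digest_file(digest_class, path, chunk_size = 4096)
  File.open(path, 'rb') do |io|
    dig = digest_class.new
    buf = String.new
    dig.update(buf) while io.read(chunk_size, buf)
    dig
  end
end

tmp = Tempfile.new('generic')
tmp.binmode
tmp.write('hello world')
tmp.close

results = {}
[Digest::MD5, Digest::SHA1, Digest::SHA256].each do |klass|
  results[klass.name] = digest_file(klass, tmp.path).hexdigest
  puts "#{klass.name}: #{results[klass.name]}"
end
tmp.unlink
```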
