Stefan Lang
7/29/2006 9:41:00 AM
On Saturday 29 July 2006 10:31, Ben Johnson wrote:
> Ben Johnson wrote:
> > Basically I want to generate an md5 hash from considerably large
> > files to determine if they are exactly the same. Is there a
> > better way to do this besides comparing md5 hashes?
> >
> > Thanks for your help.
>
> I neglected to include some necessary details, sorry about that.
> Basically the reason I want to do this is so I can store the md5 in
> the database and determine if I have come across this file before.
> So when I receive the file again I can md5 it, query my db, and if
> it's in my db I know I've come across this file before.
There are basically two options.
1. Read in the whole file and hash it:
require 'digest/md5'
# binread reads in binary mode, so no newline translation happens (Ruby 1.9.3+)
Digest::MD5.digest(File.binread("data"))     # => string with binary hash
Digest::MD5.hexdigest(File.binread("data"))  # => string with hexadecimal digits
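For the original question (are two files identical?), a minimal sketch built on option 1 could look like this. `same_content?` is a hypothetical helper name, and the size check before hashing is an added shortcut, not part of the approach above: files of different sizes cannot be identical, so they need no hashing at all.

```ruby
require 'digest/md5'

# Hypothetical helper: true when two files have the same size and the
# same MD5 digest. Each file is read completely into memory (option 1).
def same_content?(path_a, path_b)
  # Cheap shortcut: different sizes means different contents.
  return false unless File.size(path_a) == File.size(path_b)
  # binread avoids newline translation, so the bytes hashed are the real ones.
  Digest::MD5.hexdigest(File.binread(path_a)) ==
    Digest::MD5.hexdigest(File.binread(path_b))
end
```

Equal digests make identical contents overwhelmingly likely but do not strictly prove it; when both files are at hand, `FileUtils.compare_file(path_a, path_b)` from the standard library compares them byte by byte instead.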
2. Read the file block-wise to save memory ;)
require 'digest/md5'
md5 = Digest::MD5.new
File.open("data", "rb") do |f|
  while (chunk = f.read(65536))  # 64 KiB per read; nil at end of file
    md5.update(chunk)
  end
end
md5.digest     # => string with binary hash
md5.hexdigest  # => string with hexadecimal digits
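Tying option 2 back to the database idea in the question, here is a minimal sketch that uses a `Set` as a stand-in for the database table. `SeenFiles`, `digest_of` and `seen_before?` are hypothetical names. It relies on `Digest::MD5.file`, which opens the file in binary mode and feeds it to `update` chunk by chunk, so large files are never loaded whole.

```ruby
require 'digest/md5'
require 'set'

# Hypothetical in-memory stand-in for the database described in the
# question: it only remembers the hex MD5 of every file it has been shown.
class SeenFiles
  def initialize
    @digests = Set.new
  end

  # Chunked hashing via the standard library's Digest::MD5.file.
  def digest_of(path)
    Digest::MD5.file(path).hexdigest
  end

  # Records the file's digest and returns true if the same digest had
  # already been recorded (Set#add? returns nil for an existing element).
  def seen_before?(path)
    !@digests.add?(digest_of(path))
  end
end
```

In a real setup the `Set` lookup would become a query against the stored hex digests. `Digest::SHA256` offers the same interface (`Digest::SHA256.file(path).hexdigest`) if MD5's known collision weaknesses are a concern.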
Hope that helps,
Stefan