comp.lang.ruby

best/better way of md5suming of really large file in ruby?

Kyle Schmitt

4/22/2009 2:00:00 PM

I've got a script that goes through data and, in some cases,
generates MD5s of the files. Normally this isn't a problem, but I've
got a few largish (~2G) files in there, and my script is dying on them.
I ran it in a screen session, so I'm not sure of the exact error it
threw, but I'm re-running just that part now to find out. In the
meantime, any suggestions?

This is how I'm generating the md5sum right now:
Digest::MD5.hexdigest(File.read(fn))

--Kyle

4 Answers

Yun Huang Yong

4/22/2009 2:19:00 PM

I googled for 'md5 large files' and ended up here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-t...

yun

--
Yun Huang Yong
yun@nomitor.com ...nom nom nom
--

Reid Thompson

4/22/2009 2:34:00 PM

rthompso@raker /cpartition/hold $ ls -rlt dummyfile
-rw-r--r-- 1 rthompso staff 2147483648 2009-04-22 10:27 dummyfile
rthompso@raker /cpartition/hold $ irb
irb(main):001:0> result = %x[md5sum dummyfile]
=> "a981130cf2b7e09f4686dc273cf7187e dummyfile\n"
irb(main):002:0> p result
"a981130cf2b7e09f4686dc273cf7187e dummyfile\n"
=> nil
irb(main):003:0> def timeit
irb(main):004:1> tstart = Time.now
irb(main):005:1> result = %x[md5sum dummyfile]
irb(main):006:1> tend = Time.now
irb(main):007:1> elapsed = tend - tstart
irb(main):008:1> puts elapsed.to_s
irb(main):009:1> end
=> nil
irb(main):011:0> timeit
10.633416
=> nil


Reid Thompson

4/22/2009 2:51:00 PM

more realistic...
rthompso@raker /cpartition/hold $ dd if=/dev/urandom of=dummyfile count=4M
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB) copied, 529.518 s, 4.1 MB/s
rthompso@raker /cpartition/hold $ irb
irb(main):001:0> def timeit
irb(main):002:1> tstart = Time.now
irb(main):003:1> result = %x[md5sum dummyfile]
irb(main):004:1> tend = Time.now
irb(main):005:1> elapsed = tend - tstart
irb(main):006:1> puts elapsed.to_s
irb(main):007:1> end
=> nil
irb(main):008:0> timeit
49.366641
=> nil
irb(main):009:0> timeit
48.416673
=> nil
irb(main):010:0>
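
For comparison with the %x[md5sum ...] timings above, the same kind of
measurement can be taken in-process. This is only a sketch: time_md5 is a
hypothetical helper name, and it assumes a Ruby whose digest library
provides Digest::MD5.file, which reads the file in pieces rather than
loading it whole. No timing figures are claimed here; they depend on the
machine and the disk cache.

```ruby
require 'benchmark'
require 'digest/md5'

# Hypothetical helper: returns [hexdigest, elapsed_seconds] for one file.
def time_md5(path)
  hexdigest = nil
  # Benchmark.realtime returns the wall-clock seconds spent in the block.
  elapsed = Benchmark.realtime do
    hexdigest = Digest::MD5.file(path).hexdigest
  end
  [hexdigest, elapsed]
end
```

Calling time_md5('dummyfile') twice in a row gives a rough idea of how
much of the time is disk reads versus hashing, since the second run may
be served from the OS cache.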


Kyle Schmitt

4/22/2009 3:20:00 PM

Thanks, both of you. I'd rather not shell out using %x[], but I may end
up doing that. I tried the modified MD5, and it actually ran in close
to the same time on my work machine; I'll have to see how it does on
my home one.

--Kyle
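
A minimal sketch of hashing a large file without shelling out and without
File.read pulling the whole file into memory. This is not the code from
the linked post (the URL is truncated in this thread), just one common way
to do it: feed the digest in fixed-size chunks. The name md5_of_file and
the 1 MB default chunk size are assumptions chosen for illustration.

```ruby
require 'digest/md5'

# Hypothetical helper: hash a file chunk by chunk so memory use stays
# bounded by chunk_size, however large the file is.
def md5_of_file(path, chunk_size = 1024 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    buffer = String.new
    # IO#read(length, outbuf) reuses the buffer and returns nil at EOF.
    while io.read(chunk_size, buffer)
      digest.update(buffer)
    end
  end
  digest.hexdigest
end
```

Usage would be md5_of_file(fn) in place of
Digest::MD5.hexdigest(File.read(fn)). Where the digest library provides
it, Digest::MD5.file(fn).hexdigest does much the same thing in one call.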