pihentagy@gmail.com
6/14/2006 10:34:00 PM
Tim Hunter wrote:
> Probably you should open the files with "rb" instead of letting it
> default to "r".
Holy s**t! Since I had tried it on text files and it still failed, I
didn't see why the mode would matter at all.
Ah, that damned \r\n to \n translation, I guess.
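
For the record, a minimal sketch of hashing a file in binary mode. The
helper name sha1_of and the 64 KB chunk size are my own choices, not
anything from the original script. With "rb" the bytes are read as they
are on disk, so Windows does no newline translation and does not stop
early at a Ctrl-Z byte.

```ruby
require "digest/sha1"

# Hash the raw bytes of a file. "rb" turns off newline translation,
# which matters on Windows even for files that look like plain text.
def sha1_of(path)
  digest = Digest::SHA1.new
  File.open(path, "rb") do |f|
    # read(n) returns nil at EOF, which ends the loop
    while (chunk = f.read(64 * 1024))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```
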
> For finding dups, I wonder if it's useful to compare checksums unless
> you've already computed them in advance. I notice that Ruby's own
> FileUtils.install checks filea == fileb by simply comparing the files
> until it finds a difference or gets to EOF.
Well, first I'd like to partition the files by file size, and only
then compare the files within each group.
If more than two files have the same size, it's better to calculate
the sha1sum of each of those files once than to compare every pair.
And if you'd like to stay on the safe side, you can compare the
contents of the files that have the same sha1sum.
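
Roughly, the scheme above could look like this. It is only a sketch:
find_duplicates is a made-up name, and it takes a flat list of paths
that you have collected yourself. One simplification: in the paranoid
step every file is compared against the first one of its sha1 group
only, so in the (very unlikely) case of a collision, two files that
match each other but not the first one would be missed.

```ruby
require "digest/sha1"
require "fileutils"

# Return groups of paths whose contents are identical.
def find_duplicates(paths)
  groups = []
  # Step 1: only files of equal size can be duplicates.
  paths.group_by { |p| File.size(p) }.each_value do |same_size|
    next if same_size.size < 2
    if same_size.size == 2
      # Just two candidates: a direct comparison is enough.
      groups << same_size if FileUtils.compare_file(*same_size)
      next
    end
    # Step 2: more than two candidates, so hash each file once.
    by_sha1 = same_size.group_by { |p| Digest::SHA1.file(p).hexdigest }
    by_sha1.each_value do |same_hash|
      next if same_hash.size < 2
      # Step 3 (safe side): confirm by content against the first file.
      first, *rest = same_hash
      dups = rest.select { |p| FileUtils.compare_file(first, p) }
      groups << [first, *dups] unless dups.empty?
    end
  end
  groups
end
```

Both Digest::SHA1.file and FileUtils.compare_file read the files in
binary mode, so the "rb" problem does not come back here.
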
And you can improve on this by caching the sha1sums (say, in a file in
every directory).
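
One possible way to do the caching, purely as an illustration: the
cache file name .sha1cache.yml, the YAML format and the helper
cached_sha1 are all assumptions of mine. An entry is reused only while
the file's size and mtime are unchanged. Since mtime is stored with one
second resolution, a same-size rewrite within the same second would not
be noticed.

```ruby
require "digest/sha1"
require "yaml"

CACHE_NAME = ".sha1cache.yml" # hypothetical name, one per directory

# Return the SHA1 of path, reusing the cached value while the file's
# size and mtime still match what was recorded.
def cached_sha1(path)
  dir, name = File.split(path)
  cache_file = File.join(dir, CACHE_NAME)
  cache = File.exist?(cache_file) ? YAML.load_file(cache_file) : {}
  cache = {} unless cache.is_a?(Hash) # empty or damaged cache file

  stat = File.stat(path)
  entry = cache[name]
  if entry && entry["size"] == stat.size && entry["mtime"] == stat.mtime.to_i
    return entry["sha1"]
  end

  # Cache miss: hash the file and rewrite the directory's cache.
  sha1 = Digest::SHA1.file(path).hexdigest
  cache[name] = { "size" => stat.size, "mtime" => stat.mtime.to_i, "sha1" => sha1 }
  File.write(cache_file, cache.to_yaml)
  sha1
end
```

Rewriting the whole cache on every miss is wasteful for big
directories; loading it once per directory and saving it at the end
would be the obvious next step.
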