Asp Forum - Nuby - help with Ruby object references

Chuck Remes

2/24/2006 12:36:00 AM

I'm very new to Ruby (as in, just started yesterday). As a learning
exercise I decided to write a short program that would traverse a
directory tree and take note of all duplicate files in that tree.

I'm using a hash of arrays to track all references. The key is the
filename, and I push the path (wrapped in an array) onto that key's value. If I run
across a filename for which a key already exists in the hash, I do a
deeper equality check to see if they are really the same file or if
they are different. The "deeper check" is comparing file sizes.
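The hash of arrays above relies on `Hash.new` with a default block. Ruby calls that block with the hash itself and the missing key, and the block stores a fresh array under the key. A minimal sketch (the paths are hypothetical sample data, not from the original program):

```ruby
# Hash of arrays: the default block receives the hash itself and the
# missing key, and stores a fresh [] under that key.
paths_by_name = Hash.new { |hash, key| hash[key] = [] }

# Hypothetical sample paths, not taken from the original post.
paths_by_name["notes.txt"] << ["/tmp/a/notes.txt"]
paths_by_name["notes.txt"] << ["/tmp/b/notes.txt"]

p paths_by_name["notes.txt"].size   # => 2
p paths_by_name.key?("todo.txt")    # => false

# Gotcha: merely reading a missing key runs the block and stores [].
p paths_by_name["todo.txt"]         # => []
p paths_by_name.key?("todo.txt")    # => true
```

This is why the program tests `has_key?` rather than `h[file]`: an index lookup on its own would create the key.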

If they are the same, I push this new path into my array of arrays so
I can check against it if I find yet another file with that name. If
they're different, I push this new path onto a SECOND hash of arrays.
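A more compact route to the same report, sketched under the post's own assumption that "same name and same size" means duplicate: key a single hash on the `[basename, size]` pair and keep only the groups with more than one path. `find_duplicates` is a hypothetical helper name, the sketch assumes Ruby 1.9 or later (where `Hash#select` returns a Hash), and it does not compare file contents.

```ruby
require 'find'

# Hypothetical helper: group every regular file under +root+ by
# [basename, size] and keep only groups holding more than one path.
# Same name + same size is the post's heuristic; contents are not compared.
def find_duplicates(root)
  groups = Hash.new { |hash, key| hash[key] = [] }
  Find.find(root) do |path|
    next unless File.file?(path)   # skip directories and other non-files
    groups[[File.basename(path), File.size(path)]] << path
  end
  groups.select { |_key, paths| paths.size > 1 }
end
```

Calling `find_duplicates(ARGV[0] || Dir.pwd)` returns a Hash from `[name, size]` to an array of paths, so the report is a single `each` over it, and no second hash is needed.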

This is where I have trouble. As soon as I push any value into this
second hash, it takes on the identity of the first hash. I don't
understand why because I am not doing any explicit operation to make
hash1 = hash2. Maybe it's a side effect of some other operation.
Anyway, enough talk... the code is below.

I appreciate any and all insight.

cr

--- code here ---

#!/usr/bin/env ruby

require 'find'

h = Hash.new { |h,k| h[k] = [] }
duplicates = Hash.new { |h,k| h[k] = [] }

working_path = ARGV[0] || ENV["PWD"]

Find.find(working_path) do |path|
  # if it's a dir, skip to the next path
  if File.directory?(path)
    next
  end
  file = File.basename(path)

  # if this key doesn't exist in the hash, add it
  if h.has_key?(file) == false
    h[file].push([path])
  else # key already exists in hash
    # add file size to hash unless it was already grabbed
    h[file].push([path])
    h[file].each do |subarray|
      subarray[1] = File.size(subarray[0]) unless subarray[1]
    end

    # now compare the current file's size to the prior check
    h[file].each do |subarray|
      puts "subarray[0] = #{subarray[0]} and path = #{path}"
      if subarray[0].eql?(path) == false && subarray[1] == File.size(path)
        # add to dupe hash
        puts "DUPLICATE DUPLICATE DUPLICATE DUPLICATE"
        puts "DUP BEFORE h.id = #{h.object_id} and duplicates.id = #{duplicates.object_id}"
        duplicates[file].push([path]) # at this point "h" and "duplicates" refer to the same object!
        puts "DUP AFTER h.id = #{h.object_id} and duplicates.id = #{duplicates.object_id}"
      end
    end
  end
end

puts "\n\nThe duplicates are..."
duplicates.each do |key, value|
  puts "key = #{key}"
  value.each do |a|
    print "#{a[0]} #{a[1]} "
  end
  print "\n"
end

1 Answer

Eero Saynatkari

2/24/2006 1:58:00 AM

unknown wrote:
> <skip due to rforum/>
>
> I appreciate any and all insight.
>
> cr
>
> --- code here ---
>
> #!/usr/bin/env ruby
>
> require 'find'
>
> h = Hash.new { |h,k| h[k] = [] }
> duplicates = Hash.new { |h,k| h[k] = [] }

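I am pretty sure this is where the problem is. Blocks
are closures, and in Ruby 1.8 a block parameter with the
same name as an existing local variable is that variable,
not a new one. So the 'h' parameter in the second block is
your outer 'h': the first time duplicates[file] runs the
default block, it assigns the duplicates Hash to 'h'. Just
rename the outer variable from 'h' to 'hash' everywhere it
is used, leaving the block parameters alone (it is a bit
clearer anyway), and you should be OK.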
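The suggested rename, sketched with outer variable names that no block parameter reuses. In Ruby 1.9 and later, block parameters are always local to the block, so the original code no longer rebinds `h` there; distinct names are still clearer. The keys and paths below are hypothetical:

```ruby
# Outer names differ from the block parameter names, so running a
# default block cannot rebind them (in Ruby 1.8 a same-named one would).
files      = Hash.new { |hash, key| hash[key] = [] }
duplicates = Hash.new { |hash, key| hash[key] = [] }

files["report.txt"] << ["/tmp/a/report.txt"]
duplicates["report.txt"] << ["/tmp/b/report.txt"]  # runs the default block

# The two variables still refer to two different Hash objects.
puts files.equal?(duplicates)                  # => false
puts files.object_id == duplicates.object_id   # => false
```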

> <skip due to rforum/>

E

--
Posted via http://www.ruby-....

comp.lang.ruby