Forums > comp.lang.ruby

Re: optimizing Hash deletions

Bill Kelly
6/26/2007 3:20:00 AM

From: "Mike Steiner" <mikejaysteiner@gmail.com>
>
> I have a recursive file-compare program that I run on 2 directories that
> each contain about 10,000 files, 99% of which don't change. I read all the
> filenames into hashes and then (first) delete all the matches, using this
> code:
>
> def RemoveIdenticalDirs ( old_fe_list , new_fe_list )
>   # this function is necessary because we have to do a case-insensitive
>   # match for directories, but not for files
>   $stderr.puts "Detecting identical dirs..."
>   old_fe_list.keys.each do | oldfile |
>     # can't use .has_key? because we have to do a case-insensitive match
>     new_fe_list.keys.each do | newfile |
>       if old_fe_list[oldfile].is_dir and new_fe_list[newfile].is_dir and
>          oldfile.downcase == newfile.downcase
>         old_fe_list.delete(oldfile)
>         new_fe_list.delete(newfile)
>         break
>       end
>     end
>   end
> end
>
> def RemoveIdenticalFiles ( old_fe_list , new_fe_list , comparetype )
>   $stderr.puts "Detecting identical files..."
>   old_fe_list.keys.each do | file |
>     if new_fe_list.has_key?(file)
>       if !(old_fe_list[file].is_dir) and !(new_fe_list[file].is_dir) and
>          FilesAreIdentical(comparetype, old_fe_list[file], new_fe_list[file])
>         old_fe_list.delete(file)
>         new_fe_list.delete(file)
>       end
>     end
>   end
> end
>
> Note that I've added an attribute called .is_dir to each hash entry, which
> just holds a Boolean value.
>
> When I run the program (under WinXP using Ruby 1.8.5, one-click installer
> 185-21, if it matters), it takes about 5-10 seconds to execute the above 2
> functions. But that's not the worst part - it chews up memory big time. The
> machine I'm running it on has 768MB of RAM (with no swap file), and Windows
> gives me a warning that it's out of memory as the above code runs. However,
> neither Windows nor the program crashes. I'm guessing that all the .delete
> calls are causing lots of memory usage and the garbage collector starts
> running.
>
> So my question is - is there a less memory-intensive way of doing the above?
> Would setting elements to nil instead of deleting them make any difference?
> Or maybe copying entries to a new hash somehow?

You've got an O(n**2) search in RemoveIdenticalDirs. How long
does RemoveIdenticalDirs take to execute, as compared to
RemoveIdenticalFiles?
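
If that nested scan is the hot spot, here's a rough, untested sketch of
an O(n) version that builds a lookup table keyed by the downcased
directory name, then probes it once per old entry. (is_dir is the
attribute from your post; the other names are just for illustration.)

def RemoveIdenticalDirs ( old_fe_list , new_fe_list )
  $stderr.puts "Detecting identical dirs..."
  new_dirs = {}    # downcased name => original key, directories only
  new_fe_list.each do |name, fe|
    new_dirs[name.downcase] = name if fe.is_dir
  end
  old_fe_list.keys.each do |oldfile|
    next unless old_fe_list[oldfile].is_dir
    newfile = new_dirs[oldfile.downcase]
    if newfile
      old_fe_list.delete(oldfile)
      new_fe_list.delete(newfile)
    end
  end
end

That's one pass over each list plus a single hash probe per entry,
instead of rescanning new_fe_list for every old entry.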

Also, what does FilesAreIdentical() do?
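
To your question about .delete: removing a key from a Hash doesn't copy
the table, and setting entries to nil would actually keep the keys alive,
so I'd expect nil to be worse, not better. My first suspect would be
FilesAreIdentical - if it slurps both files whole, that could easily
account for the memory pressure across 10,000 files. Here's an untested
sketch of a chunked comparison; I'm guessing at the internals since you
didn't post that function, so treat the name and signature as
hypothetical:

def ChunkedFilesIdentical ( path_a , path_b )
  return false unless File.size(path_a) == File.size(path_b)
  File.open(path_a, "rb") do |a|
    File.open(path_b, "rb") do |b|
      # read 64KB at a time; sizes match, so both hit EOF together
      while (chunk = a.read(65536))
        return false unless chunk == b.read(65536)
      end
    end
  end
  true
end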


Regards,

Bill