Robert Klemme
12/30/2005 11:54:00 AM
Sven Johansson <sven_u_johansson@spray.se> wrote:
> Hi, good people of clr,
>
> I just dipped into the goodness that is Ruby for the first time
> yesterday, and while this group and the online docs proved useful, I'm
> left somewhat bewildered by a few things. Environment: Win XP SP2,
> one-click-install 1.8.2 ruby.
>
> 1) Current working directories:
> I currently use
>
> f = __FILE__
> len = -f.length
> my_dir = File::expand_path(f)[0...len]
>
> To find the script's current working directory.
No, you get the script's directory, which only incidentally matches the
working directory when run on Windows (because there the working directory
defaults to the script's directory).
> Snappier alternatives
> such as
>
> my_dir = File.dirname(__FILE__)
>
> just report back with ".", which, while true, isn't exactly helpful.
You want File.expand_path, as in:
>> File.expand_path('.')
=> "/home/Robert"
Now:
working_dir = File.expand_path( Dir.getwd )
script_dir = File.expand_path( File.dirname(__FILE__) )
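A minimal sketch that prints both values; working_dir and script_dir are just names I picked for illustration. The two differ as soon as you start the script from another directory, e.g. with "ruby subdir/this.rb".

```ruby
# Absolute path of the current working directory of the process.
def working_dir
  File.expand_path(Dir.getwd)
end

# Absolute path of the directory containing this script file.
def script_dir
  File.expand_path(File.dirname(__FILE__))
end

puts "working dir: #{working_dir}"
puts "script dir:  #{script_dir}"
```

Compute script_dir before any Dir.chdir: if __FILE__ is a relative name, expanding it after changing directory gives the wrong path. If the rest of your script relies on relative paths, calling Dir.chdir(script_dir) near the top makes it behave the same however it was launched.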
> Problem: this only works if the script is invoked from the command
> line as "ruby this.rb". Trying to invoke it by double-clicking on the
> script in the windows explorer makes the above function return an
> empty string. Is there any way, short of embedding the call to ruby
> in a bat file, to make ruby read its current working directory even
> if invoked by double-clicking?
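See above. Your slicing approach assumes that __FILE__ holds just the file
name. When Explorer starts the script, __FILE__ presumably contains the full
path, so cutting off __FILE__.length characters leaves an empty string.
File.expand_path(File.dirname(__FILE__)) works in both cases.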
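A quick way to see the difference. The two approaches are wrapped in made-up helper names (dir_by_slicing, dir_by_dirname), and a full path is built with File.expand_path to stand in for what __FILE__ presumably contains under Explorer:

```ruby
# Original approach: strip as many characters as the given name has
# from the end of the expanded path.
def dir_by_slicing(f)
  File.expand_path(f)[0...-f.length]
end

# Alternative: let File.dirname split off the file name, then expand.
def dir_by_dirname(f)
  File.expand_path(File.dirname(f))
end

full = File.expand_path('this.rb')  # stand-in for a full path in __FILE__
p dir_by_slicing('this.rb')         # bare name: works (keeps a trailing "/")
p dir_by_slicing(full)              # full path: everything is cut, prints ""
p dir_by_dirname(full)              # works either way
```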
> 2) MD5 hashes and file handles:
> I currently use something like
>
> Dir['*'].each {|f| print Digest::MD5.hexdigest(open(f, 'rb').read), ' ', f, "\n"}
>
> I tried stuff like
>
> Dir['*'].each {|f|print f, " "; puts Digest::MD5.hexdigest(File.read(f))}
> or
> dig=Digest::MD5.new
> dig.update(file)
>
> and they both seem to suffer from some sort of buffer on the directory
> reading; that is, they'll produce the same hash for several files when
> scanning a large directory. The first line above bypasses this, I
> suppose by the 'rb' reading mode on the file handle. Is there any way
> to unbuffer the directory file handle stream (akin to Perl's $|=1)?
Your code in the first line has at least these problems:
1) You don't check for directories, i.e., you'll try to compute the MD5 of
directories as well, which raises an error.
2) You don't close files properly: open(f, 'rb').read leaves the handle open
until the garbage collector gets to it. You should use the block form of
File.open; that way file handles are always closed properly and promptly.
Alternatives:
Dir['*'].each {|f| File.open(f,'rb') {|io| print f, " ",
  Digest::MD5.hexdigest(io.read), "\n" } if File.file? f}
Dir['*'].each {|f| print f, " ",
  Digest::MD5.hexdigest(File.open(f,'rb') {|io| io.read}), "\n" if File.file? f}
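If the files can be large, reading each one whole with io.read is wasteful. Here is a sketch that keeps the File.file? check and the block form but feeds the digest in chunks via Digest::MD5#update. file_md5 and print_digests are names I made up, and the 64 KiB chunk size is an arbitrary choice:

```ruby
require 'digest/md5'

# MD5 hex digest of a file, read in binary mode in fixed-size chunks
# so the whole file never has to sit in memory at once.
def file_md5(path, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|          # block form closes the file
    while (chunk = io.read(chunk_size))  # read returns nil at EOF
      digest.update(chunk)
    end
  end
  digest.hexdigest
end

# Print "name digest" for every regular file matching pattern,
# skipping directories.
def print_digests(pattern = '*')
  Dir[pattern].each do |f|
    next unless File.file?(f)
    puts "#{f} #{file_md5(f)}"
  end
end
```

Newer Ruby releases also provide Digest::MD5.file(path).hexdigest, which does the chunked binary reading for you.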
I can't reproduce the problem you describe (identical digests) with the other
lines of code. I tried
Dir['*'].each {|f|print f, " "; puts Digest::MD5.hexdigest(File.read(f)) if
File.file? f}
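But the problem here is that File.read does not open the file in binary mode,
which is a must for this to work on Windows: in text mode CR LF pairs are
translated and a Ctrl-Z byte (0x1A) is treated as end of file, so the digest
may be computed over a truncated copy of the data. Files that share their
first bytes up to a 0x1A would then get the same digest, which would explain
what you see. It has nothing to do with buffering of the directory listing
(Perl's $| only controls output flushing anyway; the Ruby counterpart is
$stdout.sync = true).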
> 3) Finally, I submit my very first Ruby script for merciless
> criticism. What here could have been done otherwise? What screams for
> a better Ruby solution? I'm aware that I should probably look into
> split instead of relying so much on regexps for splitting, and I was
> trying to set up a structure like hash[key]=[a,b], but I found I could
> not access hash.each_pair { |key,value] puts key, value(0), value (1)
> }.
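Regarding hash[key]=[a,b]: the values are plain arrays, so you index them with square brackets (value[0], not value(0)), and the block parameters need matching bars (|key, value|). A minimal sketch; the hash contents are made up:

```ruby
# Each value is a two-element array [dir, name]; sample data only.
index = {}
index['abc123'] = ['music', 'song.mp3']

# Arrays are indexed with square brackets.
index.each_pair do |key, value|
  puts key, value[0], value[1]
end

# Or destructure the array directly in the block parameters.
index.each_pair do |key, (dir, name)|
  puts "#{key} #{dir} #{name}"
end
```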
>
> ------------------------------------------------------------------
> require 'Digest/md5'
> require 'fileutils'
>
> # Variables to set manually
> global_digest_index='C:/srfctrl/indexfile/globalindex.txt'
> global_temp_directory='C:/srfctrl/tempstore/'
> global_collide_directory='C:/srfctrl/collide/'
>
> # Begin program
> f = __FILE__
> len = -f.length
> my_dir = File::expand_path(f)[0...len]
> my_dirname = my_dir.sub(/^.+\/(\w+?)\/$/,'\1')
>
> puts my_dir
> puts my_dirname
>
> digest_map_name={}
> digest_map_directory={}
>
> IO.foreach(global_digest_index) { |line|
> th_dige=line.sub(/^.+?\:(.+?)\:.+?$/,'\1').chomp
> th_fnam=line.sub(/^.+?\:.+?\:(.+?)$/,'\1').chomp
> th_dir=line.sub(/^(.+?)\:.+?\:.+?$/,'\1').chomp
> digest_map_name[th_dige] = th_fnam
> digest_map_directory[th_dige] = th_dir
> }
>
> filecnt = filesuc = 0
> outfile = File.new(global_digest_index, "a")
> Dir['*'].each do |file_name|
> next unless (file_name =~ /\.mp3$|\.ogg$/i)
> filecnt += 1
> hex = Digest::MD5.hexdigest(open(file_name, 'rb').read)
> if digest_map_name.has_key?(hex) then
> collfilestrip = digest_map_name[hex].sub(/\.mp3$|\.ogg$/i,'')
> id_name = global_collide_directory + digest_map_directory[hex].to_s +
> '_' + collfilestrip + '_' + file_name
> FileUtils.cp(file_name,id_name)
> else
> filesuc +=1
> digest_map_name[hex] = file_name
> digest_map_directory[hex] = my_dirname
> outfile.puts my_dirname + ':' + hex + ':' + file_name
> id_name = global_temp_directory + file_name
> FileUtils.cp(digest_map_name[hex],id_name)
> end
> end
> outfile.close
>
> puts "Processed " + filecnt.to_s + " files, out of which " +
> filesuc.to_s + " were not duplicates."
> ----------------------------------------------
It's not completely clear to me what you want to do here. Apparently you
check a number of audio files and copy them somewhere else depending on
whether their MD5 digest already appears in your index file. What's the aim
of doing this?
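As for split: assuming each index line has the form dir:digest:name (as your outfile.puts writes it), the three regexps and the two parallel hashes could become one split and one hash of [dir, name] pairs. A sketch; all method names are hypothetical, and it assumes the directory name and the digest never contain a colon:

```ruby
# Name of the directory the script lives in (replaces the regexp on my_dir).
def script_dirname
  File.basename(File.expand_path(File.dirname(__FILE__)))
end

# Split "dir:digest:name" into its parts. With a limit of 3, any further
# ':' would stay part of the name.
def parse_index_line(line)
  dir, digest, name = line.chomp.split(':', 3)
  [digest, dir, name]
end

# Load the index into one hash: digest => [dir, name].
# Lines without all three fields are skipped.
def load_index(path)
  index = {}
  File.foreach(path) do |line|
    digest, dir, name = parse_index_line(line)
    index[digest] = [dir, name] if name
  end
  index
end

# Regular files matching pattern whose extension is .mp3 or .ogg,
# compared case-insensitively.
def audio_files(pattern = '*')
  Dir[pattern].select do |f|
    File.file?(f) && %w[.mp3 .ogg].include?(File.extname(f).downcase)
  end
end
```

The main loop can then test index.has_key?(hex) and, for a duplicate, get both parts at once with dir, name = index[hex].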
Kind regards
robert