Robert Klemme
6/9/2008 12:31:00 PM
2008/6/9 mcphersonz@gmail.com <mcphersonz@gmail.com>:
> I am wondering if anyone knows of a good trick to identify duplicated
> lines in a directory for refactoring purposes.
Do you mean "duplicated lines in files in a directory tree"? The
term "duplicated lines in a directory" does not make much sense to me,
as a directory does not have "lines".
> The idea is this: If I can get a listing of all lines, by file in a
> directory (recursively) then refactoring code could be focused
> eliminating "repeated" lines.
>
> The solution would involve
> - recursively searching all files in a given path
> - displaying a list of repeated lines (ignoring case and whitespace)
> - grouping results by path/filename combination
> - sorting results by line repeat count
> - ideally only displaying lines that are repeated at least once...
>
> I'm thinking this should exist either via a application, ruby script,
> or shell script.
>
> Does anyone know if any ideas or solutions that are remotely close to
> this?
If you have enough memory or few enough files you could do:

# untested
require 'find'
require 'set'

# Trim the line, collapse runs of whitespace, and ignore case.
def normalize(line)
  line.strip.gsub(/\s+/, ' ').downcase
end

dir = ARGV.fetch(0, '.')  # root of the directory tree to scan

# normalized line => set of files it appears in
duplicates = Hash.new { |h, k| h[k] = Set.new }

Find.find(dir) do |file|
  next unless File.file?(file)  # Find.find also yields directories
  File.foreach(file) do |line|
    duplicates[normalize(line)] << file
  end
end

duplicates.each do |line, files|
  puts line, files.sort.join(',') if files.size > 1
end
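Note that this reports lines occurring in more than one file, not how
often each line repeats. To also cover the "sort by repeat count"
requirement, here is a rough sketch along the same lines (equally
untested; the directory argument is just a placeholder you would pass
in yourself) that counts every occurrence and prints the most frequent
lines first:

```ruby
require 'find'

# Trim, collapse whitespace, and ignore case, as before.
def normalize(line)
  line.strip.gsub(/\s+/, ' ').downcase
end

# Walk +dir+ recursively and return { normalized_line => count },
# counting every occurrence, not just one per file.
def line_counts(dir)
  counts = Hash.new(0)
  Find.find(dir) do |file|
    next unless File.file?(file)
    File.foreach(file) do |line|
      key = normalize(line)
      counts[key] += 1 unless key.empty?  # skip blank lines
    end
  end
  counts
end

# Print lines seen more than once, most frequent first.
def report(dir)
  line_counts(dir).select { |_, c| c > 1 }
                  .sort_by { |_, c| -c }
                  .each { |line, c| puts "#{c}x #{line}" }
end
```

Skipping blank lines keeps the top of the report useful; otherwise the
empty line almost always wins.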
Cheers
robert
--
use.inject do |as, often| as.you_can - without end