Robert Klemme
6/9/2008 12:31:00 PM
2008/6/9 mcphersonz@gmail.com <mcphersonz@gmail.com>:
> I am wondering if anyone knows of a good trick to identify duplicated
> lines in a directory for refactoring purposes.
Do you mean "duplicated lines in files in a directory tree"? The
term "duplicated lines in a directory" does not make much sense to me,
as a directory does not have "lines".
> The idea is this: If I can get a listing of all lines, by file in a
> directory (recursively) then refactoring code could be focused
> eliminating "repeated" lines.
>
> The solution would involve
> - recursively searching all files in a given path
> - displaying a list of repeated lines (ignoring case and whitespace)
> - grouping results by path/filename combination
> - sorting results by line repeat count
> - ideally only displaying lines that are repeated at least once...
>
> I'm thinking this should exist either via a application, ruby script,
> or shell script.
>
> Does anyone know if any ideas or solutions that are remotely close to
> this?
If you have enough memory or few enough files you could do:

# untested
require 'find'
require 'set'

# Trim the line, collapse runs of whitespace, and ignore case.
def normalize(line)
  line.strip.gsub(/\s+/, ' ').downcase
end

dir = ARGV.fetch(0, '.')  # root of the directory tree to scan

# normalized line => set of files it appears in
duplicates = Hash.new { |h, k| h[k] = Set.new }

Find.find(dir) do |file|
  next unless File.file?(file)  # Find.find also yields directories
  File.foreach(file) do |line|
    duplicates[normalize(line)] << file
  end
end

duplicates.each do |line, files|
  puts line, files.sort.join(',') if files.size > 1
end
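Note that this reports lines occurring in more than one file, not how
often each line repeats. To also cover the "sort by repeat count"
requirement, here is a rough sketch along the same lines (equally
untested; the directory argument is just a placeholder you would pass
in yourself) that counts every occurrence and prints the most frequent
lines first:

```ruby
require 'find'

# Trim, collapse whitespace, and ignore case, as before.
def normalize(line)
  line.strip.gsub(/\s+/, ' ').downcase
end

# Walk +dir+ recursively and return { normalized_line => count },
# counting every occurrence, not just one per file.
def line_counts(dir)
  counts = Hash.new(0)
  Find.find(dir) do |file|
    next unless File.file?(file)
    File.foreach(file) do |line|
      key = normalize(line)
      counts[key] += 1 unless key.empty?  # skip blank lines
    end
  end
  counts
end

# Print lines seen more than once, most frequent first.
def report(dir)
  line_counts(dir).select { |_, c| c > 1 }
                  .sort_by { |_, c| -c }
                  .each { |line, c| puts "#{c}x #{line}" }
end
```

Skipping blank lines keeps the top of the report useful; otherwise the
empty line almost always wins.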
Cheers
robert
--
use.inject do |as, often| as.you_can - without end