Asp Forum - finding files with regular expressions

Remco Hh

10/2/2007 12:17:00 PM

Hi, I am having trouble figuring this out:

I want to search in a directory for files, matching a certain regular
expression. The script should not return true or false, but should give
me a list (array) of filenames which are found.

Who can help me a little?
Thanks in advance :)

remco
--
Posted via http://www.ruby-....

8 Answers

Richard Conroy

10/2/2007 12:58:00 PM

On 10/2/07, Remco Hh <remco@huijdts.nl> wrote:
> Hi, i am having troubles figuring this out:
>
> I want to search in a directory for files, matching a certain regular
> expression. The script should not return true or false, but should give
> me a list (array) of filenames which are found.
>
> who can help me a little?
> thanks in advance :)

Look at Ruby's Find library. I am not sure if it can take regexp arguments
(haven't tried, but it would be hella cool).
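Find.find accepts only starting paths, not a pattern. It is still easy to apply a regexp to each path it yields and collect the matches into an array. Below is a minimal sketch. The helper name find_matching is hypothetical, and it assumes you want regular files whose base name matches:

```ruby
require 'find'

# Hypothetical helper: walk dir recursively with Find.find and collect the
# paths of regular files whose base name matches the given regexp.
def find_matching(dir, pattern)
  results = []
  Find.find(dir) do |path|
    results << path if File.file?(path) && File.basename(path) =~ pattern
  end
  results
end

# Usage: p find_matching(".", /\.rb\z/)
```

Matching against File.basename rather than the whole path keeps a directory name such as "lib.rb/" from accidentally matching.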

7stud 7stud

10/2/2007 1:42:00 PM

Remco Hh wrote:
>
> I want to search in a directory for files, matching a certain regular
> expression. The script should not return true or false, but should give
> me a list (array) of filenames which are found.
>

Try something like this:

results = []

Dir.foreach("./programs_ruby") do |filename|
  if filename.index("mod")
    results << filename
  end
end

p results
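The loop above tests for a plain substring with String#index. Because the question asks for a regular expression, the same loop can use =~ instead. The sketch below does that and also skips the "." and ".." entries that Dir.foreach yields. The method name matching_entries is only illustrative:

```ruby
# Same idea as the loop above, but with a regexp instead of a substring.
# Dir.foreach also yields "." and "..", so those are skipped explicitly.
def matching_entries(dir, pattern)
  results = []
  Dir.foreach(dir) do |filename|
    next if filename == "." || filename == ".."
    results << filename if filename =~ pattern
  end
  results
end

# Usage: p matching_entries("./programs_ruby", /mod/)
```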

Phrogz

10/2/2007 1:46:00 PM

On Oct 2, 6:17 am, Remco Hh <re...@huijdts.nl> wrote:
> I want to search in a directory for files, matching a certain regular
> expression. The script should not return true or false, but should give
> me a list (array) of filenames which are found.

Here's my 'findfile' script that I use daily. It lets you use a regexp
for the filename and for the file content, specify the depth of the
search, choose whether to show all matches inside a file, and so on.

(You may need to unwrap some of the longer lines after copy/paste.)

See additional notes at the end.

Slim2:/usr/local/bin phrogz$ cat findfile
#!/usr/bin/env ruby

USAGE = <<ENDUSAGE
Usage:
  findfile [-d max_depth] [-a] [-c] [-i] name_regexp [content_regexp]
  -d,--depth        the maximum depth to recurse to (defaults to no limit)
  -a,--showall      with content_regexp, show every match per file
                    (defaults to only show the first match per file)
  -c,--usecase      with content_regexp, use case-sensitive matching
                    (defaults to case-insensitive)
  -i,--includedirs  also find directories matching name_regexp
                    (defaults to files only; incompatible with content_regexp)
  -h,--help         show some help examples
ENDUSAGE

EXAMPLES = <<ENDEXAMPLES

Examples:
  findfile foo
  # Print the path to all files with 'foo' in the name

  findfile -i foo
  # Print the path to all files and directories with 'foo' in the name

  findfile js$
  # Print the path to all files whose name ends in "js"

  findfile js$ vector
  # Print the path to all files ending in "js" with "Vector" or "vector"
  # (or "vEcTOr", "VECTOR", etc.) in the contents, and print some of the
  # first line that has that content.

  findfile js$ -c Vector
  # Like above, but must match exactly "Vector" (not 'vector' or 'VECTOR').

  findfile . vector -a
  # Print the path to every file with "Vector" (any case) in it somewhere,
  # printing every line in those files (with line numbers) with that content.

  findfile -d 0 .
  # Print the path to every file that is in the current directory.

  findfile -d 1 .
  # Print the path to every file that is in the current directory or any
  # of its child directories (but no subdirectories of the children).
ENDEXAMPLES

ARGS = {}
UNFLAGGED_ARGS = [ :name_regexp, :content_regexp ]
next_arg = UNFLAGGED_ARGS.first
ARGV.each{ |arg|
  case arg
  when '-d','--depth'
    next_arg = :max_depth
  when '-a','--showall'
    ARGS[:showall] = true
  when '-c','--usecase'
    ARGS[:usecase] = true
  when '-i','--includedirs'
    ARGS[:includedirs] = true
  when '-h','--help'
    ARGS[:help] = true
  else
    if next_arg
      if next_arg == :max_depth
        arg = arg.to_i + 1
      end
      ARGS[next_arg] = arg
      UNFLAGGED_ARGS.delete( next_arg )
    end
    next_arg = UNFLAGGED_ARGS.first
  end
}

if ARGS[:help] or !ARGS[:name_regexp]
  puts USAGE
  puts EXAMPLES if ARGS[:help]
  exit
end

class Dir
  def self.crawl( path, max_depth=nil, include_directories=false, depth=0, &block )
    return if max_depth && depth > max_depth
    begin
      if File.directory?( path )
        yield( path, depth ) if include_directories
        files = Dir.entries( path ).select{ |f| true unless f =~ /^\.{1,2}$/ }
        unless files.empty?
          files.collect!{ |file_path|
            Dir.crawl( path+'/'+file_path, max_depth, include_directories, depth+1, &block )
          }.flatten!
        end
        return files
      else
        yield( path, depth )
      end
    rescue SystemCallError => the_error
      warn "ERROR: #{the_error}"
    end
  end

end

start_time = Time.new
name_match = Regexp.new( ARGS[:name_regexp], true )
content_match = ARGS[:content_regexp] && Regexp.new( ".{0,20}#{ARGS[:content_regexp]}.{0,20}", !ARGS[:usecase] )

file_count = 0
matching_count = 0
Dir.crawl( '.', ARGS[:max_depth], ARGS[:includedirs] && !content_match ){ |file_path, depth|
  if File.split( file_path )[ 1 ] =~ name_match
    if content_match
      if ARGS[:showall]
        shown_file = false
        IO.readlines( file_path ).each_with_index{ |line_text, line_number|
          if match = line_text[content_match]
            unless shown_file
              puts file_path
              matching_count += 1
              shown_file = true
            end
            puts( ( "%5d: " % line_number ) + match )
          end
        }
        puts " " if shown_file
      elsif IO.read( file_path ) =~ content_match
        puts file_path, "  #{$~}", " "
        matching_count += 1
      end
    else
      puts file_path
      matching_count += 1
    end
  end
  file_count += 1
}
elapsed = Time.new - start_time
puts "Found #{matching_count} file#{matching_count == 1 ? '' : 's'} (out of #{file_count}) in #{elapsed} seconds"

You do have to watch out for shell escaping of the regexp: either escape
characters as needed or quote the regexp. Unquoted, \d reaches the script
as a plain d, as the first run below shows:

Slim2:/usr/local/bin phrogz$ findfile \d
../findfile
../index_gem_repository.rb
../p4d
../rdoc
../rdoc-osa
../svnadmin
../svndumpfilter
../update_rubygems
Found 8 files (out of 40) in 0.001228 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d
../p4
../p4d
../rot13
../sqlite3
Found 4 files (out of 40) in 0.001088 seconds

Slim2:/usr/local/bin phrogz$ findfile \\d$
../p4
../rot13
../sqlite3
Found 3 files (out of 40) in 0.001118 seconds

Slim2:/usr/local/bin phrogz$ findfile "\d$"
../p4
../rot13
../sqlite3
Found 3 files (out of 40) in 0.001298 seconds
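findfile prints its results, but the original question asked for an array of filenames. Below is a minimal sketch of the same depth-limited walk that collects matching paths instead of printing them. It is not part of findfile. The method name find_files and its keyword arguments are hypothetical. It assumes max_depth follows the script's -d convention (0 means only files directly in dir). Unlike the script, it does not rescue permission errors or guard against symlink loops:

```ruby
# Hypothetical helper: return the paths of regular files under dir whose
# base name matches name_re. max_depth: 0 = only dir's own files,
# 1 = also files in its child directories, nil = no limit.
# depth and results are internal bookkeeping for the recursion.
def find_files(dir, name_re, max_depth: nil, depth: 0, results: [])
  Dir.children(dir).sort.each do |entry|
    path = File.join(dir, entry)
    if File.directory?(path)
      # Descend only while we are still above the depth limit.
      if max_depth.nil? || depth < max_depth
        find_files(path, name_re, max_depth: max_depth, depth: depth + 1, results: results)
      end
    elsif entry =~ name_re
      results << path
    end
  end
  results
end

# Usage: p find_files(".", /\d$/, max_depth: 0)
```

Pass a regexp with the /i flag (e.g. /foo/i) to get the case-insensitive name matching that findfile uses by default.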

Phrogz

10/2/2007 1:49:00 PM

On Oct 2, 6:17 am, Remco Hh <re...@huijdts.nl> wrote:
> I want to search in a directory for files, matching a certain regular
> expression. The script should not return true or false, but should give
> me a list (array) of filenames which are found.

Sorry, I just re-read your request and saw your desire for an array of
filenames. How about this:

Slim2:/usr/local/bin phrogz$ irb
irb(main):001:0> Dir[ '*' ]
=> ["erb", "fastri-server", "findfile", "fri", "gem", "gem_mirror",
"gem_server", "gemlock", "gemri", "gemwhich", "gpgen",
"index_gem_repository.rb", "irb", "lua", "luac", "mate",
"mongrel_rails", "p4", "p4d", "qri", "rails", "rake", "rdoc", "rdoc-osa",
"ri", "ri-emacs", "rot13", "ruby", "sql", "sqlite3", "svn",
"svnadmin", "svndumpfilter", "svnlook", "svnserve", "svnsync",
"svnversion", "swig", "testrb", "update_rubygems"]

irb(main):002:0> Dir[ '*' ].grep /\d$/
=> ["p4", "rot13", "sqlite3"]

You could use Dir.chdir to pick a working directory if you like.
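Dir.chdir accepts a block and restores the previous working directory when the block returns, so the glob-and-grep idiom can be scoped to another directory. Since Ruby 2.5, Dir.glob also takes a base: keyword that avoids changing the process-wide working directory at all. Below is a sketch of both. The method names are illustrative only:

```ruby
# Two ways to apply the Dir['*'].grep idiom to a directory other than the
# current one. Both return bare entry names, like the irb session above.

# 1. Temporarily change directory; the old cwd is restored after the block.
#    chdir affects the whole process, so avoid this in threaded code.
def grep_dir_chdir(dir, pattern)
  Dir.chdir(dir) { Dir['*'].grep(pattern) }
end

# 2. Ruby 2.5+: glob relative to dir without touching the cwd.
def grep_dir_base(dir, pattern)
  Dir.glob('*', base: dir).grep(pattern)
end

# Usage: p grep_dir_base("/usr/local/bin", /\d$/)
```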

David A. Black

10/2/2007 1:55:00 PM

Robert Dober

10/2/2007 1:59:00 PM

On 10/2/07, Remco Hh <remco@huijdts.nl> wrote:
> Hi, i am having troubles figuring this out:
>
> I want to search in a directory for files, matching a certain regular
> expression. The script should not return true or false, but should give
> me a list (array) of filenames which are found.
>
> who can help me a little?
> thanks in advance :)
>
> remco

Dir.glob("**/*").grep(/filename/)
HTH
Robert
--
what do I think about Ruby?
http://ruby-smalltalk.blo...
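Dir.glob("**/*") also walks subdirectories. However, it returns directories as well as files, and grep tests the whole relative path, so a pattern can match a directory name. Below is a minimal refinement. It assumes only regular files whose base name matches are wanted, and the method name glob_files is hypothetical:

```ruby
# Recursive glob under dir (base: needs Ruby 2.5+), keeping only regular
# files whose base name matches the pattern. Like Dir[], the "**/*" glob
# skips hidden (dot) files and directories by default.
def glob_files(pattern, dir = ".")
  Dir.glob("**/*", base: dir).select do |rel|
    File.file?(File.join(dir, rel)) && File.basename(rel) =~ pattern
  end
end

# Usage: p glob_files(/filename/)
```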

7stud 7stud

10/2/2007 2:45:00 PM

David A. Black wrote:
>
> A little more concise:
>
> results = Dir.entries("./programs_ruby").grep(/mod/)
>
> Or you could do:
>
> results = Dir["*mod*"]
>
> to automatically exclude hidden files, if that's desired.
>

Thanks. I have some questions, though. I notice that a lot of people
who post to this forum don't use iterators to read input as they go.
Instead, they tend to slam everything into memory first and then
iterate over the data, often with no concern at all if they happen to
create a copy or two of the data along the way. I always try to ask
myself, "What if the input is 2-3 GB?" I realize that's probably not
going to be the case with filenames, but who knows? There are
multi-terabyte hard drives now. As a result, I always try to iterate
over input as I go rather than read it into memory in one chunk. Is
there something I am missing about Ruby in that regard?

I assume that Ruby iterators buffer file I/O. Is that not the case? Is
Ruby so inefficient that you need to read everything into memory in the
biggest chunks possible to get reasonable performance while iterating
over data? Also, on a side note, it seems like it's standard operating
procedure to shuttle as much code as you can into shell commands. Is
that because people want to avoid using the Ruby interpreter?
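On the memory question, Dir.entries and Dir[] build the whole array before grep sees it. Dir.foreach instead yields one entry at a time. Called without a block, it returns an Enumerator, which can be made lazy so that only the requested matches are kept. The same applies to file contents: File.foreach reads line by line instead of slurping the file as IO.readlines does. Below is a sketch with illustrative method names:

```ruby
# Stream directory entries instead of building the full list first.
# Dir.foreach without a block returns an Enumerator; .lazy makes grep and
# first process one entry at a time, stopping after `limit` matches.
def first_matches(dir, pattern, limit)
  Dir.foreach(dir).lazy.grep(pattern).first(limit)
end

# Count matching lines without reading the whole file into memory:
# File.foreach without a block also returns an Enumerator over lines.
def count_matching_lines(path, pattern)
  File.foreach(path).count { |line| line =~ pattern }
end
```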

Remco Hh

10/2/2007 4:38:00 PM

Everybody, thanks for the good advice :)
This is most helpful.

remco


comp.lang.ruby
