Robert Klemme
1/11/2008 4:51:00 PM
On 11.01.2008 16:19, Kyle Schmitt wrote:
> I'm writing some scripts to help manage a mail scanner used at my
> work. Being a mail scanner, it's got huuuuUUUge quarantine
> directories.
>
> Now, I know I can do something along the lines of:
>
> Dir.open("/foo").collect.length-2 # if you're wondering, the -2 is to ignore . and ..
You could just as well do
count = Dir.entries("/foo").size - 2
> to get a count of what's in a directory, but the problem there is,
> it's rather slow when you run that in a directory with a few thousand
> files on a server under a severe (4.5>average_load>2) load.
>
> After perusing the Dir, Find and Stat classes, I haven't seen a better way.
> I thought that perhaps there was some sort of system call, at least in
> Real OSes™ (Linux, *BSD, Unix, etc), that would return the number of
> files inside of a directory. Something that would hopefully return in
> 1/4 or 1/8 of a second, rather than in 4 or 8 (or 20...) seconds.
>
> Any clues?
Most of the time will be spent on IO, and I guess that cannot be
changed. You could, however, do some form of caching: read the entry
count and the last modification date of each dir you are interested in
and store them in a Hash (and write that to disk via Marshal between
invocations if your process terminates in between). Then you only need
to check whether the modification date has changed, and read the
directory only if it has. The disadvantage is that you need one more IO
operation, although that will pull just one block, so it might pay off.
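A minimal sketch of that caching idea follows. The class name
DirCountCache, its methods, and the cache file path are made up for
illustration and do not come from any library. It assumes the directory's
modification time changes whenever entries are added or removed, which
holds on typical Unix filesystems but is limited by timestamp
granularity.

```ruby
require "tmpdir" # only needed if you try this out with Dir.mktmpdir

# Hypothetical helper: caches the entry count of each directory together
# with the directory's mtime, and re-reads the directory only when the
# mtime has changed.
class DirCountCache
  def initialize(cache_file)
    @cache_file = cache_file
    @cache = load_cache
  end

  # Returns the number of entries in dir, excluding "." and "..".
  def count(dir)
    mtime = File.stat(dir).mtime           # the one extra IO operation
    stamp = [mtime.to_i, mtime.nsec]       # plain integers compare safely
    cached = @cache[dir]
    return cached[:count] if cached && cached[:stamp] == stamp

    n = Dir.entries(dir).size - 2          # full directory read
    @cache[dir] = { :stamp => stamp, :count => n }
    n
  end

  # Writes the Hash to disk via Marshal so it survives between runs.
  def save
    File.open(@cache_file, "wb") { |f| Marshal.dump(@cache, f) }
  end

  private

  # Starts with an empty Hash if no cache file exists yet.
  def load_cache
    File.open(@cache_file, "rb") { |f| Marshal.load(f) }
  rescue Errno::ENOENT
    {}
  end
end

# Usage (paths are placeholders):
#   cache = DirCountCache.new("/tmp/dircounts.cache")
#   puts cache.count("/foo")
#   cache.save
```

Note that the first call for each directory is as slow as before; only
later calls on unchanged directories get cheaper. If a directory is
modified twice within the filesystem's timestamp resolution, the cached
count can be stale.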
Kind regards
robert