Phlip
2/7/2008 8:52:00 PM
Chris Richards wrote:
> I'm required to open 50+ files and parse the data in them. Would using
> multiple threads give me the best performance? Or is it best just to do
> it sequentially?
Fifty sub-megabyte files are a piffling load for a modern CPU. Between your
code and the hard drive surface are several layers of buffers, most supported by
dedicated hardware. They are all geared to sequential reads. For example, if you
read 1k from a file, and if the read-write head is still flying over that file
when it reaches the end of that 1k, it will continue scooping up file data. This
goes into the drive's memory cache, so the next request for 1k will return from
the memory cache. You generally cannot go wrong by reading files sequentially.
Almost all these memory caches (on the drive, in your RAM, on your bus, and
inside your CPU but outside your actual ALU) use dedicated hardware to operate
asynchronously. The only thing better than a simulated thread is a real thread
running on separate hardware. You already have that in these caches.
Now, do you need to cross-reference these files, and alternate reads and writes
between distant points among them? That will cause thrashing - and if you add
threads that you must synchronize with semaphores then you will probably increase
the thrashing, unless you are a computer scientist who can determine the exact
algorithm required to keep every thread well-fed, without thread starvation.
Conclusion: Open each one, in order, process it sequentially, and close it. Then
profile your program, paying attention to user time, CPU time, and IO time. If
the IO time is very high, you are spending too much time waiting. If this
happens, you might consider breaking everything into threads, then sending all
the files simultaneously to your filesystem driver. It may have a function that
lets you batch up a whole bunch of file commands and simultaneously execute
them. This allows the hard drive to optimize its read operations, and multiplex
all the results together.
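
Here is a minimal Python sketch of that advice, assuming each file fits comfortably in memory. The names parse, process_sequentially, and process_threaded are hypothetical, and parse (which only counts lines) is a stand-in for whatever parsing you really do. It times reading and parsing separately, so you can see where the time goes before deciding whether the threaded variant is worth trying:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse(data: bytes) -> int:
    """Hypothetical stand-in for the real parser: counts newline bytes."""
    return data.count(b"\n")


def process_sequentially(paths):
    """Open, read, parse, and close each file in order.

    Returns (results, io_seconds, parse_seconds). The timings are
    wall-clock, so io_seconds includes any time spent waiting on the drive.
    """
    results = {}
    io_seconds = 0.0
    parse_seconds = 0.0
    for path in paths:
        t0 = time.perf_counter()
        data = Path(path).read_bytes()  # opens, reads the whole file, closes
        t1 = time.perf_counter()
        results[str(path)] = parse(data)
        t2 = time.perf_counter()
        io_seconds += t1 - t0
        parse_seconds += t2 - t1
    return results, io_seconds, parse_seconds


def _read_and_parse(path):
    return parse(Path(path).read_bytes())


def process_threaded(paths, max_workers=8):
    """Same work, but with the reads issued from a pool of threads.

    Only worth comparing if the sequential run shows high I/O time.
    max_workers=8 is an arbitrary assumption, not a recommendation.
    """
    paths = list(paths)  # we iterate twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(_read_and_parse, paths))
    return dict(zip(map(str, paths), counts))
```

Run process_sequentially first and compare io_seconds with parse_seconds. If the read time is small, stop there. Keep in mind that a second run over the same files will usually be served from the operating system's cache, so the first run is the one that reflects the drive.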
Don't do any of this unless you have a working program, _and_ you think it's
slow, _AND_ your customers think it's slow. Premature optimization is the root
of all evil.
--
Phlip