Jano Svitok
3/24/2007 10:15:00 AM
On 3/23/07, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
> On Fri, 23 Mar 2007, Alex Ciarlillo wrote:
>
> > I have an application which is supposed to create a set of excel
> > documents to display sales information for dining areas. The core of it
> > is a class called SaleReport which is initialized with a sqlite database
> > connection. The main function takes an excel worksheet object, a row
> > number and some info about the location and queries the database to
> > populate the excel sheet. Since none of the locations rely on each
> > other's data, I thought it would be pretty useful to thread that part, so
> > that each location report is run in its own thread. The problem is, even
> > though this seems to work, it has not improved performance at all and I
> > am not sure where the bottleneck is. Here are my theories and the
> > example code is at the bottom:
> >
> > 1) My first theory was that using a single excel application instance
> > was blocking the threads so that only one could have access at a time,
> > but now I rewrote it to use multiple excel instances and still no dice.
> >
> > 2) The connection to the database is limiting access to a single thread
> > at a time. This shouldn't be the case since each instance of the
> > SaleReport class gets its own connection, and SQLite is threadsafe.
>
> sqlite is threadsafe, but supports access by only one thread at a time, i.e.
> it's not concurrent at the c level. the only level of concurrency sqlite
> provides is at the process level.
>
> > 3) I'm flat out using the threads incorrectly.
>
> it's easy to do on windows - anything which blocks one thread at the os level
> will block all threads. this is surprisingly easy to do. your code looks
> fine. i'm not on windows, but if i were you i'd write some code that proves
> to myself that concurrent access to an excel doc by threads does not end up
> blocking the whole process as i suspect it does. same goes for your
> SaleReport object.

I suspect that as well. Since ruby threads are only interpreter threads,
not OS threads, I assume that each call to OLE blocks the entire
interpreter. Therefore it should not make any difference whether you call
it in threads or not. Threads may even be slower, due to the extra
overhead. You should be able to check this by printing something to the
screen repeatedly in one thread (remember to set $stdout.sync = true) and
doing a long OLE operation in another. The hypothesis is that the printing
will stop while OLE is running.
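
A minimal sketch of that experiment, in case it helps. The helper name
ticks_during is made up for illustration, and the sleep in the block is only
a placeholder for the real long-running OLE call, which I have not included.

```ruby
$stdout.sync = true  # flush every dot immediately so stalls are visible

# Hypothetical helper: runs the given block while a background thread
# prints a dot every `interval` seconds, and returns how many dots were
# printed while the block was running.
def ticks_during(interval = 0.01)
  ticks = 0
  done = false
  ticker = Thread.new do
    until done
      print '.'
      ticks += 1
      sleep interval
    end
  end
  yield            # the operation under test runs in the calling thread
  done = true
  ticker.join
  ticks
end

# Placeholder for the long OLE operation (assumption: swap in the real
# call, e.g. whatever fills the worksheet). A plain sleep lets other ruby
# threads run, so the dots keep coming here.
count = ticks_during { sleep 0.3 }
puts
puts "ticks while the operation ran: #{count}"
```

If the dots stop and the count stays near zero once the real OLE call is in
the block, the call is holding up the whole interpreter and threads will not
buy you anything.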
To parallelize this you'd probably need multiple processes, either using
Win32::Process from win32utils or manually spawning some worker
processes and having them communicate with the main process via drb or
something similar.
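
A rough sketch of the worker-process variant with drb. Everything named here
(Collector, WORKER, run_reports, the location names) is hypothetical, and
instead of Win32::Process it uses Process.spawn, which assumes a ruby version
that provides it (1.9 or later). The worker body is a placeholder for the
real per-location query and Excel output.

```ruby
require 'drb/drb'
require 'rbconfig'

# Front object served by the main process; workers call #report remotely.
class Collector
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  def report(location, summary)
    @lock.synchronize { @results[location] = summary }
    nil
  end

  def results
    @lock.synchronize { @results.dup }
  end
end

# Source of each worker process. The string it reports is a placeholder
# for the real work (database query plus Excel output for one location).
WORKER = <<~'RUBY'
  require 'drb/drb'
  uri, location = ARGV
  collector = DRbObject.new_with_uri(uri)
  collector.report(location, "report for #{location} done")
RUBY

def run_reports(locations)
  collector = Collector.new
  # Port 0 asks the OS for a free port; DRb.uri then holds the real address.
  DRb.start_service('druby://127.0.0.1:0', collector)
  pids = locations.map do |location|
    Process.spawn(RbConfig.ruby, '-e', WORKER, DRb.uri, location)
  end
  pids.each { |pid| Process.wait(pid) }
  DRb.stop_service
  collector.results
end

results = run_reports(%w[cafe grill])
results.each { |location, summary| puts "#{location}: #{summary}" }
```

Each worker is a separate interpreter, so a blocking OLE call in one of them
does not stall the others. Each worker would also need to open its own Excel
instance and database connection.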