rwa
8/10/2004 10:05:00 PM
steve@nospam.Watt.COM (Steve Watt) wrote in message news:<I28pIr.rup@Watt.COM>...
> In article <c75bc946.0408100731.6de5713c@posting.google.com>,
> R Anderson <rwa@alumni.princeton.edu> wrote:
> >ptjm@interlog.com (Patrick TJ McPhee) wrote in message
> >news:<cf94on$1g$1@news.eusc.inter.net>...
> >> In article <c75bc946.0408091504.47951ae9@posting.google.com>,
> >> R Anderson <rwa@alumni.princeton.edu> wrote:
> >>
> >> % It is not the asynchronous part that I am after, but more the ability
> >> % to do "context switching" to and from a subroutine in ways that are
> >>
> >> But why? It sounds like you can read a chunk, process it, read the
> >> next chunk, process it, and so on.
> >
> >You cannot. I've described why twice, but it appears hard to
> >understand, so I'll describe an analogy one more time so that people
> >can "get it."
> >
> >Consider a function f() which works on an array x of doubles. What it
>
> [ classic image processing function description ]
>
> >> It sounds like you've made up your mind, but if it someone were to dump
> >> 100k+ lines of convoluted code which assumes all the data is available
> >> all the time on me, introducing threads would not be near the top of my
> >> list of things to do with it, even if it seemed like a reasonable way
> >> to solve a problem.
> >
> >You have a better solution? I'd love to learn of one.
>
> OK, let's step back a bit. What are you hoping to accomplish by adding
> threads?
I attempted to describe this in the original post, but I'll repeat it
here:
The problem is this: there is an existing code base which does the
type of processing I described. When this code was written, a design
assumption was that all data was available and processed in a single
pass. Over a long period of time, requirements changed, and it is now
desired that the operations this program implements can be applied to
subsets of the data at a time. (The reasons for this are very
involved, but please do me the courtesy of accepting that this is a
reasonable thing to do. It boils down to the fact that the datasets
in question are no longer static but are dynamic, and we are coupling
to auxiliary tools which decompose the dynamic data into these
"chunks" I am talking about, which then get distributed over a
parallel machine. However, often it is the case that several chunks
are assigned to one node for processing. Anyway...)
The existing code base "almost" solves this problem; it is
tantalizingly close. It solves the problem except at a known and
reasonable number of points in the algorithm where "neighbor" data
must have been processed to the same stage before the algorithm can
proceed from there.
What I am hoping to accomplish is a minimal transformation of the
existing code base by turning the original processing function "f"
into a family of coroutines. Coroutines allow concurrent "instances"
of the algorithm to be run, so that instance 1 can be run to an
arbitrary place in the algorithm, "halted", instance 2 launched, and
so on, until all instances have executed to the first halt point.
Then, and only then, is the data in a state from which the algorithm
can proceed to the next step.
I am posting in a thread newsgroup because it seems to me that
coroutines and threads are closely related - one might even use
threads to simulate coroutines. But it is really coroutines I am
after.
> Your problem (as described) sounds quite CPU bound.
Certainly.
> You
> mentioned that it's running on a uniprocessor, so I'll go out on a
> (fairly sturdy) limb and say it'll become slower on a uniprocessor if you
> add thread overhead of any sort.
Sure. And I don't care. The coroutine direction has nothing to do
with performance (although certainly I would like to get the best
performance I can from it). It has everything to do with using an
existing code base to solve a certain problem in the most effective
way. It is a software engineering device and nothing more.
> >> And if the problem is amenable to using GNU pth,
> >> then threading isn't a reasonable way to solve the problem.
> >
> >I'd be interested in hearing the reasoning behind that assertion.
>
> If the goal is to make the operation faster on a multiprocessor, then
> GNU pth won't help, because it doesn't use the system's thread scheduler.
> You need multiple processes or multiple system threads for that.
It isn't.
> There are well-known ways to handle the data dependency problem
> while parallelizing blocks of work. See, for example, Butenhof's
> "Programming with POSIX Threads" early chapters.
I skimmed the book in question. The problem with full-blown threads
is that, as far as I have been able to gather, the scheduling is
implicit. I do not want implicit scheduling; I want fully explicit
scheduling. I want a main coroutine which essentially multiplexes
control to coroutines 1..N in turn. Coroutines 1..N work until they
reach a point of required synchronization, at which point they
explicitly return control to the main coroutine, and so on.
Thanks,
Bob