rwa
8/10/2004 10:05:00 PM
steve@nospam.Watt.COM (Steve Watt) wrote in message news:<I28pIr.rup@Watt.COM>...
> In article <c75bc946.0408100731.6de5713c@posting.google.com>,
> R Anderson <rwa@alumni.princeton.edu> wrote:
> >ptjm@interlog.com (Patrick TJ McPhee) wrote in message
> >news:<cf94on$1g$1@news.eusc.inter.net>...
> >> In article <c75bc946.0408091504.47951ae9@posting.google.com>,
> >> R Anderson <rwa@alumni.princeton.edu> wrote:
> >>
> >> % It is not the asynchronous part that I am after, but more the ability
> >> % to do "context switching" to and from a subroutine in ways that are
> >>
> >> But why? It sounds like you can read a chunk, process it, read the
> >> next chunk, process it, and so on.
> >
> >You cannot. I've described why twice, but it appears hard to
> >understand, so I'll describe an analogy one more time so that people
> >can "get it."
> >
> >Consider a function f() which works on an array x of doubles. What it
>
> [ classic image processing function description ]
>
> >> It sounds like you've made up your mind, but if it someone were to dump
> >> 100k+ lines of convoluted code which assumes all the data is available
> >> all the time on me, introducing threads would not be near the top of my
> >> list of things to do with it, even if it seemed like a reasonable way
> >> to solve a problem.
> >
> >You have a better solution? I'd love to learn of one.
>
> OK, let's step back a bit. What are you hoping to accomplish by adding
> threads?
I attempted to describe this in the original post, but I'll repeat it
here:
The problem is this: there is an existing code base which does the
type of processing I described. When this code was written, a design
assumption was that all data was available and processed in a single
pass. Over a long period of time, requirements changed, and it is now
desired that the operations this program implements can be applied to
subsets of the data at a time. (The reasons for this are very
involved, but please do me the courtesy of accepting that this is a
reasonable thing to do. It boils down to the fact that the datasets
in question are no longer static but are dynamic, and we are coupling
to auxiliary tools which decompose the dynamic data into these
"chunks" I am talking about, which then get distributed over a
parallel machine. However, often it is the case that several chunks
are assigned to one node for processing. Anyway...)
The existing code base "almost" solves this problem; it is
tantalizingly close. It solves the problem except at a known and
reasonable number of points in the algorithm where "neighbor" data
must have been processed to the same stage before the algorithm can
proceed from there.
What I am hoping to accomplish is a minimal transformation of the
existing code base by turning the original processing function "f"
into a family of coroutines. Coroutines allow concurrent "instances"
of the algorithm to be run, so that instance 1 can be run to an
arbitrary place in the algorithm, "halted", instance 2 launched, and
so on, until all instances have executed to the first halt point.
Then, and only then, is the data in a state from which the algorithm
can proceed to the next step.
I am posting in a thread newsgroup because it seems to me that
coroutines and threads are closely related - one might even use
threads to simulate coroutines. But it is really coroutines I am
after.
> Your problem (as described) sounds quite CPU bound.
Certainly.
> You
> mentioned that it's running on a uniprocessor, so I'll go out on a
> (fairly sturdy) limb and say it'll become slower on a uniprocessor if you
> add thread overhead of any sort.
Sure. And I don't care. The coroutine direction has nothing to do
with performance (although certainly I would like to get the best
performance I can from it). It has everything to do with using an
existing code base to solve a certain problem in the most effective
way. It is a software engineering device and nothing more.
> >> And if the problem is amenable to using GNU pth,
> >> then threading isn't a reasonable way to solve the problem.
> >
> >I'd be interested in hearing the reasoning behind that assertion.
>
> If the goal is to make the operation faster on a multiprocessor, then
> GNU pth won't help, because it doesn't use the system's thread scheduler.
> You need multiple processes or multiple system threads for that.
It isn't.
> There are well-known ways to handle the data dependency problem
> while parallelizing blocks of work. See, for example, Butenhof's
> "Programming with POSIX Threads" early chapters.
I skimmed the book in question. The problem with full-blown threads
is that, as far as I have been able to gather, the scheduling is
implicit. I do not want implicit scheduling; I want fully explicit
scheduling. I want a main coroutine which essentially multiplexes
control to coroutines 1..N in turn. Coroutines 1..N work until they
reach a point of required synchronization, at which point they
explicitly return control to the main coroutine, and so on.
Thanks,
Bob