
comp.programming.threads

Splitting task does not gain speed on SMP machine, puzzled.

dsboily

7/18/2004 6:12:00 PM

I have made a program that trains neural networks. Since I have a dual
Athlon MP machine, I decided to split the task in two with pthreads.
Here is how I do it.

In the normal serial case:

TrainingSet training_set(argv[1]);
for (int i = 0; i < NB_NETWORKS; i++)
{
    TrainingAlgo tga(&training_set, NeuralNet[i], nb_epochs, learning_rate);
    tga.Start();
}

So I decided to create two threads, one to train the first half, the
other the second. Here is how I implemented the two tasks:

extern "C"
void * Training1(void* data)
{
TrainingSet* T = reinterpret_cast<TrainingSet*>(data);
for (int i=0; i<NB_NETWORKS/2; i++)
{
TrainingAlgo tga(training_set, NeuralNet[i], nb_epochs,
learning_rate);
tga.Start();
}
}

extern "C"
void * Training2(void* data)
{
TrainingSet* T = reinterpret_cast<TrainingSet*>(data);
for (int i=NB_NETWORKS/2; i<NB_NETWORKS; i++)
{
TrainingAlgo tga(training_set, NeuralNet[i], nb_epochs,
learning_rate);
tga.Start();
}
}

And in main, I put this:

TrainingSet training_set1(argv[1]);
TrainingSet training_set2(argv[1]); // so that the threads don't compete for data
pthread_t Thread1, Thread2;
pthread_create(&Thread1, NULL, Training1, &training_set1);
pthread_create(&Thread2, NULL, Training2, &training_set2);
pthread_join(Thread1, NULL);
pthread_join(Thread2, NULL);

Very straightforward stuff, nothing complicated. But there is no speed
gain: both versions run in about the same time.

serial -> time : 32.84 seconds
parallel -> task1 : 16.08 seconds
task2 : 16.45 seconds
total run time : 32.55 seconds

The serial version, as expected, only takes 100% of cpu0 (or cpu1). The
parallel version takes 100% of both cpu0 and cpu1. So, anybody care to
take a guess as to why this is happening? There are no globals, and the
two threads do not allocate/deallocate much. The bulk (98%) of
TrainingAlgo's time is spent multiplying matrices and vectors. I am
puzzled; am I missing something?

thanks
David
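
For reference, the same split can also be written with a single worker
function parameterised by a half-open index range, instead of two
near-identical functions. The following is only a minimal, self-contained
sketch: Range, Worker, and the printf are made-up stand-ins for the
TrainingSet/TrainingAlgo machinery in the post above, and NB_NETWORKS is
given a placeholder value so the sketch compiles on its own.

#include <pthread.h>
#include <stdio.h>

const int NB_NETWORKS = 8;   // placeholder value

struct Range {
    int begin;               // first network index (inclusive)
    int end;                 // last network index (exclusive)
};

extern "C" void* Worker(void* data)
{
    Range* r = static_cast<Range*>(data);
    for (int i = r->begin; i < r->end; i++)
    {
        // stand-in for: TrainingAlgo tga(...); tga.Start();
        printf("training network %d\n", i);
    }
    return NULL;
}

int main()
{
    Range r1 = { 0, NB_NETWORKS / 2 };
    Range r2 = { NB_NETWORKS / 2, NB_NETWORKS };

    pthread_t t1, t2;
    pthread_create(&t1, NULL, Worker, &r1);
    pthread_create(&t2, NULL, Worker, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Each thread still gets its own argument struct, so the only shared data is
whatever the workers choose to share.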
4 Answers

David Schwartz

7/18/2004 7:02:00 PM



"David Boily" <dsboily@fastmail.ca> wrote in message
news:e1840df3.0407181011.34495750@posting.google.com...

> serial -> time : 32.84 seconds
> parallel -> task1 : 16.08 seconds
> task2 : 16.45 seconds
> total run time : 32.55 seconds

Are these wall times? CPU times? Did you just add the 'task1' time to
the 'task2' time to get the total time? Do you realize that this makes no
sense?

DS
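
To illustrate the distinction being asked about, here is a minimal sketch
(not from the thread) that times the same two-thread run with both clocks,
assuming a Linux/glibc system where CLOCK_PROCESS_CPUTIME_ID is available.
CLOCK_REALTIME gives elapsed wall-clock time; CLOCK_PROCESS_CPUTIME_ID gives
CPU time consumed by all threads of the process, so with two busy threads
it grows about twice as fast as the wall clock.

#include <pthread.h>
#include <stdio.h>
#include <time.h>

static double seconds(const timespec& a, const timespec& b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

extern "C" void* spin(void*)
{
    volatile double x = 0.0;
    for (long i = 0; i < 200000000L; ++i) x += 1.0;   // busy work
    return NULL;
}

int main()
{
    timespec wall0, wall1, cpu0, cpu1;
    clock_gettime(CLOCK_REALTIME, &wall0);            // wall clock
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);   // CPU time, all threads

    pthread_t t1, t2;
    pthread_create(&t1, NULL, spin, NULL);
    pthread_create(&t2, NULL, spin, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    clock_gettime(CLOCK_REALTIME, &wall1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);

    printf("wall time: %.2f s\n", seconds(wall0, wall1));
    printf("CPU time : %.2f s (both threads combined)\n", seconds(cpu0, cpu1));
    return 0;
}

Adding the two per-thread times gives the second number, not the first,
which is why summing them says nothing about the speedup.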


Mouse

7/18/2004 7:19:00 PM


David Boily

7/19/2004 3:00:00 AM


>> serial -> time : 32.84 seconds
>> parallel -> task1 : 16.08 seconds
>> task2 : 16.45 seconds
>> total run time : 32.55 seconds
>
> Are these wall times? CPU times? Did you just add the 'task1' time to
> the 'task2' time to get the total time? Do you realize that this makes no
> sense?
>
> DS

I did not add them up; you're right, that would make no sense. I just did this:

clock_gettime(CLOCK_REALTIME, &start);
[task1_code]
clock_gettime(CLOCK_REALTIME, &end);
task1_time = end - start; // essentially

The same for task2, and:

clock_gettime(CLOCK_REALTIME, &start);
[pthread create and join code]
clock_gettime(CLOCK_REALTIME, &end);
total_time = end - start; // essentially

But I just timed it with a wind-up watch, and the total time for the
parallel version is NOT 32 seconds; it is approximately 17 seconds. I
guess it's adding up the child process times. After trying everything to
fix this in my program, I found out that the damn profiler was wrong: the
program works just fine and scales perfectly. Your comment about adding
the task times made me think that something in there was off, so thanks.

David
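
For what it's worth, the "essentially" shorthand above spelled out as real
code looks something like the sketch below (elapsed_seconds is a made-up
helper name): clock_gettime takes a pointer to a timespec, and two
timespec values cannot be subtracted directly.

#include <time.h>

// elapsed seconds between two clock_gettime(CLOCK_REALTIME, ...) samples
static double elapsed_seconds(const timespec& start, const timespec& end)
{
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

// usage, per task:
//   timespec start, end;
//   clock_gettime(CLOCK_REALTIME, &start);
//   /* task code */
//   clock_gettime(CLOCK_REALTIME, &end);
//   double task_time = elapsed_seconds(start, end);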




velco

7/20/2004 9:44:00 AM


dsboily@fastmail.ca (David Boily) wrote in message news:<e1840df3.0407181011.34495750@posting.google.com>...
> I have made a program that trains neural networks. since i have a dual
> Athlon MP machine i decided to split the task in two with pthreads.
> here is how i do it,
[...]
> very straightforward stuff, nothing complicated. but there is no speed
> gain. both versions run in about the same time.
>
> serial -> time : 32.84 seconds
> parallel -> task1 : 16.08 seconds
> task2 : 16.45 seconds
> total run time : 32.55 seconds
>
> the serial version, as expected only takes 100% of cpu0 (or cpu1). the
> parallel version takes 100% of both cpu0 and cpu1. so, anybody care to
> take a guess as to why this is happening. there are no globals, the
> two threads do not allocate/deallocate much. the bulk (98%) of
> TrainingAlgo is spent multiplying matrices and vectors. i am puzzled,
> am i missing something?

Check whether both working sets fit in main memory.
Check whether both working sets fit in L2/L3 cache (if common).
Check whether the two working sets share cache lines.

~velco
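
To illustrate the last check (shared cache lines, i.e. false sharing),
here is a minimal, self-contained sketch that is not taken from the
thread: two threads hammer counters that live on the same cache line, and
the commented-out padding (64 bytes is an assumption about the line size)
pushes them onto separate lines, which typically makes the run noticeably
faster on an SMP machine.

#include <pthread.h>
#include <stdio.h>
#include <time.h>

struct Counters {
    volatile long a;
    // char pad[64];   // uncomment to give 'b' its own cache line
    volatile long b;
};

static Counters counters;   // as written, a and b share a cache line

extern "C" void* bump_a(void*)
{
    for (long i = 0; i < 100000000L; ++i) counters.a++;
    return NULL;
}

extern "C" void* bump_b(void*)
{
    for (long i = 0; i < 100000000L; ++i) counters.b++;
    return NULL;
}

int main()
{
    timespec t0, t1;
    clock_gettime(CLOCK_REALTIME, &t0);

    pthread_t x, y;
    pthread_create(&x, NULL, bump_a, NULL);
    pthread_create(&y, NULL, bump_b, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);

    clock_gettime(CLOCK_REALTIME, &t1);
    printf("counters: %ld %ld, elapsed: %.2f s\n", counters.a, counters.b,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}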