dsboily
7/18/2004 6:12:00 PM
I have written a program that trains neural networks. Since I have a dual
Athlon MP machine, I decided to split the work in two with pthreads.
Here is how I do it.
In the normal serial case:
TrainingSet training_set(argv[1]);
for (int i = 0; i < NB_NETWORKS; i++)
{
    TrainingAlgo tga(&training_set, NeuralNet[i], nb_epochs, learning_rate);
    tga.Start();
}
So I decided to create two threads: one to train the first half, the
other the second. Here is how I implemented the two tasks:
extern "C"
void* Training1(void* data)
{
    TrainingSet* T = reinterpret_cast<TrainingSet*>(data);
    for (int i = 0; i < NB_NETWORKS/2; i++)
    {
        TrainingAlgo tga(T, NeuralNet[i], nb_epochs, learning_rate);
        tga.Start();
    }
    return NULL;
}
extern "C"
void* Training2(void* data)
{
    TrainingSet* T = reinterpret_cast<TrainingSet*>(data);
    for (int i = NB_NETWORKS/2; i < NB_NETWORKS; i++)
    {
        TrainingAlgo tga(T, NeuralNet[i], nb_epochs, learning_rate);
        tga.Start();
    }
    return NULL;
}
And in main, I put this:
TrainingSet training_set1(argv[1]);
TrainingSet training_set2(argv[1]); // so that the threads don't compete for data
pthread_t Thread1, Thread2;
pthread_create(&Thread1, NULL, Training1, &training_set1);
pthread_create(&Thread2, NULL, Training2, &training_set2);
pthread_join(Thread1, NULL);
pthread_join(Thread2, NULL);
Very straightforward stuff, nothing complicated. But there is no speed
gain: both versions run in about the same time.
serial   -> time  : 32.84 seconds
parallel -> task1 : 16.08 seconds
            task2 : 16.45 seconds
            total run time : 32.55 seconds
The serial version, as expected, takes 100% of cpu0 (or cpu1); the
parallel version takes 100% of both cpu0 and cpu1. So, would anybody
care to guess why this is happening? There are no globals, and the
two threads do not allocate/deallocate much. The bulk (98%) of
TrainingAlgo's time is spent multiplying matrices and vectors. I am
puzzled; am I missing something?
thanks
David