Asp Forum - What about a PhD paper...

Ramine

10/31/2014 11:32:00 PM

Hello,

Frankly i think i have to write a PhD paper for my Parallel Sort library...

If you look at my Parallel Quickort you will notice that when the main
thread enters the partition() procedure for the first time before
starting the threads, it will have to serially partition "all" the
array, and this will make the serial part of the Amdahl equation bigger
, so it will make my Parallel Quicksort less scalable... hope this is
clear... but what about my Parallel Sort library ? we have to be smart,
when you take a look at the PhD papers, you will notice that they are
benchmarking there parallel algorithms empiricaly, but since i
don't have a computer with more than 4 cores, i can not benchmark
my Parallel Sort library with more than 4 cores, so i have to
do mathematical calculations to do a scalability prediction for
my Parallel Sort library, so if you take a look at the source
code of my Parallel Sort library, you will notice that
i am using 2 steps , one step for the sorting part and one
step for the merging part, and my parallel algorithm have completly
parallelized the sorting part and completly parallelized the merging
part and this has made my parallel algorithm more scalable cause its
serial part is much smaller than the serial part of my Parallel
Quicksort library, so how can we make a scalability prediction for my
Parallel Sort library ? you have to do it the same way as you do it in
assembler, you have to calculate the number of CPU clocks that the
parallel part of the Amdahl equation takes, and the number of CPU clocks
that the serial part of the Amdahl equation takes, i have done it for my
Parallel Quicksort inside my Parallel Sort library
, although this depends on the size of the "string" when you
are sorting strings, we will take as an example an average string
of 10 characters, so this will give us around 16 clocks for the parallel
part of the Amdahl equation for the compare() between strings
etc., cause i have done the calculation my self, and for the serial part
we have to know that the memory transfers are serialized in the x86
computers , and since on DDR3 the memory throughput can go up to 17 MB/s
and on my 2.4 Ghz CPU this will give 8 bytes memory transfers per clock
from the memoryto the CPU, so this will take around 2 clocks for
transfering two strings with an average size of 10 characters before
making the compare()... so finally this will give a serial part of 2
clocks and a parallel part of 16 clocks, so this will give a scalability
on multicores of 9x on x86 multicores with DDR3 memory, this was the
scalability prediction for "strings" with
an average size of 10 characters, for a strings with bigger size it
can scale more , but for integers sorting, two integers will be
transfers from the memory to the CPU in 1 CPU clock before doing the
compare(), and the compare() function between integers will take only
one clock, so for integers it will take 7 CPU clocks for the parallel
part and 1 CPU clock and one clock for the serial part , so after doing
the Amdahl equation calculation for integers it will give a scalability
on multicores to no more than around 8x for integers.

So my Parallel Sort library is much more scalable than my Parallel
Quicksort library...

This very good to know...

You can download my Parallel Sort library from:

https://sites.google.com/site/aminer68/parallel-so...

Thank you,
Amine Moulay Ramdane.

1 Answer

Ramine

10/31/2014 11:33:00 PM

I correct:

Hello...

Frankly i think i have to write a PhD paper for my Parallel Sort library...

If you look at my Parallel Quicksort you will notice that when the main
thread enters the partition() procedure for the first time before
starting the threads, it will have to serially partition "all" the
array, and this will make the serial part of the Amdahl equation bigger
, so it will make my Parallel Quicksort less scalable... hope this is
clear... but what about my Parallel Sort library ? we have to be smart,
when you take a look at the PhD papers, you will notice that they are
benchmarking there parallel algorithms empiricaly, but since i
don't have a computer with more than 4 cores, i can not benchmark
my Parallel Sort library with more than 4 cores, so i have to
do mathematical calculations to do a scalability prediction for
my Parallel Sort library, so if you take a look at the source
code of my Parallel Sort library, you will notice that
i am using 2 steps , one step for the sorting part and one
step for the merging part, and my parallel algorithm have completly
parallelized the sorting part and completly parallelized the merging
part and this has made my parallel algorithm more scalable cause its
serial part is much smaller than the serial part of my Parallel
Quicksort library, so how can we make a scalability prediction for my
Parallel Sort library ? you have to do it the same way as you do it in
assembler, you have to calculate the number of CPU clocks that the
parallel part of the Amdahl equation takes, and the number of CPU clocks
that the serial part of the Amdahl equation takes, i have done it for my
Parallel Quicksort inside my Parallel Sort library
, although this depends on the size of the "string" when you
are sorting strings, we will take as an example an average string
of 10 characters, so this will give us around 16 clocks for the parallel
part of the Amdahl equation for the compare() between strings
etc., cause i have done the calculation my self, and for the serial part
we have to know that the memory transfers are serialized in the x86
computers , and since on DDR3 the memory throughput can go up to 17 MB/s
and on my 2.4 Ghz CPU this will give 8 bytes memory transfers per clock
from the memoryto the CPU, so this will take around 2 clocks for
transfering two strings with an average size of 10 characters before
making the compare()... so finally this will give a serial part of 2
clocks and a parallel part of 16 clocks, so this will give a scalability
on multicores of 9x on x86 multicores with DDR3 memory, this was the
scalability prediction for "strings" with
an average size of 10 characters, for a strings with bigger size it
can scale more , but for integers sorting, two integers will be
transfers from the memory to the CPU in 1 CPU clock before doing the
compare(), and the compare() function between integers will take only
one clock, so for integers it will take 7 CPU clocks for the parallel
part and 1 CPU clock and one clock for the serial part , so after doing
the Amdahl equation calculation for integers it will give a scalability
on multicores to no more than around 8x for integers.

So my Parallel Sort library is much more scalable than my Parallel
Quicksort library...

This very good to know...

You can download my Parallel Sort library from:

https://sites.google.com/site/aminer68/parallel-so...

Thank you,
Amine Moulay Ramdane.

comp.programming

What about a PhD paper...

Ramine

Ramine

x Login to ForumsZone