Asp Forum - Efficiency of my parallel algorithm

Ramine

1/13/2015 10:33:00 PM

Hello,

I have spook before about my Parallel Conjugate gradient linear system
solver library and i have said that it is using a probabilistic
mechanism that is very efficient and that render my parallel algorithm
into a scalable parallel algorithm on NUMA architecture, so now i will
prove to you what i am saying:

If you look at my parallel algorithm it is dividing the array by 250
elements, and if you look carefully i am using two functions that
consumes the greater part of all the CPU, it is the atsub() and asub(),
and inside those functions i am using a probabilistic mechanism so that
to render my algorithm scalable on NUMA architecture, what i am
doing is scrambling the array parts using a probabilistic function
and what i have noticed that this probabilitic mechanism is very
efficient, to proove to you what i am saying , please look at the
following simulation that i have done using a variable that
contains the number of NUMA nodes, and what i have noticed that my
simulation is giving almost a perfect scalability on NUMA architecture,
for example let us give to the "NUMA_nodes" variable a value of 4,
and to our array a value of 250, since i am dividing all the array by
250, the simulation will give a number of contention points of a
quarter of the array, so if i am using 16 cores , in the the worst case
it will scale 4X throughput on NUMA architecture, because since
i am using an array of 250 and there is a quarter of the array of
contention points , so from the Amdahl'law this will give a scalability
of 4X throughput on four NUMA nodes, and this will give almost a perfect
scalability on more and more NUMA nodes, so my parallel algorithm is
scalable on NUMA architecture,

Here the simulation that i have done, please run it and you will notice
yourself that my parallel algorithm is scalable on NUMA architecture.

Here it is:

---
program test;

uses math;

var tab,tab1,tab2,tab3:array of integer;
a,n1,k,i,n2,tmp,j,numa_nodes:integer;
begin

a:=250;

Numa_nodes:=4;

setlength(tab2,a);

for i:=0 to a-1
do
begin

tab2[i]:=i mod numa_nodes;

end;

setlength(tab,a);

randomize;

for k:=0 to a-1
do tab[k]:=k;

n2:=a-1;

for k:=0 to a-1
do
begin
n1:=random(n2);
tmp:=tab[k];
tab[k]:=tab[n1];
tab[n1]:=tmp;
end;

setlength(tab1,a);

randomize;

for k:=0 to a-1
do tab1[k]:=k;

n2:=a-1;

for k:=0 to a-1
do
begin
n1:=random(n2);
tmp:=tab1[k];
tab1[k]:=tab1[n1];
tab1[n1]:=tmp;
end;

for i:=0 to a-1
do
if tab2[tab[i]]=tab2[tab1[i]] then
begin
inc(j);
writeln('A contention at: ',i);

end;

writeln('Number of contention points: ',j);

setlength(tab,0);
setlength(tab1,0);
setlength(tab2,0);

end.

---

You can download my Parallel Conjugate gradient solver library from:

https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-c...

Thank you for your time.

Amine Moulay Ramdane.

comp.programming

Efficiency of my parallel algorithm

Ramine

x Login to ForumsZone