comp.programming

About my parallel algorithms and NUMA

Ramine

2/18/2015 12:46:00 AM


Hello,

We have to be smart, so please follow along with me..

As you have noticed, I have invented and implemented a parallel
conjugate gradient linear system solver library...

Here it is:

https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-c...
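
For readers who don't know the algorithm, here is a minimal sequential
sketch of the textbook conjugate gradient iteration in C; this is not my
library's code, just the standard method that such a library
parallelizes, with dense storage and no preconditioner for simplicity:

#include <stdlib.h>
#include <math.h>

/* y = A*x for a dense n-by-n matrix stored row-major. */
static void matvec(int n, const double *A, const double *x, double *y) {
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++) s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Solve A*x = b for symmetric positive definite A; x holds the initial
   guess on entry and the approximate solution on return. */
void conjugate_gradient(int n, const double *A, const double *b,
                        double *x, double tol, int max_iter) {
    double *r  = malloc(n * sizeof *r);   /* residual b - A*x  */
    double *p  = malloc(n * sizeof *p);   /* search direction  */
    double *Ap = malloc(n * sizeof *Ap);  /* A times p         */

    matvec(n, A, x, Ap);
    for (int i = 0; i < n; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rs_old = dot(n, r, r);

    for (int k = 0; k < max_iter && sqrt(rs_old) > tol; k++) {
        matvec(n, A, p, Ap);   /* the dominant cost: this matrix-vector
                                  product is what gets parallelized */
        double alpha = rs_old / dot(n, p, Ap);
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rs_new = dot(n, r, r);
        double beta = rs_new / rs_old;
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    free(r); free(p); free(Ap);
}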


My parallel algorithm is scalable on NUMA architecture...

But you have to understand my way of designing my NUMA-aware parallel
algorithms. The first way of implementing a NUMA-aware parallel
algorithm is to implement a threadpool that schedules a job on a given
thread by specifying the NUMA node explicitly, depending on which NUMA
node's memory you will do your processing on. This way will buy you 40%
more throughput on NUMA architecture. The other way is to use the
classical threadpool without specifying the NUMA node explicitly, but
to divide your parallel memory processing between the NUMA nodes. This
is the way I have implemented my NUMA-aware parallel algorithms. My way
of doing it is scalable on NUMA architecture, but you will get 40% less
throughput. But even with 40% less throughput, I think that my
NUMA-aware parallel algorithms are scalable on NUMA architecture and
they are still good enough. My next parallel sort library will also be
scalable on NUMA architecture.
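
Here is a minimal sketch of both designs in C using Linux libnuma (link
with -lnuma); this is not my library's code, and the worker functions,
node handling, and chunk size are just illustrative assumptions, with
the threadpool machinery elided:

#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1u << 20)  /* doubles per worker, an arbitrary demo size */

typedef struct { int node; double *data; } job_t;

/* First way: the pool pins each worker to an explicit NUMA node and
   allocates that worker's memory on the same node, so every access
   is local. */
static void *pinned_worker(void *arg) {
    job_t *job = arg;
    numa_run_on_node(job->node);            /* run on this node's CPUs */
    job->data = numa_alloc_onnode(CHUNK * sizeof(double), job->node);
    for (size_t i = 0; i < CHUNK; i++)      /* all-local memory traffic */
        job->data[i] = (double)i;
    return NULL;
}

/* Second way: workers are not pinned; the working set is simply divided
   so each chunk lives on a different node. The design stays scalable,
   but some accesses land on remote nodes, costing throughput. */
static void *unpinned_worker(void *arg) {
    job_t *job = arg;
    job->data = numa_alloc_onnode(CHUNK * sizeof(double), job->node);
    for (size_t i = 0; i < CHUNK; i++)      /* local or remote traffic */
        job->data[i] = (double)i;
    return NULL;
}

int main(void) {
    if (numa_available() < 0) { fprintf(stderr, "no NUMA support\n"); return 1; }
    int nodes = numa_num_configured_nodes();
    pthread_t th[nodes];
    job_t jobs[nodes];

    for (int n = 0; n < nodes; n++) {
        jobs[n].node = n;
        /* swap in unpinned_worker to try the second design */
        pthread_create(&th[n], NULL, pinned_worker, &jobs[n]);
    }
    for (int n = 0; n < nodes; n++) {
        pthread_join(th[n], NULL);
        numa_free(jobs[n].data, CHUNK * sizeof(double));
    }
    return 0;
}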

Where did I get this 40% figure from? Please read here:


"Performance impact: the cost of NUMA remote memory access

For instance, this Dell whitepaper has some test results on the Xeon
5500 processors, showing that local memory access can have 40% higher
bandwidth than remote memory access, and the latency of local memory
access is around 70 nanoseconds whereas remote memory access has a
latency of about 100 nanoseconds."

Read more here:

http://sqlblog.com/blogs/linchi_shea/archive/2012/01/30/performance-impact-the-cost-of-numa-remote-memory-a...
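
To observe this gap yourself, here is a rough sketch, not the
whitepaper's benchmark, that allocates a buffer on node 0 and times
touching it first from node 0 (local) and then from node 1 (remote),
using Linux libnuma:

#define _GNU_SOURCE
#include <numa.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define N (64u << 20)  /* 64 MiB buffer, an arbitrary demo size */

static double touch_seconds(volatile uint8_t *buf) {
    struct timespec t0, t1;
    uint64_t sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i += 64)   /* one access per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    if (numa_available() < 0 || numa_num_configured_nodes() < 2) {
        fprintf(stderr, "need a NUMA machine with at least 2 nodes\n");
        return 1;
    }
    uint8_t *buf = numa_alloc_onnode(N, 0);   /* memory lives on node 0 */
    for (size_t i = 0; i < N; i++) buf[i] = 1;  /* fault the pages in   */

    numa_run_on_node(0);                       /* local accesses  */
    printf("local : %.3f s\n", touch_seconds(buf));

    numa_run_on_node(1);                       /* remote accesses */
    printf("remote: %.3f s\n", touch_seconds(buf));

    numa_free(buf, N);
    return 0;
}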




Amine Moulay Ramdane.



1 Answer

Ramine

2/18/2015 12:58:00 AM


On 2/17/2015 4:45 PM, Ramine wrote:
> [...] but even if it's 40% throughput i think
> that my parallel algorithms that are NUMA-aware are scalable on NUMA
> architecture and they are still good enough [...]


I mean: even if it's 40% less throughput...

