Asp Forum
comp.programming
About my parallel algorithms and NUMA
Ramine
2/18/2015 12:46:00 AM
Hello,
We have to be smart, so please follow along with me.
As you have noticed, I have invented and implemented a parallel conjugate
gradient linear system solver library.
Here it is:
https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-c...
My parallel algorithm is scalable on NUMA architecture...
But you have to understand my way of designing my NUMA-aware parallel
algorithms. The first way of implementing a NUMA-aware parallel algorithm
is to implement a threadpool that schedules a job on a given thread by
specifying the NUMA node explicitly, depending on which NUMA node's
memory you will do your processing on; this way will buy you 40% more
throughput on NUMA architecture. The other way of doing it is to use the
classical threadpool without specifying the NUMA node explicitly, but to
divide your parallel memory processing between the NUMA nodes. This is
the way I have implemented my NUMA-aware parallel algorithms: my way of
doing it is scalable on NUMA architecture, but you will get 40% less
throughput. Even with 40% less throughput, I think my NUMA-aware
parallel algorithms are scalable on NUMA architecture and still good
enough. My next parallel sort library will also be scalable on
NUMA architecture.
Where did I get this 40% from? Please read here:
"Performance impact: the cost of NUMA remote memory access
For instance, this Dell whitepaper has some test results on the Xeon
5500 processors, showing that local memory access can have 40% higher
bandwidth than remote memory access, and the latency of local memory
access is around 70 nanoseconds whereas remote memory access has a
latency of about 100 nanoseconds."
Read more here:
http://sqlblog.com/blogs/linchi_shea/archive/2012/01/30/performance-impact-the-cost-of-numa-remote-memory-a...
Amine Moulay Ramdane.
1 Answer
Ramine
2/18/2015 12:58:00 AM
On 2/17/2015 4:45 PM, Ramine wrote:
> [...] but even if it's 40% throughput i think
I mean: even if it's 40% less throughput...