comp.programming
About my parallel algorithm and NUMA...
Ramine
3/7/2015 5:49:00 AM
Hello...
We have to be smart, so please follow along with me..
As you have noticed, I have invented and implemented a parallel Conjugate
Gradient linear system solver library...
Here it is:
https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-c...
My parallel algorithm is scalable on NUMA architecture...
But you have to understand my way of designing my NUMA-aware parallel
algorithms. There are two ways of implementing a NUMA-aware parallel
algorithm. The first way is to implement a threadpool that schedules a job
on a given thread by specifying the NUMA node explicitly, depending on
which NUMA node's memory you will be processing; this way will buy you 40%
more throughput on NUMA architecture. The other way is to use the
classical threadpool without specifying the NUMA node explicitly, but to
divide your parallel memory processing between the NUMA nodes. This is the
way I have implemented my NUMA-aware parallel algorithms: my way of doing
it is scalable on NUMA architecture, but you will get 40% less throughput
than with the first way. Even with 40% less throughput, I think that my
NUMA-aware parallel algorithms are scalable on NUMA architecture and are
still good enough. My next parallel sort library will also be scalable on
NUMA architecture.
Where have I got this 40% figure from? Please read here:
"Performance impact: the cost of NUMA remote memory access
For instance, this Dell whitepaper has some test results on the Xeon
5500 processors, showing that local memory access can have 40% higher
bandwidth than remote memory access, and the latency of local memory
access is around 70 nanoseconds whereas remote memory access has a
latency of about 100 nanoseconds."
Read more here:
http://sqlblog.com/blogs/linchi_shea/archive/2012/01/30/performance-impact-the-cost-of-numa-remote-memory-a...
As you have noticed, in my NUMA-aware algorithms I am using my classical
threadpool and I am not scheduling the jobs by specifying an explicit NUMA
node; instead I am dividing the parallel memory processing between the
NUMA nodes. By doing so you get a scalable algorithm with 40% less
throughput than if you design a more optimized parallel algorithm and
threadpool that schedules the jobs by specifying a NUMA node explicitly,
so as to avoid remote memory accesses on NUMA nodes as much as possible;
that will get you 40% more throughput. But I think my NUMA-aware parallel
algorithms that use a classical threadpool are also good, and still good
enough. If you need me to optimize my threadpool further so as to get 40%
more throughput, I will do it as a next project.
Thank you,
Amine Moulay Ramdane.