Ramine
11/19/2014 3:10:00 AM
Hello,
Please read the PhD paper. About the benchmark that scales to 6x,
it says:
"The graph shows the average throughput in terms of
number of critical and non-critical section pairs executed per second.
The critical section accesses two distinct cache blocks (increments
4 integer counters on each block), and the non-critical section
is an idle spin loop of up to 4 microseconds."
So, as I have just explained to you, the serial part inside the critical
section takes around 1 clock and the parallel part inside the function
that enters the local locks takes around 6 clocks. This is why the
calculation with Amdahl's law gives a scalability of around 6x.
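
As a rough sketch of that calculation, Amdahl's law can be evaluated
like this. The function names and the thread counts are only
illustrative assumptions, and the 1-clock and 6-clock inputs are the
approximate estimates given above, not measured values:

```python
def amdahl_speedup(serial_clocks, parallel_clocks, threads):
    """Speedup predicted by Amdahl's law for a given number of threads."""
    total = serial_clocks + parallel_clocks
    # Only the parallel part is divided among the threads.
    return total / (serial_clocks + parallel_clocks / threads)


def amdahl_limit(serial_clocks, parallel_clocks):
    """Upper bound on the speedup as the number of threads grows."""
    return (serial_clocks + parallel_clocks) / serial_clocks


if __name__ == "__main__":
    SERIAL = 1.0    # assumed: ~1 clock inside the critical section
    PARALLEL = 6.0  # assumed: ~6 clocks in the code that enters the local locks
    for threads in (1, 2, 4, 8, 16, 32, 64):  # illustrative thread counts
        print(threads, round(amdahl_speedup(SERIAL, PARALLEL, threads), 2))
    print("limit", amdahl_limit(SERIAL, PARALLEL))
```

With these assumed inputs the predicted speedup climbs toward a bound
of 7x as threads are added and is about 6x at 36 threads, so the
figure depends only on the ratio of the parallel part to the serial
part.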
I hope you have understood what I want to say: the 6x scalability
is not the result of minimizing the inter-socket coherence traffic
as much as possible. It is the result of the ratio between the parallel
part inside the function that enters the local locks and the serial
part inside the critical section of the lock cohort. This is what the
PhD paper doesn't explain to you. You also have to know that if you
are transferring more than 4 bytes from the L2 cache (local or remote)
to the CPU, the lock cohort will scale to less and less than 6x. This
is why my scalable MLock is still useful. My scalable MLock can also
be used in realtime critical systems; the lock cohort cannot, because
it is unfair.
Thank you,
Amine Moulay Ramdane.