Asp Forum - Scalable SeqlockX is here...

Ramine

3/7/2015 4:02:00 AM

Hello,

Scalable SeqlockX version 1.0

Author: Amine Moulay Ramdane.

Description:

This scalable Seqlock's variant was invented by Amine Moulay Ramdane,
it is much faster and more powerfull than the the Dmitry Vyukov's
distributed reader-writer mutex , cause the Dmitry Vyukov's distributed
reader-writer lock will become slower and slower on the writer side with
more and more cores because it transfers too many cache-lines between
cores on the writer side, this invention has eliminated the weakness of
the Dmitry Vyukov's distributed reader-writer mutex, so that the
writers throughput has become faster and very fast, and my scalable
SeqlockX elminates the weaknesses of the classical Seqlock (sequential
lock) that is "livelock", so it avoids livelock when there is more
writers, so my scalable SeqlockX is a powerful sychronization mechanism
that can replace RCU and that can replace Seqlock and that can replace
Dmitry Vyukov's distributed reader-writer mutex. And if you look at my
algorithm you will notice that on the reader side it looks like a
classical sequential lock, but i have added a variable called
fcount2^.fcount2 that allows the readers to not livelock when there is
more writers ,this scalable SeqlockX is faster and more scalable than
the Dmitry Vyukov's distributed reader-writer mutex and it beats the
classical Seqlock because it avoids livelock when there is many writers.

Please look at the "test.pas" example that i have included and you will
notice that the reader side have to be used in a transactional mode with
a repeat-until loop, like this:

repeat

t.rlock(id1);

until t.runlock(id1);

it is like a transactional mode of a Seqlock and id1 must be of type
"long" that must be imported from the SeqlockX unit. The writer side is
easy to program , please look at the "test.pas" example and you will
understand how to use my scalable distributed sequential lock.

My scalable SeqlockX can be used from accross processes and threads.

My new invention called scalable SeqlockX k beats the following
algorithm, it is more scalable and faster than the following algorithm
of Dmitry Vyukov called Distributed reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-wr...

And my new invention called scalable SeqlockX beats also the classical
Seqlock (i have just explained to you why ...), read here about it:

http://en.wikipedia.org/wi...

And this new invention called scalable SeqlockX can replace RCU cause it
is a powerfull synchronization mechanism.

I will explain my new algorithm of a scalable distributed sequential
lock that is very powerful...

I have arrived at my new algorithm by also reading many PhD papers and
by looking first at the weakness of the following algorithm of Dmitry
Vyukov's Reader-writer mutex:

http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/distributed-reader-wr...

look at the writer side of the Dmitry Vykov's algorithm , it is doing
with every rwlock in each corresponding core:

for (i = 0; i != mtx->proc_count; i += 1)

pthread_rwlock_wrlock(&mtx-cell[i].mtx);

but this render this algorithm inefficient for the writer side , cause
every pthread_rwlock_wrlock() call transfer many cache-lines betwenn
cores, so this will generate too much cache line transfers between cores
when each writer executes distr_rw_mutex_wrlock() and this will become
worse and worse when you add more and more cores, i mean this will
transfer more and more cache-lines between cores with more and more
cores... so this will render the writer side throughput slow and this is
the weakness of this algorithm...

So this is why i have used a sequential lock mechanism in the write side
of my new algorithm so that to minimize efficiently the cache-lines
transfers, so if you look at my algorithm, it adds the follwing to the
classical Seqlock:

On the writer side if the result of TSeqlockX.RUnlock() method is
"false", that means the reader transaction has failed, so i am putting 1
into the FCount2^.FCount2 variable and if it's true, that means the
reader transaction has succeeded, i am putting 0 into the
FCount2^.FCount2 variable, and inside the writer side inside the
TSeqlockX.WLock() method, i am doing this:

while FCount2^.FCount2<>0 do sleep(0);

So if FCount2^.FCount2 is equal to its initial value it will exit the
while loop, and if FCount2^.FCount2 is equal to 1 it will wait for
TSeqlockX.RUnlock() to return true, that means it will wait for a reader
transaction to succeed, this part of the algorithm allows my algorithm
to avoids livelock when there is many writers, and the rest of the
algorithm look like a classical seqlock. That's all.

About the sequential consistency of my scalable distributed sequential
lock, my algorithm works on x86 architecture and it compiles on Delphi
and FreePascal compilers they both respect the strong memory model of
the x86 architecture. i think my algorithm is correct cause look at the
source code of the WLock() method, since i am using a Ticket spinlock
with a proportional backoff on the writer side, the Ticket spinlock is
using a "lock add" assembler instruction to increment a counter in the
Enter() method of the ticket spinlock , and this "lock add" assembler
instruction is a barrier for stores and loads on x86, and the loads of
Fcount2^.fcount2 is not reordered with the previous atomic lock of the
ticket spinlock, and the stores of FCount5^.fcount5 are not reordred
with olher stores and older loads , so the WLock() method is
sequentially consistent and correct, now look at the WUnlock() , we
don't need an "sfence" cause stores and loads on WUnlock() are not
reordered with older stores or older loads on x86 etc. so WUnlock()
method is sequentially consistent and correct, now look at the RLock()
method, it's also sequentially consistent., so all in all i think my
algorithm is sequentially consistent and correct on x86 and on the
Delphi and FreePascal compilers. So i think you can be confident cause i
have reasoned correctly and i think my scalable SeqlockX algorithm is
correct and it is a powerful synchronization mechanism that can replace
RCU and that can replace classical Seqlock cause it beats Selock.

You have to know also that the first parameter of the constructor is the
name that identifies the scalable SeqlockX object accross processes
bounderies.

You can download my scalable SeqlockX from:

https://sites.google.com/site/aminer68/scalabl...

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freep...

Operating Systems: Windows, Mac OSX , Linux on x86.

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems

Thank you,
Amine Moulay Ramdane.

comp.programming

Scalable SeqlockX is here...

Ramine

x Login to ForumsZone