Spin Lock

Steadman Dillroper <dillstead@gmail.com>
Hi Chris,

This is a really nice spin lock implementation.  Of course, as soon as
I saw it I had to test it out myself.

I tested it with my Raspberry Pi 2 threading framework (3/4 of cores
are turned off so it's essentially a uniprocessor).  I turned off
preemption in order to have complete control over when the scheduler
is invoked.  I wanted to make sure the following case worked:


Indeed it does.

My concern was that the two threads could each get stuck in an
infinite loop depending on the particular interleaving of their
executions.  Because your loop tests the two least significant bits,
one of the two threads is guaranteed to kick out, since the two values
will differ in one of those bits.  A subtle but important point.

Having not thought about it too deeply, it seems like this approach can
be used for more than just two threads.  Is that something you have
thought about?

Nice work,

Message ID <CAOJH3v7A6sQ0q5sbcTiJhuLC+u1DODyhmmkwgXRTGd0SntptNA@mail.gmail.com>
That's an interesting way to test/examine it. I would take this further, 
eschew threading entirely, and treat the whole problem as directed graph 
traversal. Each vertex represents system state (shared memory and every 
thread state), and each directed edge represents an operation on one 
thread. A traversal is just one particular ordering of events. A cycle 
that doesn't include the initial state is a deadlock, so to prove there 
are no deadlocks, exhaustively search the full graph looking for such
cycles.

My barrier is simple enough to work this all out with pencil and paper in 
a few minutes, where one can manually validate the graph. This is one 
reason I'm confident it's correct.

The other challenge: Does this graph capture all orderings? There may be 
re-orderings by the compiler or CPU which introduce additional vertices 
and edges. A few people got hung up on this with my barrier, but 
sequential consistency guarantees that nothing, including non-atomic 
accesses, is re-ordered around the atomic accesses, so it's easy to prove 
that there are no such hidden graph pieces.

Finally, I did my own thorough testing like the "coop.go" I shared in the 
article, where multiple threads wrestle over a shared non-atomic with the 
barrier as the referee. This runs for billions of iterations without 
tripping my assertions (segfault, panic), and data race detectors are 
happy with it. As a sanity check, I swapped in relaxed atomics, and it 
immediately fails my test as expected (read: sequential consistency has 
the effect I expect).

> I wanted to make sure the following case worked 

Since you're not *actually* testing with threads this doesn't really apply 
to your case, but I'll mention it anyway since I've seen this mistake a
lot: With threading and benchmarking, do not test code using stdio. The 
underlying stdio buffer is synchronized, so it introduces additional 
synchronization and latency that hides races. For benchmarking it's likely 
slower than whatever it is you're actually testing, drowning out the
measurement.

In tests like "coop.go" I use that volatile null pointer dereference as a 
lightweight assertion specifically to avoid instrumentation interfering 
with the test. (In particular, glibc assert() is annoyingly bulky and
intrusive.)

> Is that something you have thought about?

I covered exactly this in my article. :-) It trivially generalizes to any 
power of two. With additional constraints/costs, it generalizes to any 
number of threads by putting it in terms of a different base.