Hi Chris,

This is a really nice spin lock implementation. Of course, as soon as I saw it I immediately had to test it out myself. I tested it with my Raspberry Pi 2 threading framework (3 of the 4 cores are turned off, so it's essentially a uniprocessor), with preemption disabled so that I have complete control over when the scheduler is invoked. I wanted to make sure the following case worked: https://gist.github.com/dillstead/e0bb15cd9b0ba3393cf1ca02f4fdfab7

Indeed it does. My concern was that the two threads could each get stuck in an infinite loop depending on the particular interleaving of their executions. Because you test the two least significant bits in your loop, one of the two threads is guaranteed to kick out, since the two values will differ in one of those bits. A subtle but important point.

I haven't thought about it too deeply, but it seems like this approach can be used for more than just two threads. Is that something you have thought about?

Nice work,
Stedman
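P.S. For anyone else reading along, here is roughly the shape of the loop I'm describing, reconstructed in C11 atomics (my reconstruction, not necessarily Chris's exact code):

```c
#include <stdatomic.h>

/* Reconstruction of a counter-based two-thread barrier. Each arrival
 * increments the counter; the first arriver (odd value) spins until
 * the partner's increment is visible. The partner can be at most one
 * round ahead, so the difference stays in {0, 1, 2} and the two-bit
 * test can never spuriously read 0 again. */
static void barrier2(_Atomic unsigned *counter)
{
    unsigned v = atomic_fetch_add(counter, 1) + 1;
    if (v & 1) {                                    /* first arriver */
        while (((atomic_load(counter) - v) & 3) == 0)
            ;                                       /* partner not here yet */
    }
}
```

With a one-bit test (`& 1` instead of `& 3`), the interleaving where the partner races ahead into the next round leaves both threads spinning forever, which is exactly the lockup I was worried about.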
That's an interesting way to test/examine it. I would take this further, eschew threading entirely, and treat the whole problem as directed graph traversal. Each vertex represents a system state (shared memory plus the state of every thread), and each directed edge represents an operation by one thread. A traversal is just one particular ordering of events. A cycle that doesn't include the initial state is a deadlock, so to prove there are no deadlocks, exhaustively search the full graph looking for such cycles. My barrier is simple enough to work this all out with pencil and paper in a few minutes, where one can manually validate the graph. This is one reason I'm confident it's correct.

The other challenge: does this graph capture all orderings? There may be re-orderings by the compiler or CPU which introduce additional vertices and edges. A few people got hung up on this with my barrier, but sequential consistency guarantees that nothing, including non-atomic accesses, is re-ordered around the atomic accesses, so it's easy to prove that there are no such hidden graph pieces.

Finally, I did my own thorough testing like the "coop.go" I shared in the article, where multiple threads wrestle over a shared non-atomic variable with the barrier as the referee. This runs for billions of iterations without tripping my assertions (segfault, panic), and data race detectors are happy with it. As a sanity check, I swapped in relaxed atomics, and it immediately fails my test as expected (read: sequential consistency has the effect I expect).

> I wanted to make sure the following case worked

Since you're not *actually* testing with threads this doesn't really apply to your case, but I'll mention it anyway since I've seen this mistake a lot: with threading and benchmarking, do not test code using stdio. The underlying stdio buffer is synchronized, so it introduces additional synchronization and latency that hides races. For benchmarking, it's likely slower than whatever it is you're actually testing, drowning out the results. In tests like "coop.go" I use that volatile null pointer dereference as a lightweight assertion specifically to avoid instrumentation interfering with the test. (In particular, glibc assert() is annoyingly bulky and indirect.)

> Is that something you have thought about?

I covered exactly this in my article. :-) It trivially generalizes to any power of two. With additional constraints/costs, it generalizes to any number of threads by putting it in terms of a different base.
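To make the power-of-two case concrete, here's a sketch in the same reconstructed shape as the `barrier2()` in Stedman's P.S. above (`barrier_pow2` is my naming for this comment, not code lifted from the article):

```c
#include <stdatomic.h>

/* Hypothetical n-thread variant, n a power of two. The thread whose
 * increment makes the count a multiple of n is the last arriver of
 * its round and proceeds immediately. Everyone else waits until the
 * counter's bit log2(n) differs from the same bit of its own arrival
 * value v; that bit flips exactly when the round's n-th increment
 * lands. Waiters lag at most one round behind, so the bit cannot flip
 * back before they observe it. */
static void barrier_pow2(_Atomic unsigned *counter, unsigned n)
{
    unsigned v = atomic_fetch_add(counter, 1) + 1;
    if (v & (n - 1)) {                        /* not the last arriver */
        while (((atomic_load(counter) ^ v) & n) == 0)
            ;                                 /* round not yet complete */
    }
}
```

For a thread count that isn't a power of two, the bit tests turn into modular arithmetic on the counter, which is the "different base" trade-off: still correct, but no longer a couple of cheap bit operations.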
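And since the lightweight assertion came up: in the C version of such a test it amounts to something like this (a sketch; `LIGHT_ASSERT` is a made-up name, not the literal test code):

```c
/* Crash immediately and visibly on failure: a store through a volatile
 * null pointer can't be optimized away, and it drags in no stdio or
 * other instrumentation that might perturb the timing of the test. */
#define LIGHT_ASSERT(c) \
    do { if (!(c)) *(volatile int *)0 = 0; } while (0)
```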
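Lastly, for anyone who'd rather not do the graph traversal on paper, here's a minimal mechanized sketch. It models two threads running the two-bit barrier shape from the P.S. above, enumerates every interleaving of their atomic operations, and checks that every reachable state can still reach the state where both threads have finished; a reachable state that cannot is a lockup. The state encoding, the `ROUNDS` bound, and the "can still finish" criterion are my choices for this sketch:

```c
#include <stdio.h>
#include <string.h>

#define ROUNDS    2    /* barrier rounds each model thread executes */
#define MAXSTATES 4096 /* far larger than this tiny model ever needs */

struct state {
    unsigned char counter;  /* shared atomic counter */
    unsigned char pc[2];    /* 0 increment, 1 spin load, 2 exit, 3 finished */
    unsigned char v[2];     /* each thread's post-increment value */
    unsigned char round[2]; /* completed rounds per thread */
};

/* Apply one atomic operation by thread t; return 0 once t is finished. */
static int step(struct state *s, int t)
{
    switch (s->pc[t]) {
    case 0: /* v = ++counter */
        s->v[t] = ++s->counter;
        s->pc[t] = s->v[t] & 1 ? 1 : 2; /* odd: first arriver, so spin */
        return 1;
    case 1: /* one load of the counter inside the spin loop */
        if ((s->counter - s->v[t]) & 3)
            s->pc[t] = 2; /* partner arrived: exit the loop */
        return 1;         /* a failed check is a self-loop edge */
    case 2: /* leave the barrier, then re-enter for the next round */
        s->pc[t] = ++s->round[t] < ROUNDS ? 0 : 3;
        return 1;
    }
    return 0;
}

static struct state states[MAXSTATES];
static int succ[MAXSTATES][2]; /* edge per (vertex, thread), -1 if none */
static int nstates;

/* Return the index of *s, appending it if it's a new vertex. */
static int intern(struct state *s)
{
    for (int i = 0; i < nstates; i++)
        if (!memcmp(states + i, s, sizeof *s))
            return i;
    states[nstates] = *s;
    return nstates++;
}

int main(void)
{
    struct state init = {0};
    intern(&init);

    /* Forward pass: discover every reachable vertex and edge. */
    for (int i = 0; i < nstates; i++) {
        for (int t = 0; t < 2; t++) {
            struct state n = states[i];
            succ[i][t] = step(&n, t) ? intern(&n) : -1;
        }
    }

    /* Liveness pass: from every reachable vertex, the "both threads
     * finished" vertex must still be reachable, else it's a lockup. */
    int stuck = 0;
    for (int i = 0; i < nstates; i++) {
        char seen[MAXSTATES] = {0};
        int queue[MAXSTATES], head = 0, tail = 0, done = 0;
        for (seen[i] = 1, queue[tail++] = i; head < tail && !done;) {
            int c = queue[head++];
            done = states[c].pc[0] == 3 && states[c].pc[1] == 3;
            for (int t = 0; t < 2 && !done; t++) {
                int j = succ[c][t];
                if (j >= 0 && !seen[j])
                    seen[j] = 1, queue[tail++] = j;
            }
        }
        stuck += !done;
    }
    printf("%d reachable states, %d stuck\n", nstates, stuck);
    return !!stuck;
}
```

On a correct barrier this should report zero stuck states, and weakening the spin test to a single bit (`& 1` instead of `& 3`) should make the search turn up stuck states: the interleaving lockup discussed above, found mechanically instead of by hand.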