From Christopher Wellons to ~skeeto/public-inbox
Thanks, Justine! I've long been an admirer of your work on APE,
Cosmopolitan, SectorLISP, and redbean. While I haven't built anything on
them yet (they're a bit outside my usual target niche), I've learned
from these projects, and I align with their philosophy.
> I added a bunch of your code to a project I'm working on
Great, and thanks for sharing this! That's exactly the sort of outcome I
like to see: literally copy-paste the useful stuff, adapt it to the local
context (style, interfaces), and take ownership. I'm also happy to hear
about Cosmopolitan getting threading.
From Christopher Wellons to ~skeeto/public-inbox
1. Great question, since I had to think through this more before I could
put it into words. Acquire/release didn't click with me until I read
Russ Cox's Memory Models series, specifically part 2:

Memory Models
https://research.swtch.com/mm

Acquire/release establishes a happens-before edge, and the terminology
evokes mutexes: stores before a release are visible to loads after a
matching acquire. The key insight for me: "These probably exist only
because they are free on x86." In other words, these weak orderings
probably exist to expose x86 memory ordering semantics to high-level
programs. If your program can safely rely on x86 memory ordering, then
you really just need to ensure the compiler doesn't reorder your loads
and stores.
From Christopher Wellons to ~skeeto/public-inbox
You're right, thanks! That's what I get for not actually compiling the examples. It's now fixed: https://github.com/skeeto/skeeto.github.com/commit/f405f48
From Christopher Wellons to ~skeeto/public-inbox
Thanks, Peter! It's now corrected (83cce70).
From Christopher Wellons to ~skeeto/public-inbox
Interesting approach. I used assembly in my version specifically to avoid the data race I mentioned. As undefined behavior, it's difficult to predict what a compiler might do. The memory reordering in the experiment is one of the many surprising outcomes of a data race, after all. The compiler could reorder loads and stores on its own because the statements are semantically independent (undermining the experiment). Since the variables aren't volatile, there may even be multiple loads and stores per assignment in the C source, giving an "impossible" result more or less often than it should. It may even elide the "1" stores, since those stores are never synchronized and so never legitimately loaded. I inspected GCC's output on Linux — a little tricky with all the OpenMP control flow around it — and even without volatile the output roughly matches the desired assembly, with everything in the right order, so the
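For contrast, here's a minimal data-race-free variant of that litmus
test using C11 relaxed atomics (my sketch, not the article's code).
It's well-defined, though unlike the assembly version it still leaves
the compiler free in principle to reorder the accesses:

```c
#include <stdatomic.h>

static atomic_int x, y;

/* Thread 0's half of the store-buffering litmus test:
   store to x, then load y. */
int litmus0(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    return atomic_load_explicit(&y, memory_order_relaxed);
}

/* Thread 1's half: store to y, then load x. */
int litmus1(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    return atomic_load_explicit(&x, memory_order_relaxed);
}

/* Run concurrently, both functions returning 0 is the "impossible"
   result: at least one store was reordered after a load. */
```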
From Christopher Wellons to ~skeeto/public-inbox
That's an interesting way to test/examine it. I would take this further, eschew threading entirely, and treat the whole problem as directed graph traversal. Each vertex represents system state (shared memory and every thread state), and each directed edge represents an operation on one thread. A traversal is just one particular ordering of events. A cycle that doesn't include the initial state is a deadlock, so to prove there are no deadlocks, exhaustively search the full graph looking for such cycles. My barrier is simple enough to work this all out with pencil and paper in a few minutes, where one can manually validate the graph. This is one reason I'm confident it's correct. The other challenge: Does this graph capture all orderings? There may be
From Christopher Wellons to ~skeeto/public-inbox
> won't the mod still be needed inside the loop
Yes, but with a compile-time constant denominator that mod can be reduced
to multiplication with shift/sub/add, much like division. (On common
architectures, division and modulo are computed simultaneously from one
instruction.) It's not as lean as a bitwise mask, but should be better
than a division.
Though now that I think about it, even in the worst case you could compute
modulo ahead of the poll loop as a lower bound, then use a greater-than
comparison in the loop, exiting when the barrier exceeds the pre-computed
threshold. This avoids expensive (read: latency inducing) operations in
the poll loop.
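A sketch of that second idea, with hypothetical names (my illustration;
the real barrier differs):

```c
#include <stdatomic.h>

#define NTHREADS 3 /* assumed compile-time constant */

/* Arrive at the barrier, then poll until the shared counter reaches
   the next multiple of NTHREADS. The modulo is computed once, ahead
   of the loop; the loop body is only a load and a compare. */
void barrier_wait(atomic_int *counter)
{
    int v = atomic_fetch_add_explicit(counter, 1, memory_order_acq_rel) + 1;
    int threshold = v + (NTHREADS - v%NTHREADS) % NTHREADS;
    while (atomic_load_explicit(counter, memory_order_acquire) < threshold) {
        /* poll: no division or modulo here */
    }
}
```

When v is already a multiple of NTHREADS (the last arrival), the
threshold equals v and the loop exits immediately.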
From Christopher Wellons to ~skeeto/public-inbox
> LZ does not help enough on the expected short common substrings

Indeed. It does worse than I had expected when I started digging into
the problem, which is why I thought I could do better.

> the mean difference between words (when represented as a base26 number)
> is 26^5 / 12972 = 915

I looked at my email in the wrong order, because a previous message had
me digging into this same idea, all the way through implementing it.
Your estimate is right on: in the original Wordle list, the mean
difference is 913!

https://lists.sr.ht/~skeeto/public-inbox/%3CYiXIayjFm2O0um8e%40triptico.com%3E
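To make the base-26 framing concrete, a small sketch (hypothetical
helpers, not code from the thread):

```c
#include <stdint.h>

/* Pack a lowercase five-letter word into a base-26 integer,
   so "aaaaa" is 0 and "zzzzz" is 26^5 - 1. */
int32_t pack26(const char *word)
{
    int32_t n = 0;
    for (int i = 0; i < 5; i++) {
        n = n*26 + (word[i] - 'a');
    }
    return n;
}

/* With the list sorted, each word can be stored as its gap from the
   previous word; decoding is a running sum of gaps. */
int32_t delta26(const char *prev, const char *next)
{
    return pack26(next) - pack26(prev);
}
```

Averaged over a sorted list of 12,972 such words, these gaps come out
near 26^5 / 12972, matching the estimate above.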
From Christopher Wellons to ~skeeto/public-inbox
I don't understand the specifics of your scheme. It sounds like delta encoding, but I'm not sure. I also don't follow your math. I hadn't thought of delta encoding, though, so I just tried it out in a straightforward way. With each word as a base-26 number, I encoded the differences between adjacent words using Huffman coding, and I get 13,182 bytes of compressed data. Initially this seems much better, except the state table has 3,535 entries (from a Huffman tree of 1,768 leaves), many of which are large (up to 18 bits). In other words, it pushed all the storage into the state table, which isn't particularly efficient. The largest delta in the original Wordle list is 164,068, meaning every delta fits in 18 bits. Storing 12,972 of these makes for 29,187 bytes, which isn't quite as good as the Huffman coded version even counting the
From Christopher Wellons to ~skeeto/public-inbox
Thanks for the update, Ron!

> both gratifying and unsettling.

I know what you mean!

> Your Game of Life makefile works

In what sort of environment did you get it to work? After the first
iteration I only get blank lines of output, and I'm unable to get it
working anywhere.

pdpmake has already been useful for double-checking some of my more
complex Makefiles, and it's even caught mistakes. However, a couple of