Recent activity

Re: Nice concurrency article 12 days ago

From Christopher Wellons to ~skeeto/public-inbox

Thanks, Justine! I've already been an admirer of your work with APE, 
Cosmopolitan, SectorLISP, and redbean. While I haven't built anything on 
them — yet? they're a bit outside my usual target niche — I've learned 
from your work and these projects, and I align with the philosophy.

> I added a bunch of your code to a project I'm working on

Great, and thanks for sharing this! That's exactly the sort of outcome I 
like to see: literally copy-paste the useful stuff, adapt it to the local 
context (style, interfaces), and take ownership. I'm also happy to hear 
about Cosmopolitan getting threading.

Re: A lock-free, concurrent, generic queue in 32 bits 13 days ago

From Christopher Wellons to ~skeeto/public-inbox

1. Great question, since I had to think through this more before I could 
put it into words. Acquire/release didn't click with me until I read Russ 
Cox's Memory Models series, specifically part 2:

Memory Models
https://research.swtch.com/mm

Acquire/release establishes a happens-before edge, and the terminology 
evokes mutexes. Stores before a release are visible to loads after a 
matching acquire. The key insight for me: "These probably exist only 
because they are free on x86." In other words, these weak orderings 
probably exist to expose x86 memory ordering semantics to high level 
programs. If your program can safely rely on x86 memory ordering, then you 
really just need to ensure the compiler doesn't reorder your loads/stores.

Re: A lock-free, concurrent, generic queue in 32 bits 14 days ago

From Christopher Wellons to ~skeeto/public-inbox

You're right, thanks! That's what I get for not actually compiling the 
examples. It's now fixed:

https://github.com/skeeto/skeeto.github.com/commit/f405f48

Re: Typo on article Chunking Optimizations: Let the Knife Do the Work a month ago

From Christopher Wellons to ~skeeto/public-inbox

Thanks, Peter! It's now corrected (83cce70).

Re: Impossible memory state with OpenMP 2 months ago

From Christopher Wellons to ~skeeto/public-inbox

Interesting approach. I used assembly in my version specifically to avoid 
the data race I mentioned. As undefined behavior, it's difficult to 
predict what a compiler might do. The memory re-ordering in the experiment 
is one of the many surprising outcomes of a data race, after all. The 
compiler could re-order loads and stores on its own because the statements 
are semantically independent (undermining the experiment). Since the 
variables aren't volatile, there may even be multiple loads and stores per 
assignment in the C source, giving an "impossible" result less/more often 
than it should. It may even elide the elide the "1" stores since those 
stores are never synchronized, and so never legitimately loaded.

I inspected GCC's output on Linux — a little tricky with all the OpenMP 
control flow around it — and even without volatile the output roughly 
matches the desired assembly, with everything in the right order, so the

Re: Spin Lock 2 months ago

From Christopher Wellons to ~skeeto/public-inbox

That's an interesting way to test/examine it. I would take this further, 
eschew threading entirely, and treat the whole problem as directed graph 
traversal. Each vertex represents system state (shared memory and every 
thread state), and each directed edge represents an operation on one 
thread. A traversal is just one particular ordering of events. A cycle 
that doesn't include the initial state is a deadlock, so to prove there 
are no deadlocks, exhaustively search the full graph looking for such 
cycles.

My barrier is simple enough to work this all out with pencil and paper in 
a few minutes, where one can manually validate the graph. This is one 
reason I'm confident it's correct.

The other challenge: Does this graph capture all orderings? There may be

Re: A flexible, lightweight, spin-lock barrier 2 months ago

From Christopher Wellons to ~skeeto/public-inbox

> won't the mod still be needed inside the loop

Yes, but with a compile-time constant denominator that mod can be reduced 
to multiplication with shift/sub/add, much like division. (On common 
architectures, division and modulo are computed simultaneously from one 
instruction.) It's not as lean as a bitwise mask, but should be better 
than a division.

Though now that I think about it, even in the worst case you could compute 
modulo ahead of the poll loop as a lower bound, then use a greater-than 
comparison in the loop, exiting when the barrier exceeds the pre-computed 
threshold. This avoids expensive (read: latency inducing) operations in 
the poll loop.

Re: Fwd: Compressing and embedding a Wordle word list 2 months ago

From Christopher Wellons to ~skeeto/public-inbox

> LZ does not help enough on the expected short common substrings

Indeed. It does worse than I had expected when I started digging into the 
problem, which is why I thought I could do better.

> the mean difference between words (when represented as a base26 number) 
> is 26^5 / 12972 = 915

I looked at my email in the wrong order, because a previous message had me 
digging into this same idea, all the way through implementing it. Your 
estimate is right on: in the original Wordle list, the mean difference is 
913!

https://lists.sr.ht/~skeeto/public-inbox/%3CYiXIayjFm2O0um8e%40triptico.com%3E

Re: Compressing and embedding a Wordle word list 2 months ago

From Christopher Wellons to ~skeeto/public-inbox

I don't understand the specifics of your scheme. It sounds like delta 
encoding, but I'm not sure. I also don't follow your math.

I hadn't thought of delta encoding, though, so I just tried it out in a 
straightforward way. With each word as a base-26 number, I encoded the 
differences between adjacent words using Huffman coding, and I get 13,182 
bytes of compressed data. Initially this seems much better, except the 
state table has 3,535 entries (from a Huffman tree of 1,768 leaves), many 
of which are large (up to 18 bits). In other words, it pushed all the 
storage into the state table, which isn't particularly efficient.

The largest delta in the original Wordle list is 164,068, meaning every 
delta fits in 18 bits. Storing 12,972 of these makes for 29,187 bytes, 
which isn't quite as good as the Huffman coded version even counting the

Re: A Tutorial on Portable Makefiles 3 months ago

From Christopher Wellons to ~skeeto/public-inbox

Thanks for the update, Ron!

> both gratifying and unsettling.

I know what you mean!

> Your Game of Life makefile works

In what sort of environment did you get it to work? After the first 
iteration I only get blank lines of output, and I'm unable to get it 
working anywhere.

pdpmake has already been useful for double-checking some of my more 
complex Makefiles, and it's even caught mistakes. However, a couple of