Hi. I have a couple of remarks on your demo.
1) In threadentry(), the __atomic_store_n( join_futex, … )
is redundant for the goals of this demo.
This is because futex( join_futex, FUTEX_WAKE )
does not care about the contents of join_futex;
it wakes up waiters unconditionally.
2) When you `ret' from `newthread' into `threadentry',
the `struct stack_head *stack'
starts to point to the free space ABOVE the stack.
This probably does no harm, because the
`entry' member of `struct stack_head'
is not needed anymore, but ...
it contrasts unpleasantly with common practice.
> is redundant for the goals of this demo
That atomic store prevents a race condition and is essential. A FUTEX_WAIT
call atomically loads the futex and compares it to the expected value, in
this case zero, and sleeps only if they match. If the waiter arrives late,
it misses the futex wakeup, but it won't wait because the value has been
changed away from zero. What if it's non-zero in the first place? Then a
waiter that arrives early won't wait at all, so the join returns while the
thread is still running, which is bad for a different reason.
So it must be an atomic update followed by a wake-up in order to cover
both possibilities. If the futex were used frequently, say as an important
mutex on a hot path, I'd want to avoid the syscall costs when there is
no contention. I would signal the presence of a waiter through the futex,
then conditionally wake based on the value. Similarly, on the other side
I'd check the futex in userspace before transitioning to the kernel. The
futex(7) man page describes (quite poorly) such an algorithm. Here's my
implementation of that concept:
https://github.com/skeeto/scratch/blob/master/misc/mutex.c#L77-L99
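
To make the idea concrete, here is a minimal sketch of one common way to do it: a three-state lock word (0 = unlocked, 1 = locked, 2 = locked with possible waiters). It is not the code at the link above, and the names futex(), mutex_lock(), and mutex_unlock() are hypothetical helpers chosen for illustration. It assumes Linux and the GCC/Clang __atomic builtins.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical thin wrapper: glibc does not export a futex() function.
static long futex(uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

// Lock word states: 0 = unlocked, 1 = locked, 2 = locked, maybe waiters.
static void mutex_lock(uint32_t *m)
{
    uint32_t c = 0;
    // Fast path: uncontended 0 -> 1, no syscall.
    if (__atomic_compare_exchange_n(m, &c, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        return;
    }
    // Slow path: announce a waiter by setting 2, then sleep while it is 2.
    if (c != 2) {
        c = __atomic_exchange_n(m, 2, __ATOMIC_ACQUIRE);
    }
    while (c != 0) {
        futex(m, FUTEX_WAIT, 2);  // returns at once if *m is no longer 2
        c = __atomic_exchange_n(m, 2, __ATOMIC_ACQUIRE);
    }
}

static void mutex_unlock(uint32_t *m)
{
    // Enter the kernel only if a waiter signaled its presence.
    if (__atomic_exchange_n(m, 0, __ATOMIC_RELEASE) == 2) {
        futex(m, FUTEX_WAKE, 1);
    }
}
```

With no contention, a lock/unlock pair is one compare-and-swap and one exchange, and neither side makes a syscall.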
Spawning and joining a thread is rare (or should be), so I'm unconcerned
about the potentially unnecessary syscall on each side, opting instead for
simplicity.
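
For comparison, the simple store-then-wake join described above might look roughly like this. It is a sketch under assumptions rather than the demo's exact code: signal_done(), wait_done(), and the futex() wrapper are hypothetical names, and it assumes Linux with the GCC/Clang __atomic builtins.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical thin wrapper: glibc does not export a futex() function.
static long futex(uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

// Exiting thread: change the value first, then wake any sleeper.
// A waiter that shows up after the wake sees non-zero and never sleeps.
static void signal_done(uint32_t *join_futex)
{
    __atomic_store_n(join_futex, 1, __ATOMIC_SEQ_CST);
    futex(join_futex, FUTEX_WAKE, 1);
}

// Joining thread: sleep only while the value is still zero.
static void wait_done(uint32_t *join_futex)
{
    while (__atomic_load_n(join_futex, __ATOMIC_SEQ_CST) == 0) {
        // The kernel rechecks *join_futex == 0 atomically before sleeping
        // and returns immediately if the value has already changed.
        futex(join_futex, FUTEX_WAIT, 0);
    }
}
```

Both sides always make one syscall here unless the waiter arrives late, which is the cost being traded for simplicity.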
> starts to point to the free space ABOVE the stack
Yup, true, though I'd describe it as the structure partially overlapping
the stack. The appropriate warning would be "do not touch the entry field
after newthread." In practice it contains the address of the last function
called directly by the entry function, but due to aliasing (the field is
modified indirectly in ways the compiler doesn't understand) it's
uncertain what might be read even by the thread running on that stack.