Re: Practical libc-free threading on Linux

Hey Chris,

Interesting article! It has me seriously considering using this over
pthreads to gain more insight into what the kernel does.

I was compiling the example listing with ASAN turned on and ran into this:

==22858==WARNING: ASan is ignoring requested __asan_handle_no_return: 
stack type: default top: 0x7ffcee0cb000; bottom 0x7f54eae46000; size: 
0x00a803285000 (721607479296)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189

I tried my limited knowledge of ASAN and GitHub issue threads but I 
haven't been able to make it go away. The warning originates from 
marking a function that the created thread is calling with
__attribute__((noreturn)). For the original thread, there is no problem.

Calling __asan_handle_no_return(); manually also does no good as it runs 
into the same warning.

I was wondering if you have also run into this issue and whether you 
know how to solve it. Or is this just a limitation of ASAN that I am 
unaware of?

Kind regards,

Re: Practical libc-free threading on Linux

ASan generally won't work with libc-free anything, so that error is not 
surprising. It expects specific, mostly undocumented behaviors from the 
runtime, with memory mapped a certain way for shadow bytes. It needs to be 
informed about stacks, and so needs to hook into thread creation. Getting 
ASan to work on top of a custom, low-level thread library is essentially 
the same as porting it to a new platform. It's a similar story for glibc
alternatives like musl, which even now has only limited ASan support.

Same goes for TSan, which not only needs to be informed about all threads, 
but also needs to understand all synchronization primitives. That happens 
automatically with atomics, but synchronization through the kernel (e.g.
futexes) is invisible to TSan, so any locks you build would need to have
TSan support. (External synchronization is the most common source of false 
positives even in normal circumstances.)

UBSan works great without libc, particularly in trap mode which requires 
no runtime support. That's how I prefer to use it regardless.

With smart program organization you can have your cake and eat it, too: 
Write your application to run on top of a platform layer, then make the 
platform layer responsible for spawning all threads. It's a common mistake 
to create a thread abstraction that looks almost like standard threading 
APIs: spawn, join, etc. That's overkill and makes porting difficult, much 
like creating a generic malloc/realloc/free allocator when an arena is 
sufficient. No, instead spawn all threads (one per core?) at startup in 
the platform layer, and then either present them as a work queue to the 
application or give the application multiple entry points (e.g. so it can 
initially park all but the main thread in its own work queue). If you only 
create a fixed number of threads, you don't need to worry about cleaning 
them up, so no need for join, etc. Then you can have your libc-free layer 
for distribution or whatever, and a glibc layer using pthreads for running 
tests under ASan.
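
To make that split concrete, here is a minimal sketch of the glibc/pthreads
variant of such a platform layer. All names (app, app_worker, platform_run,
NTHREADS) are made up for illustration, and the fixed thread count is an
assumption; a real layer might query the core count instead. The
application only ever sees app_worker being called with a thread id.

```c
#include <pthread.h>
#include <stdatomic.h>

enum { NTHREADS = 4 };  // assumed fixed count, chosen for illustration

// Application side: knows nothing about how threads are created.
typedef struct {
    atomic_int entered;  // number of workers that reached the entry point
} app;

void app_worker(app *a, int id)
{
    (void)id;  // a real application would pick its work queue by id
    atomic_fetch_add(&a->entered, 1);
}

// Platform side, pthreads flavor (hypothetical; used for sanitizer runs).
typedef struct { app *a; int id; } worker_arg;

static void *trampoline(void *p)
{
    worker_arg *w = p;
    app_worker(w->a, w->id);
    return 0;
}

// Spawns every worker once and runs worker 0 on the calling thread.
// Returns 0 if all NTHREADS workers were started, -1 otherwise.
int platform_run(app *a)
{
    pthread_t  thr[NTHREADS];
    worker_arg args[NTHREADS];
    int n = 1;  // slot 0 is the calling thread
    for (; n < NTHREADS; n++) {
        args[n].a  = a;
        args[n].id = n;
        if (pthread_create(&thr[n], 0, trampoline, &args[n]) != 0) {
            break;
        }
    }
    app_worker(a, 0);
    // Joined here only so a test can inspect the results afterwards; a
    // long-running server's workers would never return in the first place.
    for (int i = 1; i < n; i++) {
        pthread_join(thr[i], 0);
    }
    return n == NTHREADS ? 0 : -1;
}
```

The libc-free layer would provide its own platform_run with the same
signature, built on raw clone or whatever it uses, and the application code
would not change.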

Even with glibc you'd use a custom allocator in your application. While 
ASan won't complain about it, it won't do much for you either unless you 
hook ASan into your allocator. That's not terribly difficult, and NRK 
seems to have had success with it. See here:


Re: Practical libc-free threading on Linux

> [...] so then that would require me to implement a "good" SPMC-queue 
> with all the scheduling headaches [...]

With good I/O queues the kernel does most of that work for you: completion
ports on Windows, io_uring on Linux, and kqueue on BSD. Block all threads
on the queue and let the kernel schedule them. Linux epoll was infamously
bad at exactly this, addressed with later hacks (e.g. EPOLLONESHOT), but
you don't need it anymore. You'll need one platform layer per I/O queue
API. Only on the legacy interfaces (e.g. select, poll), if you support
them at all, do you need to implement your own SPMC.

The HTTP application is the rest, all platform-agnostic: parsing HTTP 
headers, managing session state, etc. The platform layer's job is just to 
read some bytes, pass them into the application on some thread, and write 
the returned bytes (i.e. queue a write), then repeat. Perhaps no more than 
a couple of entry points (using my style for succinctness):

    appctx *init(void *mem, size cap);
    typedef struct { i32 what; s8 output; } action;
    action update(appctx *, i32 who, i32 what, s8 input);

Where "update" could be called concurrently from multiple threads. It 
tells the application what (accept, close, read) happened to who (socket), 
and the application responds with an action (write, close, etc). Since the 
application cannot block (doesn't do I/O), atomics (incl. ticket locks) 
are sufficient for all internal synchronization. You could test it quite 
thoroughly without networking using a platform-agnostic test suite. (In 
u-config, for example, tests run on a virtual file system, and the test 
suite is plain old standard C that runs anywhere, even places without a 
file system.)
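
As a rough sketch of what sits behind those two entry points, here is a toy
"echo" application with a ticket lock guarding one shared counter. The event
and action codes, the appctx fields, and the typedefs are all assumptions
made up for this example; only the init/update signatures come from the
listing above. The caller is assumed to pass memory suitably aligned for
appctx.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t   i32;
typedef uint32_t  u32;
typedef ptrdiff_t size;
typedef struct { unsigned char *data; size len; } s8;

enum { EV_ACCEPT, EV_READ, EV_CLOSE };    // "what" passed in (hypothetical)
enum { ACT_NONE, ACT_WRITE, ACT_CLOSE };  // "what" returned (hypothetical)

typedef struct {
    _Atomic u32 next;     // ticket lock: next ticket to hand out
    _Atomic u32 serving;  // ticket lock: ticket currently admitted
    size        total;    // bytes received so far, guarded by the lock
} appctx;

typedef struct { i32 what; s8 output; } action;

// Places the context at the start of the caller's memory block.
appctx *init(void *mem, size cap)
{
    if (cap < (size)sizeof(appctx)) {
        return 0;
    }
    appctx *ctx = mem;
    atomic_init(&ctx->next, 0);
    atomic_init(&ctx->serving, 0);
    ctx->total = 0;
    return ctx;
}

static void ticket_lock(appctx *ctx)
{
    u32 t = atomic_fetch_add_explicit(&ctx->next, 1, memory_order_relaxed);
    while (atomic_load_explicit(&ctx->serving, memory_order_acquire) != t) {
        // spin: critical sections are short and never wait on I/O
    }
}

static void ticket_unlock(appctx *ctx)
{
    atomic_fetch_add_explicit(&ctx->serving, 1, memory_order_release);
}

// May be called concurrently from any number of platform threads.
action update(appctx *ctx, i32 who, i32 what, s8 input)
{
    (void)who;  // a real application would look up per-socket state here
    action r = {ACT_NONE, {0, 0}};
    switch (what) {
    case EV_READ:
        ticket_lock(ctx);
        ctx->total += input.len;
        ticket_unlock(ctx);
        r.what   = ACT_WRITE;
        r.output = input;  // echo the bytes back
        break;
    case EV_CLOSE:
        r.what = ACT_CLOSE;
        break;
    }
    return r;
}
```

A test suite can drive update directly with canned events and inspect the
returned actions, with no sockets involved.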

Also check out jart's redbean, a webserver written against a custom,
amazing libc, i.e. it doesn't rely on the system libc.

> As in, I can just read memory from something that was allocated before 
> and is still in use and then ASAN wouldn't be able to tell?

Yup, because how would it know it's free? With a custom allocator, whether 
or not memory is free is a matter of your program's semantics, which ASan 
certainly cannot guess at. So you have to tell it about each allocate and 
free, just as malloc/free does otherwise.
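
For illustration, telling ASan about an arena might look roughly like the
sketch below. The poison/unpoison macros are the ones from
<sanitizer/asan_interface.h>; the arena type and function names are
invented for this example, and the backing memory is assumed to be 16-byte
aligned. Without ASan the macros fall back to no-ops, so the same code
builds either way.

```c
#include <stddef.h>

// Detect ASan under GCC (__SANITIZE_ADDRESS__) or Clang (__has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define HAVE_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define HAVE_ASAN 1
#  endif
#endif

#ifdef HAVE_ASAN
#  include <sanitizer/asan_interface.h>
#else
#  define ASAN_POISON_MEMORY_REGION(addr, size)   ((void)(addr), (void)(size))
#  define ASAN_UNPOISON_MEMORY_REGION(addr, size) ((void)(addr), (void)(size))
#endif

typedef struct {
    char *beg;  // first free byte
    char *end;  // one past the last byte
} arena;

// Everything in a fresh arena counts as "free", so poison all of it.
arena arena_init(void *mem, ptrdiff_t cap)
{
    arena a = {mem, (char *)mem + cap};
    ASAN_POISON_MEMORY_REGION(a.beg, cap);
    return a;
}

// Bump allocation in 16-byte steps. Only the requested bytes are
// unpoisoned, so the padding behind them acts as a small redzone.
void *arena_alloc(arena *a, ptrdiff_t size)
{
    ptrdiff_t avail = a->end - a->beg;
    ptrdiff_t pad   = -size & 15;
    if (size < 0 || size > avail - pad) {
        return 0;
    }
    void *p = a->beg;
    a->beg += size + pad;
    ASAN_UNPOISON_MEMORY_REGION(p, size);
    return p;
}

// "Free" everything allocated since mark by poisoning it again.
void arena_reset(arena *a, char *mark)
{
    ASAN_POISON_MEMORY_REGION(mark, a->beg - mark);
    a->beg = mark;
}
```

With this in place, a read through a stale pointer after arena_reset should
be reported under ASan much like a use-after-free with malloc would be.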

With GCC's alloc_size attribute and optimizations enabled (important!), 
UBSan will insert some bounds checks for you, even in trap mode. It works 
on the same mechanism as _FORTIFY_SOURCE, and optimization is necessary to 
propagate bounds to loads. So you can get some ASan-like properties with a 
well-placed annotation.
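
A sketch of such an annotation, with an invented region type and allocator
(neither is from the article). The build line in the comment is an
assumption about your toolchain: -fsanitize-undefined-trap-on-error is the
older GCC spelling of trap mode, and newer compilers also accept
-fsanitize-trap.

```c
#include <stddef.h>

// Assumed build line, adjust to taste:
//   gcc -O2 -fsanitize=undefined -fsanitize-undefined-trap-on-error x.c

typedef struct {
    char *beg;  // first free byte
    char *end;  // one past the last byte
} region;

// alloc_size(2, 3) tells the compiler that the returned object is
// count*size bytes, which is what the object-size checks build on.
__attribute__((alloc_size(2, 3)))
void *region_alloc(region *r, ptrdiff_t count, ptrdiff_t size);

// Plain bump allocator; alignment is left to the caller in this sketch.
void *region_alloc(region *r, ptrdiff_t count, ptrdiff_t size)
{
    ptrdiff_t avail = r->end - r->beg;
    if (count < 0 || size <= 0 || count > avail / size) {
        return 0;
    }
    void *p = r->beg;
    r->beg += count * size;
    return p;
}

int fill_and_sum(region *r)
{
    int *v = region_alloc(r, 4, sizeof(*v));
    if (!v) {
        return -1;
    }
    int sum = 0;
    for (int i = 0; i < 4; i++) {  // an off-by-one here (i <= 4) is the
        v[i] = i;                  // kind of overrun the object-size
        sum += v[i];               // check is able to trap on
    }
    return sum;
}
```

Whether a particular overrun is caught depends on the optimizer being able
to connect the access to the allocation, so treat this as a cheap extra
check rather than an ASan replacement.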