Hey Chris,
Interesting article! It has me seriously considering using this over
pthreads to gain more insight into what the kernel does.
I was compiling the example listing with ASAN turned on and ran into this
warning:
==22858==WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: 0x7ffcee0cb000; bottom 0x7f54eae46000; size:
0x00a803285000 (721607479296)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
I tried applying my limited knowledge of ASAN and digging through GitHub
issue threads, but I haven't been able to make it go away. The warning
originates from marking a function that the created thread calls with
__attribute((noreturn)). For the original thread, there is no problem.
Calling __asan_handle_no_return() manually doesn't help either, as it runs
into the same warning.
I was wondering if you have also run into this issue and whether you
know how to solve it. Or is this just a limitation of ASAN that I am
unaware of?
Kind regards,
Flo
ASan generally won't work with libc-free anything, so that error is not
surprising. It expects specific, mostly undocumented behaviors from the
runtime, with memory mapped a certain way for shadow bytes. It needs to be
informed about stacks, and so needs to hook into thread creation. Getting
ASan to work on top of a custom, low-level thread library is essentially
the same as porting it to a new platform. It's a similar story for glibc
alternatives like musl, which still has only limited ASan support.
Same goes for TSan, which not only needs to be informed about all threads,
but also needs to understand all synchronization primitives. That happens
automatically with atomics, but synchronization through the kernel (i.e.
futexes) is invisible to TSan, so any locks you build would need to have
TSan support. (External synchronization is the most common source of false
positives even in normal circumstances.)
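As a rough illustration of what "TSan support" in a hand-rolled lock might look like, here is a minimal sketch. It is a Linux futex mutex using the classic three-state design (0 unlocked, 1 locked, 2 locked with possible waiters), and it reports its acquire and release edges through the TSan runtime entry points __tsan_acquire and __tsan_release.

The sketch assumes Linux on a 64-bit target and GCC or Clang. The mutex type, the TSAN_* macros, and the function names are hypothetical. Without -fsanitize=thread the hooks compile to nothing.

In this particular lock, the atomics on the lock word already carry the ordering TSan needs. The hooks mainly mark where a primitive whose handoff truly goes through the kernel would have to report it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

// Detect TSan: GCC defines __SANITIZE_THREAD__, Clang uses __has_feature
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define USE_TSAN 1
#  endif
#endif
#if defined(__SANITIZE_THREAD__)
#  define USE_TSAN 1
#endif

#ifdef USE_TSAN
// Entry points exported by the TSan runtime
void __tsan_acquire(void *addr);
void __tsan_release(void *addr);
#  define TSAN_ACQUIRE(p) __tsan_acquire((void *)(p))
#  define TSAN_RELEASE(p) __tsan_release((void *)(p))
#else
#  define TSAN_ACQUIRE(p) ((void)(p))
#  define TSAN_RELEASE(p) ((void)(p))
#endif

typedef struct { _Atomic int state; } mutex;  // 0 free, 1 locked, 2 contended

// Sleeps only while *addr still equals expect; spurious wakeups are fine
static void futex_wait(_Atomic int *addr, int expect)
{
    syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expect, NULL, NULL, 0);
}

static void futex_wake(_Atomic int *addr, int count)
{
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, count, NULL, NULL, 0);
}

static void mutex_lock(mutex *m)
{
    int c = 0;
    if (!atomic_compare_exchange_strong(&m->state, &c, 1)) {
        // Contended: mark the lock 2 so the owner knows to wake someone
        if (c != 2) {
            c = atomic_exchange(&m->state, 2);
        }
        while (c != 0) {
            futex_wait(&m->state, 2);
            c = atomic_exchange(&m->state, 2);
        }
    }
    TSAN_ACQUIRE(m);  // this thread now owns whatever m protects
}

static void mutex_unlock(mutex *m)
{
    assert(atomic_load(&m->state) != 0);  // unlocking a free mutex is a bug
    TSAN_RELEASE(m);  // publish this thread's writes before the handoff
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        // There may be sleepers: fully release, then wake one
        atomic_store(&m->state, 0);
        futex_wake(&m->state, 1);
    }
}
```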
UBSan works great without libc, particularly in trap mode which requires
no runtime support. That's how I prefer to use it regardless.
With smart program organization you can have your cake and eat it, too:
Write your application to run on top of a platform layer, then make the
platform layer responsible for spawning all threads. It's a common mistake
to create a thread abstraction that looks almost like standard threading
APIs: spawn, join, etc. That's overkill and makes porting difficult, much
like creating a generic malloc/realloc/free allocator when an arena is
sufficient. Instead, spawn all threads (one per core?) at startup in
the platform layer, and then either present them as a work queue to the
application or give the application multiple entry points (e.g. so it can
initially park all but the main thread in its own work queue). If you only
create a fixed number of threads, you don't need to worry about cleaning
them up, so no need for join, etc. Then you can have your libc-free layer
for distribution or whatever, and a glibc layer using pthreads for running
tests under ASan.
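Here is a minimal sketch of that split, assuming a pthreads-based platform layer of the kind you'd use for ASan test builds. The names app, app_thread, platform_run, and NTHREADS are hypothetical.

The platform spawns a fixed set of threads once and gives each the same application entry point, so the application never touches a thread API. A long-lived server wouldn't join its workers. This sketch joins them only so the result can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { NTHREADS = 4 };  // hypothetical; a real layer might use the core count

// ---- application side: platform-agnostic, no thread API ----
typedef struct {
    atomic_int started;  // how many threads have entered the application
} app;

static void app_thread(app *a, int id)
{
    // A real application might park non-main threads on its own work queue
    (void)id;
    atomic_fetch_add(&a->started, 1);
}

// ---- platform side: the only code that knows about pthreads ----
typedef struct {
    app *a;
    int  id;
} threadarg;

static void *trampoline(void *p)
{
    threadarg *t = p;
    app_thread(t->a, t->id);
    return 0;
}

static void platform_run(app *a)
{
    pthread_t threads[NTHREADS];
    threadarg args[NTHREADS];
    for (int i = 1; i < NTHREADS; i++) {
        args[i] = (threadarg){a, i};
        int r = pthread_create(&threads[i], 0, trampoline, &args[i]);
        assert(r == 0);
    }
    app_thread(a, 0);  // the main thread is simply thread 0
    // Demo only: a long-lived server would never join its workers
    for (int i = 1; i < NTHREADS; i++) {
        pthread_join(threads[i], 0);
    }
}
```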
Even with glibc you'd use a custom allocator in your application. While
ASan won't complain about it, it won't do much for you either unless you
hook ASan into your allocator. That's not terribly difficult, and NRK
seems to have had success with it. See here:
https://codeberg.org/NRK/slashtmp/src/branch/master/data-structures/u-list.c#L44-L55
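For a rough idea of what that hookup involves, here's a minimal arena sketch. It is not NRK's code: the arena type, arena_new, arena_alloc, arena_reset, and the REDZONE size are all hypothetical.

The arena poisons the whole backing buffer up front, unpoisons each allocation as it's handed out, and re-poisons everything on reset. It does this with ASan's public __asan_poison_memory_region and __asan_unpoison_memory_region. A poisoned gap after each allocation catches small overruns. Outside an ASan build the macros compile away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Detect ASan: GCC defines __SANITIZE_ADDRESS__, Clang uses __has_feature
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define USE_ASAN 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__)
#  define USE_ASAN 1
#endif

#ifdef USE_ASAN
void __asan_poison_memory_region(void const volatile *addr, size_t size);
void __asan_unpoison_memory_region(void const volatile *addr, size_t size);
#  define POISON(p, n)   __asan_poison_memory_region((p), (size_t)(n))
#  define UNPOISON(p, n) __asan_unpoison_memory_region((p), (size_t)(n))
#else
#  define POISON(p, n)   ((void)(p), (void)(n))
#  define UNPOISON(p, n) ((void)(p), (void)(n))
#endif

enum { REDZONE = 16 };  // poisoned gap left after each allocation

typedef struct {
    char *beg;  // start of the backing buffer
    char *cur;  // next free byte
    char *end;
} arena;

static arena arena_new(void *mem, ptrdiff_t cap)
{
    arena a = {mem, mem, (char *)mem + cap};
    POISON(a.beg, cap);  // nothing has been allocated yet
    return a;
}

// Returns zeroed, 16-byte-aligned memory, or NULL when out of space
static void *arena_alloc(arena *a, ptrdiff_t size)
{
    assert(size >= 0);
    ptrdiff_t pad = -(uintptr_t)a->cur & 15;
    if (size > a->end - a->cur - pad - REDZONE) {
        return 0;
    }
    char *p = a->cur + pad;
    a->cur = p + size + REDZONE;  // padding and redzone stay poisoned
    UNPOISON(p, size);            // tell ASan this range is now live
    return memset(p, 0, size);
}

// Frees everything at once: the used part of the buffer is poisoned again
static void arena_reset(arena *a)
{
    POISON(a->beg, a->cur - a->beg);
    a->cur = a->beg;
}
```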
> [...] so then that would require me to implement a "good" SPMC-queue
> with all the scheduling headaches [...]
With good I/O queues the kernel does most of that work for you: completion
ports on Windows, io_uring on Linux, or kqueue on BSD. Block all threads
on the queue and let the kernel schedule them. Linux epoll was infamously
bad at exactly this, addressed with later hacks (e.g. EPOLLONESHOT), but
you don't need it anymore. You'll need one platform layer per I/O queue
API. Only on the legacy interfaces (i.e. select, poll), if you support
them at all, do you need to implement your own SPMC.
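To make "block all threads on the queue" concrete, here's a minimal sketch of the Linux epoll variant using the EPOLLONESHOT workaround mentioned above. Each readiness event wakes exactly one thread, and that thread re-arms the descriptor with EPOLL_CTL_MOD when it's done.

The ioqueue type, its nbytes counter (a stand-in for real work), and the function names are hypothetical, and error handling is minimal. An io_uring, completion port, or kqueue platform layer would replace this loop with its own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/epoll.h>
#include <unistd.h>

typedef struct {
    int        epfd;    // shared epoll instance every worker blocks on
    atomic_int nbytes;  // total bytes "handled" (stand-in for real work)
} ioqueue;

// Registers fd so that each readiness event wakes exactly one waiter
static int ioqueue_add(ioqueue *q, int fd)
{
    assert(q->epfd >= 0);
    struct epoll_event ev = {.events = EPOLLIN | EPOLLONESHOT, .data.fd = fd};
    return epoll_ctl(q->epfd, EPOLL_CTL_ADD, fd, &ev);
}

// Thread entry point: block on the queue, handle one event, re-arm, repeat
static void *worker(void *arg)
{
    ioqueue *q = arg;
    for (;;) {
        struct epoll_event ev;
        if (epoll_wait(q->epfd, &ev, 1, -1) != 1) {
            continue;  // interrupted by a signal
        }
        int fd = ev.data.fd;
        char buf[256];
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len > 0) {
            atomic_fetch_add(&q->nbytes, (int)len);
        }
        // Re-arm so the kernel can hand this fd's next event to any thread
        ev.events = EPOLLIN | EPOLLONESHOT;
        epoll_ctl(q->epfd, EPOLL_CTL_MOD, fd, &ev);
        if (len <= 0) {
            return 0;  // EOF or error: the re-arm lets the next thread see it too
        }
    }
}

// Platform-layer startup: n workers, all blocked on the same kernel queue
static int ioqueue_start(ioqueue *q, pthread_t *threads, int n)
{
    for (int i = 0; i < n; i++) {
        if (pthread_create(&threads[i], 0, worker, q)) {
            return -1;
        }
    }
    return 0;
}
```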
The HTTP application is the rest, all platform-agnostic: parsing HTTP
headers, managing session state, etc. The platform layer's job is just to
read some bytes, pass them into the application on some thread, and write
the returned bytes (i.e. queue a write), then repeat. Perhaps no more than
a couple of entry points (using my style for succinctness):
appctx *init(void *mem, size cap);
typedef struct { i32 what; s8 output; } action;
action update(appctx *, i32 who, i32 what, s8 input);
Where "update" could be called concurrently from multiple threads. It
tells the application what (accept, close, read) happened to who (socket),
and the application responds with an action (write, close, etc). Since the
application cannot block (doesn't do I/O), atomics (incl. ticket locks)
are sufficient for all internal synchronization. You could test it quite
thoroughly without networking using a platform-agnostic test suite. (In
u-config, for example, tests run on a virtual file system, and the test
suite is plain old standard C that runs anywhere, even places without a
file system.)
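Here is one way that interface might be fleshed out, as a sketch only. The typedefs follow the style above. The EV_* and ACT_* codes, the echo behavior, and the connection counter are hypothetical stand-ins for real HTTP handling.

Since update does no I/O and shares state only through atomics, a plain test can call it directly with no sockets involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t   i32;
typedef uint8_t   u8;
typedef ptrdiff_t size;
typedef struct { u8 *data; size len; } s8;

// Hypothetical event ("what") and action codes
enum { EV_ACCEPT, EV_READ, EV_CLOSE };
enum { ACT_NONE, ACT_WRITE, ACT_CLOSE };

typedef struct { i32 what; s8 output; } action;

typedef struct {
    atomic_int open;  // live connections, updated from any thread
} appctx;

// Carves the context out of platform-provided (suitably aligned) memory
appctx *init(void *mem, size cap)
{
    assert(mem);
    if (cap < (size)sizeof(appctx)) {
        return 0;
    }
    appctx *app = mem;
    atomic_init(&app->open, 0);
    return app;
}

// May run concurrently on many threads; never blocks and never does I/O
action update(appctx *app, i32 who, i32 what, s8 input)
{
    (void)who;  // a real server would look up per-socket session state here
    action r = {ACT_NONE, {0, 0}};
    switch (what) {
    case EV_ACCEPT:
        atomic_fetch_add(&app->open, 1);
        break;
    case EV_READ:
        if (input.len == 0) {
            r.what = ACT_CLOSE;    // nothing more from the peer: hang up
        } else {
            r.what   = ACT_WRITE;  // placeholder: echo the input back
            r.output = input;
        }
        break;
    case EV_CLOSE:
        atomic_fetch_sub(&app->open, 1);
        break;
    }
    return r;
}
```

The output here points into the input buffer. That assumes the platform finishes the write before reusing the buffer; a real application would more likely build its response in its own arena.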
Also check out jart's redbean, a webserver written against a custom,
amazing libc, i.e. it doesn't rely on the system libc.
> As in, I can just read memory from something that was allocated before
> and is still in use and then ASAN wouldn't be able to tell?
Yup, because how would it know it's free? With a custom allocator, whether
or not memory is free is a matter of your program's semantics, which ASan
certainly cannot guess at. So you have to tell it about each allocate and
free, just as malloc/free do otherwise.
With GCC's alloc_size attribute and optimizations enabled (important!),
UBSan will insert some bounds checks for you, even in trap mode. It works
on the same mechanism as _FORTIFY_SOURCE, and optimization is necessary to
propagate bounds to loads. So you can get some ASan-like properties with a
well-placed annotation.
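Here is a minimal sketch of such an annotation, assuming GCC (Clang also accepts alloc_size). The arena type and arena_alloc are hypothetical.

Built with something like gcc -O2 -fsanitize=undefined -fsanitize-trap=all, the compiler can learn that the result of arena_alloc(a, sizeof(int), 4) holds exactly four ints. UBSan's object-size check may then trap on an out-of-bounds access through that pointer. Whether a particular access gets checked depends on inlining and optimization, so treat it as a best-effort net rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char *beg;  // next free byte
    char *end;
} arena;

// alloc_size(2, 3): the result holds size*count bytes, which feeds the
// compiler's object-size tracking for accesses through the returned pointer
__attribute((malloc, alloc_size(2, 3)))
static void *arena_alloc(arena *a, ptrdiff_t size, ptrdiff_t count)
{
    assert(size > 0 && count >= 0);
    ptrdiff_t pad   = -(uintptr_t)a->beg & 15;  // 16-byte alignment
    ptrdiff_t avail = a->end - a->beg - pad;
    if (avail < 0 || count > avail / size) {
        return 0;  // out of memory
    }
    char *p = a->beg + pad;
    a->beg += pad + size * count;
    return memset(p, 0, size * count);
}
```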