Hi Chris,
I recently read your post on an arena-backed dynamic array implementation
in C¹. I thought it was pretty cool; I rarely see APIs where one
dereferences a function invocation.
For some context I am currently working on a C library full of routines I
commonly use in my various projects (such as UTF-8 string handling,
Unicode normalization, etc.), and I have added custom allocator support
for many functions which has proven extremely helpful. I do so in the
following manner:
/* An allocator function. It works in the same way as the Lua
allocator that you coincidentally also mentioned before² */
typedef void *(*alloc_fn)(void *ctx, void *ptr, size_t old,
size_t new);
/* Function to apply NFD string normalization */
char8_t *ucsnorm_nfd(size_t *bufsz, const char8_t *src, size_t srcn,
alloc_fn alloc, void *alloc_ctx);
One place where I’ve been having a lot of trouble though is, is
implementing a generic dynamic array that properly supports custom
allocators, and the main issue revolves around the fact that on
allocation error the given allocator will return NULL, so something like
what you did with an arena allocator would not work:
struct {
void *buf;
size_t len, cap;
alloc_fn alloc;
void *alloc_ctx;
} ints;
#define dapush(s) \
((s)->len >= (s)->cap \
? grow(s, sizeof(*(s)->data)), \
(s)->data + (s)->len++ \
: (s)->data + (s)->len++)
*dapush(ints) = 5; /* Allocation failed, so we segfault */
So that won’t work… so I tried a different approach. This time I tried
writing a ‘dapush()’ function where I can pass the item to push as an
argument, so that it can properly handle and report errors:
/* Push the item ‘x’ of size ‘sz’ bytes to the dynamic array ‘s’, and
return a pointer to the newly pushed item on success; or NULL on
error. */
void *dapush(void *s, void *x, size_t sz);
/* Example; dynarr() is a macro that examples to a dynamic array
struct with a properly typed ‘buf’ field. */
dynarr(int) ints = {.alloc = heapalloc};
int x = 5;
if (dapush(&ints, &x, sizeof(int)) == NULL)
err("dapush");
The above works… but it *really* sucks that I need to use a temporary
variable ‘x’ as I can’t just pass ‘&5’ in C. I could try just passing an
integer and not a pointer to an integer… but then I can’t store arrays of
structures greater than the size of a pointer.
I’ve been racking my brain for a while now and can’t think of a good
solution to this problem. Is this something you’ve tackled before? Do
you maybe have some ideas?
I actually even tried a cheeky use of variadic functions and
variably-modified types, but the following is illegal code as it turns
out:
void *
dapush(void *s, size_t sz, ...)
{
va_list ap;
va_start(ap, sz);
char[sz] x = va_arg(ap, char[sz]);
memcpy(/* ... */);
va_end(ap);
/* ... */
}
[1]: https://nullprogram.com/blog/2023/10/05/
[2]: https://nullprogram.com/blog/2023/12/17/
--
— Thomas
Thanks for writing, Thomas! Your library sounds interesting.
> the main issue revolves around the fact that on allocation error the > given allocator will return NULL
After adjusting my own habits, I found that eliminating allocation failure
checks at call sites, at least in the typical case, makes programs so much
clearer and simpler. Such checks are a significant burden when working in
conventional C, along with (mostly unnecessary) lifetime management. In
this very case it's complicating the "push" interface.
Worse, at least on hosted systems these checks are usually for naught. If
the allocator waits for the operating system to push back before failing
to allocate, then when pushback happens, if it happens at all (e.g. Linux
OOM killer), it's likely too late to recover. Even if recovers without
bumping into another allocation failure, the system is has probably been
unstable for some time: thrashing, or other processes are now experiencing
allocation failure (no commit charge available).
As for programs that genuinely believe they can recover: Is this actually
tested? At least with a custom allocator, like the one here, it's feasible
to test. Outside of C, recovery from OOM is typically not expected, even
if there's theoretically an exception to catch.
So that's why I talk about having an OOM policy handled in the allocator,
not at each allocation site. Exiting/aborting probably isn't appropriate
in a library. One idea is to require that the allocator never returns
null, which pushes OOM handling onto the allocator, and therefore it's up
to the application to apply its OOM policy from within the allocation
call. If you provide a default malloc-based allocator then exit/abort is
fine: In that case the caller isn't thinking about this stuff anyway.
My favorite option is setjmp/longjmp. Use setjmp just inside the library
entry point, and wrap the allocator in your own null check which longjmps
back to the entry point in order to return an error. GCC and Clang provide
them as lean built-ins, so this even works in freestanding environments.
typedef struct {
alloc_fn alloc;
alloc_ctx ctx;
jmp_buf oom;
} private_ctx;
char8_t *ucsnorm_nfd(...)
{
private_ctx ctx = {0};
ctx.alloc_fn = alloc_fn;
ctx.alloc_ctx = alloc_ctx;
if (setjmp(ctx.oom)) {
return 0; // OOM
}
// ...
}
This works great with arenas since there's no lifetime management, no
cleanup, but less so when you have to individually free the previous
successful allocations. You'd need this private context at every level
requiring cleanup, resembling exception unwinding. Probably not worth it.
If I was writing a library like this, if possible I'd reduce the custom
allocator to a block of memory, then arena-allocate internally. A function
like ucsnorm_nfd would allocate its result out of it, and use the rest as
temporary work space as needed. That block of memory might be expressed in
as arena, end/beg or mem/len struct, in the interface so that used space
is tracked, and the caller can keep using the arena across calls with the
usual paradigm. When I wrote about custom allocator interfaces, it wasn't
necessarily what I'd use if I were writing a library, but rather what is
nice to have as a consumer of a library that assumes a generic allocator.
> it *really* sucks that I need to use a temporary variable ‘x’ as I can’t > just pass ‘&5’ in C
Besides ergonomics, my macro evaluates to a pointer so that the assignment
is made outside the generic code. Then a compiler can produce a standard,
efficient store that assumes the usual aliasing and alignment constraints.
A store in the generic code with a memcpy() is less constrained. It can
alias with incompatible types. It can be unaligned. Unless the push call
is inlined, it's going to be a literal libc call every time. Awful.
Some better news: A macro can convert an unaddressable "rvalue" into an
addressable "lvalue" using a compound literal, at least sometimes. Caveat:
In the generic case this requires something along the lines of GCC typeof.
void push_(void *da, void *v, ptrdiff_t size);
#define push(da, v) push_(da, &(typeof(v)){v}, sizeof(v))
I'm just aware of this trick and haven't yet used it in practice.
On Thu May 9, 2024 at 5:56 AM CEST, Christopher Wellons wrote:
> Worse, at least on hosted systems these checks are usually for naught. If > the allocator waits for the operating system to push back before failing > to allocate, then when pushback happens, if it happens at all (e.g. Linux > OOM killer), it's likely too late to recover. Even if recovers without > bumping into another allocation failure, the system is has probably been > unstable for some time: thrashing, or other processes are now experiencing > allocation failure (no commit charge available).>> As for programs that genuinely believe they can recover: Is this actually > tested? At least with a custom allocator, like the one here, it's feasible > to test. Outside of C, recovery from OOM is typically not expected, even > if there's theoretically an exception to catch.
The thing with a custom allocator though is that failing to allocate
doesn’t have to mean OOM. I might be using a malloc-based allocator or
an growing arena allocator, but in the past I’ve also used allocators
that wrap a fixed-size static buffer in a daemon process that I’d prefer
didn’t just crash.
Now to be fair that was a one-off thing and I haven’t had to do such a
thing since… but it certainly was a thing I did.
> Besides ergonomics, my macro evaluates to a pointer so that the assignment > is made outside the generic code. Then a compiler can produce a standard, > efficient store that assumes the usual aliasing and alignment constraints. > A store in the generic code with a memcpy() is less constrained. It can > alias with incompatible types. It can be unaligned. Unless the push call > is inlined, it's going to be a literal libc call every time. Awful.
Indeed. I actually really liked how your solution worked with the whole
‘*push() = …’ paradigm, but as I’ve said if I have any form of error
checking then that kind of breaks because I might be assigning to null.
> Some better news: A macro can convert an unaddressable "rvalue" into an > addressable "lvalue" using a compound literal, at least sometimes. Caveat: > In the generic case this requires something along the lines of GCC typeof.>> void push_(void *da, void *v, ptrdiff_t size);> #define push(da, v) push_(da, &(typeof(v)){v}, sizeof(v))>> I'm just aware of this trick and haven't yet used it in practice.
I actually tried to use this trick as well, but it falls short the moment
you’re using a compound type. Imagine the following example:
#define _(x) &(typeof(x)){x}
struct foo {
int x, y;
};
struct foo f = {};
some_function(_(f)); /* error! */
Perhaps it would work to write some sort of _Generic macro that has cases
for all scalar values? I am not sure if such a thing is even possible
since the _Generic interface is absolute dogshit. The following naïve
example wouldn’t work for the same reason as above; since _Generic wants
all possible branch outcomes to be valid.
#define _(x) \
_Generic((x), \
int: &(typeof(x)){x}, \
default: &(x))
Thinking about it now, the best solution would probably be to just signal
something in the context provided to the allocator. In the general case
where I just want to abort on OOM, nothing changes too much:
/* Allocation failure will just abort */
dynarr(int) xs = {.alloc = alloc_heap};
*dapush(xs) = 1;
*dapush(xs) = 2;
*dapush(xs) = 3;
free(xs.buf);
However in the few- and far-between cases where I *do* want to handle
things properly, I can pass something signalling that in the allocation
context:
static char buf[sizeof(int) * 2];
jmp_buf env;
if (setjmp(env) != 0) {
handle_error();
return;
}
struct static_ctx ctx = {
.buf = buf,
.errjmp = env,
};
dynarr(int) xs = {
.alloc = alloc_static,
.alloc_ctx = &ctx,
};
*dapush(xs) = 1;
*dapush(xs) = 2;
*dapush(xs) = 3; /* Will trigger an OOM */
This also avoids me needing to check for null on every push, I just do my
handling in one place and on the rare case of error we jump there.
--
— Thomas
On Thu, May 09, 2024 at 02:15:09PM +0200, Thomas Voss wrote:
> However in the few- and far-between cases where I *do* want to handle> things properly, I can pass something signalling that in the allocation> context:> [...]> jmp_buf env;> if (setjmp(env) != 0) {> handle_error();> return;> }> [...]> This also avoids me needing to check for null on every push, I just do my> handling in one place and on the rare case of error we jump there.
Here's an idea, use a dummy entry and a oom flag. Pseudo-code:
struct {
T *data;
Size cap;
Size len;
int oom;
T dummy;
};
#define push(...) (
v->len >= v->cap ? grow(...) : (void)0,
v->oom ? &v->dummy : v->data + v->len++
)
Now you just keep pushing without doing any checks and at the very do
the error check once.
for (...) {
*push(&v) = ...;
}
if (v.oom) {
// handle errror
}
Although quite frankly, I'm skeptical that a dynmic array is even
needed. I'm not familiar with unicode normalization, but can you not ask
the caller to simply provide a buffer large enough for the worst case
input?
If you're worried about wastage, the caller can "shrink" the buffer
later on. Example using standard realloc()
size_t worst = lib_size_needed(input_len);
T *buf = checked_malloc(worst); // malloc wrapper that aborts on failure
len = lib_do_thing(input, input_len, buf); // no bounds checks needed for output buffer
buf = checked_realloc(buf, len); // shrink if needed
The conservative "allocate small and grow if needed" mindset complicates
code (by spreading OOM concerns everywhere compared to doing it once
upfront) and IMO is not worthwhile unless you're dealing with "big data"
where the difference between average case and worst case is going to be
in gigabytes or such.
Another option is to return an error code telling that the buffer wasn't
large enough. Optionally you can also return the size/len that would've
been needed (this is what libgrapheme does:
https://libs.suckless.org/libgrapheme/man/grapheme_to_titlecase_utf8.3).
The caller can then decide whether they want to grow and retry or to
abort/give up.
Yet another option - and I don't recommend it unless you're doing some
sort of streaming interface where the data is too big to be held in
memory - is to have a ZSTD style interface where it returns back when
the output buffer is full.
https://github.com/facebook/zstd/blob/97291fc5020a8994019ab76cf0cda83a9824374c/examples/streaming_compression.c#L71-L86
The reason why I don't recommend this style of API is because not only
does it make the implementation difficult as you need to save and
restore all intermediate state between calls. But more importantly it
makes the calling code difficult as well due to having to wrap stuff in
a `do { } while` loop.
- NRK
> The conservative "allocate small and grow if needed" mindset complicates > code … and IMO is not worthwhile unless you're dealing with "big data"
This mindset is common in conventional C and needs a name. Maybe something
like "16-bit thinking"? The stdio input interfaces implicitly assume it,
designed to handle little chunks of input at a time, and those assumptions
propagate into programs using it. Accepting complexity and performance
costs to absolutely minimize memory use makes sense for the constrained
resources of a 16-bit machine, where C was incubated, but it's a poor
trade-off on 32-bit and especially 64-bit machines.
In general, a configuration file doesn't need to be read(2) a few lines at
time. Memory doesn't need to be freed the instant it's no longer in use
(formalized by smart pointers and reference counting). I hadn't thought of
"allocate small and grow if needed" as another case until now.