Hi Chris,
As mentioned in the Reddit thread I've recently started to use your
arena in some of my small, personal projects.
One thing I noticed right off the bat is the impedance mismatch
between your signed sizes and size_t. I use the standard library
often enough that it becomes annoying having to constantly cast sizes.
Eventually I changed the arena to accept unsigned types for size,
alignment, etc. Do I think sizes (and indices for that matter) should
be signed? Yes, for reasons stated in the various articles you've
linked to on your blog butu I'm not there yet in practice. What do
you do? Have you divorced yourself from the standard library such
that you use your signed sizes across additional functions beyond the
allocation?
Second, what do you do if you run out of memory? I was thinking of
modifying the arena code to gracefully handle this case but I haven't
sat down and worked out the details yet. Have you? In practice it's
not really a problem I'll encounter in my personal projects. If it
ever gets to the point where I'll consider putting it in production
quality code, it's something I might have to grapple with. That day
most likely will never come but I'm interested in what you think about
this.
Finally I started to dabble a bit in the Linux async I/O after looking
at a forum post project. I had not heard of it before and I plan to
code review the project when I get a chance. I've read a few blogs,
the man pages, and went through the code of a few simple examples -
cp, cat, bare bones web server. I found a bug or two in some of the
blog examples so I decided to write the code myself. The API is one of
the more low-level APIs I've seen. The kernel and user share two ring
buffers (submission and completion) and the programmer must be aware
of memory models when adding elements to the submission queue and
reaping results from the completion queue. The approaches I've seen
all have a lot of malloc/free churn. Malloc entries for the
submission queue, free after reaping, rinse and repeat. The total
amount of memory in flight can be capped by a code limit on the number
of elements on the submission queue at any one time, but the
allocation workflow doesn't seem to lend itself well to arenas. To
get around this I'm going to add a simple abstraction on top of the
reana - linked list - such that I can "free" reaped queue entries by
adding them back to the list so they can be reused for submissions.
I'd send you a link to the code but I haven't gotten around to
finishing it up yet. This is a long-winded way to ask you how, if at
all, you deal with freeing arena allocations.
All in all, I do really like arenas and I plan to continue to use them
as much as possible - you've convinced me. You can definitely do some
pretty cool stuff with them that you can't do with standard dynamic
memory such as take a snapshot of the state and immediately roll back
effectively deallocating all allocations in one lightning fast
operation.
Anyway, I'm interested in hearing your thoughts on the above.
Best,
Steadman
> impedance mismatch between your signed sizes and size_t
Keep in mind that:
1. A ptrdiff_t size is strictly non-negative. If in doubt, assert it. You
might temporarily compute a negative value, but that result goes straight
into a check (i.e. "does this fit?"), and a negative is dropped. (Having a
negative range available for such checks is, after all, a direct benefit
of signed arithmetic.) So it's always safe to cast a ptrdiff_t size as a
size_t argument. Suppose due to a bug the size was negative: Had you used
size_t all along you'd have already overflowed without noticing, which is
the same bug but harder to catch.
2. It's a common misconception that SIZE_MAX is the maximum supported
object size. It's the maximum value of a size_t, and size_t must have a
range supporting the maximum object size, but that doesn't mean that
SIZE_MAX is the maximum object size. In practice, PTRDIFF_MAX is the
maximum object size, and GCC in particular is explicit about this. That
means a result from, say, strlen() is always non-negative when cast to
ptrdiff_t. If that's not the case, then the program already entered an
undefined state when entering strlen().
My next post — still half-written from couple weeks ago — will cover (1)
in detail. Given both (1) and (2), sizes are always within the range of
overlap between ptrdiff_t and size_t (i.e. [0, PTRDIFF_MAX]), even across
most system interfaces, including libc. Therefore, with one exception, you
can freely cast sizes and subscripts between ptrdiff_t and size_t without
special concern. That's one reason I like "-Wno-sign-conversion".
The one exception is comparisons between signed and unsigned types. If you
compute a temporary negative for a check, it's critical the result is not
promoted to unsigned for the comparison. That case is thoroughly covered
by "-Wsign-compare" so such mistakes are caught at compile time. It would
have been better if C computed signed-unsigned comparisons in a sane way
(a la C3), not by promotion, but they weren't thinking about this stuff
back in the 1970s.
If you're paranoid, place assertions around all these casts, as I've shown
in some of my articles:
assert(size > 0);
assert(len >= 0);
That's essentially the rsize_t business that WG14 invented for C11, but
implementations chose not to implement. You could formalize by making a
function to do the cast after asserting:
size_t check_unsigned(ptrdiff_t);
ptrdiff_t check_signed(size_t);
Then:
ptrdiff_t size = ...;
p = malloc(check_unsigned(size));
ptrdiff_t len = check_signed(strlen(s));
I've used similar converting size types for interfaces accepting 32-bit
sizes (i.e. potentially truncating). Adding more checks probably worsen
impedance mismatch at system boundaries, but they should raise your
confidence about conversions at these points. Speaking of which…
> Have you divorced yourself from the standard library such that you use
> your signed sizes across additional functions beyond the allocation?
Per my "review of the C standard library" I see little value and hardly
interact with it in my own projects, so I have few unsigned interfaces to
worry about. However, system boundaries are in general messy affairs, and
that surface area must be carefully considered, probably minimized. That
includes libc. Win32 is loaded with inconsistent interfaces: some lengths
include the null terminator, others don't; some sizes are signed, some
unsigned; some sizes are 32-bit on 64-bit hosts, others are size-typed.
Many foreign calls have some degree of unavoidable friction.
> Second, what do you do if you run out of memory?
In u-config I have it print an OOM message then call os_fail() in the
platform layer, which in release builds exits the process. In tests this
longjmps back to the top-level, then keeps on testing. One of the tests
even deliberately triggers OOM: Try doing that without arenas! Failures
cannot possibly leak resources, either, of course, for the same reason.
In at least one article I've suggested storing a jmpbuf pointer in arenas,
giving the allocator a non-local "handler" goto on OOM. I'd probably use
this in a library (with __builtin_{setjmp,longmp}). The jmpbuf wouldn't be
part of the API (see libpng for proof of badness), but a tool to jump back
to the entry point and return an error, freeing intermediate layers of all
allocation failure checks. The last part makes programs so much cleaner
and simpler, so it's important.
Whatever you choose, keep in mind that the bar is low. The vast majority
of software using unbounded dynamic allocation does not gracefully handle
OOM. The usual plan is one of (1) abort without recovery, (2) allocate
until externally killed (e.g. Linux OOM killer), or (3) allocate until the
system becomes unstable (thrashing, etc.) and someone reboots it. Whatever
you choose will likely be better than average.
> the allocation workflow doesn't seem to lend itself well to arenas
When this happens, it's time to compose allocators. If objects fall into a
small number of size classes, it's sufficient to compose a freelist with
an arena. Before allocating from the arena, check the freelist. When done
with an object, push it onto the freelist. There's a great 2015 talk about
allocator composition here:
std::allocator Is to Allocation what std::vector Is to Vexation
https://www.youtube.com/watch?v=LIb3L4vKZ7U
The typical case for me is bookkeeping using a linked list, like a stack
or queue data structure which grows and shrinks throughout a depth- or
breath-first search. After popping a node and consuming its element, the
node goes onto a freelist for use in a later push/enqueue.
Though some problems really are unconstrained need fully general purpose
allocation, e.g. programming language implementation.