Arenas

Steadman Dillroper <dillstead@gmail.com>
Message-ID: <CAOJH3v4L_L4=bjFUasTPEZptENQxZ8KRquff8J6qjqbdb0G-7w@mail.gmail.com>

Hi Chris,

As mentioned in the Reddit thread, I've recently started to use your
arena in some of my small, personal projects.

One thing I noticed right off the bat is the impedance mismatch
between your signed sizes and size_t.  I use the standard library
often enough that it becomes annoying having to constantly cast sizes.
Eventually I changed the arena to accept unsigned types for size,
alignment, etc.  Do I think sizes (and indices for that matter) should
be signed?  Yes, for reasons stated in the various articles you've
linked to on your blog, but I'm not there yet in practice.  What do
you do?  Have you divorced yourself from the standard library such
that you use your signed sizes across additional functions beyond the
allocation?

Second, what do you do if you run out of memory?  I was thinking of
modifying the arena code to gracefully handle this case but I haven't
sat down and worked out the details yet.  Have you?  In practice it's
not really a problem I'll encounter in my personal projects.  If it
ever gets to the point where I'll consider putting it in production
quality code, it's something I might have to grapple with.  That day
most likely will never come but I'm interested in what you think about
this.

Finally, I started to dabble a bit in Linux async I/O after looking at
a project from a forum post.  I had not heard of it before and I plan
to code review the project when I get a chance.  I've read a few blogs,
the man pages, and went through the code of a few simple examples -
cp, cat, bare bones web server.  I found a bug or two in some of the
blog examples so I decided to write the code myself. The API is one of
the more low-level APIs I've seen.  The kernel and user share two ring
buffers (submission and completion) and the programmer must be aware
of memory models when adding elements to the submission queue and
reaping results from the completion queue.  The approaches I've seen
all have a lot of malloc/free churn.  Malloc entries for the
submission queue, free after reaping, rinse and repeat.  The total
amount of memory in flight can be capped by a code limit on the number
of elements on the submission queue at any one time, but the
allocation workflow doesn't seem to lend itself well to arenas.  To
get around this I'm going to add a simple abstraction on top of the
arena - a linked list - such that I can "free" reaped queue entries by
adding them back to the list so they can be reused for submissions.
I'd send you a link to the code but I haven't gotten around to
finishing it up yet.  This is a long-winded way to ask you how, if at
all, you deal with freeing arena allocations.

All in all, I do really like arenas and I plan to continue to use them
as much as possible - you've convinced me.  You can definitely do some
pretty cool stuff with them that you can't do with standard dynamic
memory, such as taking a snapshot of the state and immediately rolling
back, effectively deallocating all allocations in one lightning-fast
operation.

Anyway, I'm interested in hearing your thoughts on the above.

Best,

Steadman

Message-ID: <20240426174059.of2r27o332m44sdk@nullprogram.com>
In-Reply-To: <CAOJH3v4L_L4=bjFUasTPEZptENQxZ8KRquff8J6qjqbdb0G-7w@mail.gmail.com>

> impedance mismatch between your signed sizes and size_t

Keep in mind that:

1. A ptrdiff_t size is strictly non-negative. If in doubt, assert it. You 
might temporarily compute a negative value, but that result goes straight 
into a check (i.e. "does this fit?"), and a negative is dropped. (Having a 
negative range available for such checks is, after all, a direct benefit 
of signed arithmetic.) So it's always safe to pass a ptrdiff_t size as a 
size_t argument. Suppose due to a bug the size was negative: Had you used 
size_t all along you'd have already overflowed without noticing, which is 
the same bug but harder to catch.
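
For illustration, a minimal sketch of such a check inside an arena
allocator; the struct layout and names are assumptions for the example,
not quoted from the articles:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char *beg;
    char *end;
} arena;

// size and align are assumed positive, and align a power of two
void *alloc(arena *a, ptrdiff_t size, ptrdiff_t align, ptrdiff_t count)
{
    ptrdiff_t padding   = -(uintptr_t)a->beg & (align - 1);
    ptrdiff_t available = a->end - a->beg - padding;   // may be negative
    if (available < 0 || count > available/size) {
        return 0;                                      // does not fit
    }
    char *p = a->beg + padding;
    a->beg += padding + count*size;
    return memset(p, 0, (size_t)(count*size));         // safe cast: >= 0
}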

2. It's a common misconception that SIZE_MAX is the maximum supported 
object size. It's the maximum value of a size_t, and size_t must have a 
range supporting the maximum object size, but that doesn't mean that 
SIZE_MAX is the maximum object size. In practice, PTRDIFF_MAX is the 
maximum object size, and GCC in particular is explicit about this. That 
means a result from, say, strlen() is always non-negative when cast to 
ptrdiff_t. If that's not the case, then the program already entered an 
undefined state when entering strlen().

My next post — still half-written from a couple weeks ago — will cover (1) 
in detail. Given both (1) and (2), sizes are always within the range of 
overlap between ptrdiff_t and size_t (i.e. [0, PTRDIFF_MAX]), even across 
most system interfaces, including libc. Therefore, with one exception, you 
can freely cast sizes and subscripts between ptrdiff_t and size_t without 
special concern. That's one reason I like "-Wno-sign-conversion".

The one exception is comparisons between signed and unsigned types. If you 
compute a temporary negative for a check, it's critical the result is not 
promoted to unsigned for the comparison. That case is thoroughly covered 
by "-Wsign-compare" so such mistakes are caught at compile time. It would 
have been better if C computed signed-unsigned comparisons in a sane way 
(a la C3), not by promotion, but they weren't thinking about this stuff 
back in the 1970s.
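
As a tiny sketch of the distinction (a hypothetical helper, not taken
from anything above): keep the arithmetic signed so a shortfall shows up
as a negative rather than wrapping around.

#include <stddef.h>
#include <string.h>

// Does src, plus its terminator, fit in [beg, end)?
_Bool fits(char *beg, char *end, char *src)
{
    ptrdiff_t avail = end - beg;
    ptrdiff_t extra = avail - (ptrdiff_t)strlen(src) - 1;
    return extra >= 0;   // writing "avail >= strlen(src) + 1" instead
                         // mixes signed and unsigned; -Wsign-compare warns
}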

If you're paranoid, place assertions around all these casts, as I've shown 
in some of my articles:

assert(size >  0);
assert(len  >= 0);

That's essentially the rsize_t business that WG14 invented for C11, but 
which implementations chose not to implement. You could formalize it with 
functions that do the cast after asserting:

size_t    check_unsigned(ptrdiff_t);
ptrdiff_t check_signed(size_t);

Then:

ptrdiff_t size = ...;
p = malloc(check_unsigned(size));
ptrdiff_t len = check_signed(strlen(s));
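
The bodies aren't shown above; a minimal sketch of what they might look
like, following the asserts already described:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

size_t check_unsigned(ptrdiff_t n)
{
    assert(n >= 0);                       // sizes are never negative
    return (size_t)n;
}

ptrdiff_t check_signed(size_t n)
{
    assert(n <= (size_t)PTRDIFF_MAX);     // per (2), already guaranteed
    return (ptrdiff_t)n;
}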

I've used similar converting size types for interfaces accepting 32-bit 
sizes (i.e. potentially truncating). Adding more checks probably worsens 
the impedance mismatch at system boundaries, but they should raise your 
confidence about conversions at those points. Speaking of which…

> Have you divorced yourself from the standard library such that you use 
> your signed sizes across additional functions beyond the allocation?

Per my "review of the C standard library" I see little value and hardly 
interact with it in my own projects, so I have few unsigned interfaces to 
worry about. However, system boundaries are in general messy affairs, and 
that surface area must be carefully considered, probably minimized. That 
includes libc. Win32 is loaded with inconsistent interfaces: some lengths 
include the null terminator, others don't; some sizes are signed, some 
unsigned; some sizes are 32-bit on 64-bit hosts, others are size-typed. 
Many foreign calls have some degree of unavoidable friction.

> Second, what do you do if you run out of memory?

In u-config I have it print an OOM message then call os_fail() in the 
platform layer, which in release builds exits the process. In tests this 
longjmps back to the top-level, then keeps on testing. One of the tests 
even deliberately triggers OOM: Try doing that without arenas! Failures 
cannot possibly leak resources, either, of course, for the same reason.

In at least one article I've suggested storing a jmp_buf pointer in arenas, 
giving the allocator a non-local "handler" goto on OOM. I'd probably use 
this in a library (with __builtin_{setjmp,longjmp}). The jmp_buf wouldn't 
be part of the API (see libpng for proof of badness), but a tool to jump 
back to the entry point and return an error, relieving intermediate layers 
of all allocation failure checks. That last part makes programs so much 
cleaner and simpler, so it's important.
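
A rough sketch of that shape, with hypothetical names and the standard
setjmp/longjmp standing in for the builtins; the arena is the one from
the earlier sketch, extended with a jmp_buf pointer:

#include <setjmp.h>
#include <stddef.h>

typedef struct {
    char    *beg;
    char    *end;
    jmp_buf *oom;               // non-local handler on exhaustion
} arena;

// Inside the allocator, instead of returning a null pointer on OOM:
//     if (available < 0 || count > available/size) longjmp(*a->oom, 1);

void do_work(arena *);          // library internals; allocates freely

int library_entry_point(arena scratch)
{
    jmp_buf oom;
    if (setjmp(oom)) {
        return -1;              // every allocation failure lands here
    }
    scratch.oom = &oom;
    do_work(&scratch);          // no failure checks in the layers below
    return 0;
}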

Whatever you choose, keep in mind that the bar is low. The vast majority 
of software using unbounded dynamic allocation does not gracefully handle 
OOM. The usual plan is one of (1) abort without recovery, (2) allocate 
until externally killed (e.g. Linux OOM killer), or (3) allocate until the 
system becomes unstable (thrashing, etc.) and someone reboots it. Whatever 
you choose will likely be better than average.

> the allocation workflow doesn't seem to lend itself well to arenas

When this happens, it's time to compose allocators. If objects fall into a 
small number of size classes, it's sufficient to compose a freelist with 
an arena. Before allocating from the arena, check the freelist. When done 
with an object, push it onto the freelist. There's a great 2015 talk about 
allocator composition here:

std::allocator Is to Allocation what std::vector Is to Vexation
https://www.youtube.com/watch?v=LIb3L4vKZ7U

The typical case for me is bookkeeping using a linked list, like a stack 
or queue data structure which grows and shrinks throughout a depth- or 
breadth-first search.  After popping a node and consuming its element, the 
node goes onto a freelist for use in a later push/enqueue.
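
A minimal sketch of that composition for one node type, reusing the
hypothetical arena and alloc() from the first sketch in this message:

typedef struct node node;
struct node {
    node *next;
    int   value;
};

typedef struct {
    arena *perm;                 // backing arena
    node  *freelist;             // reaped nodes awaiting reuse
} node_pool;

node *node_alloc(node_pool *p)
{
    node *n = p->freelist;
    if (n) {
        p->freelist = n->next;   // reuse before touching the arena
        n->next  = 0;
        n->value = 0;
    } else {
        n = alloc(p->perm, sizeof(node), _Alignof(node), 1);
    }
    return n;
}

void node_free(node_pool *p, node *n)
{
    n->next = p->freelist;       // "free" = push onto the freelist
    p->freelist = n;
}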

Though some problems really are unconstrained and need fully 
general-purpose allocation, e.g. programming language implementation.