~skeeto/public-inbox

4 2

Two-in-one arenas

Details
Message ID
<20231015233305.sssrgorhqu2qo5jr@nullprogram.com>
DKIM signature
missing
Download raw message
The other day I had a brief discussion with u/treefidgety, who did not 
like the idea of passing two arenas down the call stack any time there was 
a function that would need both. It got me thinking about how this might 
be avoided, and I came up with an idea of using one arena for both roles.

The core concept: An arena has two "ends" and we could use the other end 
for scratch allocations. Since callees/callers alternate arenas, which end 
is which depends on the position in the call stack. When passing an arena, 
the ends usually swap, and we can do this while keeping the two-pointer 
arena format. I call the ends "persist" and "scratch". Allocations that 
survive local scope are made from the former, and scoped allocations from 
the latter.

    typedef struct {
        char **persist;
        char  *scratch;
    } arena;

The persist end is a double pointer pointing at the parent arena's scratch 
pointer. The "root" arena points at some lone "char *". It's a double 
pointer so that such allocations persist. The trick is that "persist" may 
point at either the high end or the low end of the free region. The 
allocator compares the pointers to determine which way allocations "grow".

To make scratch allocations, you extract a child arena by swapping the 
ends. Its persist pointer will be attached to the scratch end of the 
parent arena:

    arena getscratch(arena *a)
    {
        arena r = {0};
        r.persist = &a->scratch;
        r.scratch = *a->persist;
        return r;
    }

This is the only case where arenas are passed by their address. In all 
other cases it's passed by copy. They already contain a double pointer 
after all. Allocating from the parent arena invalidates the scratch arena, 
which must be recreated if needed again. Fortunately that's cheap!

As mentioned, the allocator must be able to allocate from either end, 
which works a little differently. Note how the arena pass-by-copy.

    void *alloc(arena a, ptrdiff_t size, ptrdiff_t align, ptrdiff_t count)
    {
        char *r;
        if (*a.persist < a.scratch) {
            ptrdiff_t avail   = a.scratch - *a.persist;
            ptrdiff_t padding = -(uintptr_t)*a.persist & (align - 1);
            if (count > (avail - padding)/size) {
                abort();
            }
            r = *a.persist + padding;
            *a.persist += padding + size*count;
        } else {
            ptrdiff_t avail   = *a.persist - a.scratch;
            ptrdiff_t padding = +(uintptr_t)*a.persist & (align - 1);
            if (count > (avail - padding)/size) {
                abort();
            }
            *a.persist -= padding + size*count;
            r = *a.persist;
        }
        return memset(r, 0, size*count);
    }

Losing the straightforward simplicity of the original is a shame, but 
that's the cost. Creating an arena looks something like this (note: the 
top-level persist will be modified, so we could not free() it without 
saving a copy of the pointer):

    ptrdiff_t cap = 1<<20;
    char *persist = malloc(cap);
    arena a = {0};
    a.persist = &persist;
    a.scratch = persist + cap;

A simple case when passing the arena to a helper function:

    thing *compute(..., arena a)
    {
        thing *t = new(a, thing);
        result *r = helper(getscratch(&a), ...);
        t->value = r->a * r->b;
        return t;
    }

The helper allocated its return value from compute's scratch side because 
we passed it a child arena. This updated the parent's scratch pointer, so 
these allocations won't be clobbered. It made its own scratch allocations 
from the parent's persist side, but because that was on a local copy of 
that pointer, these allocations were automatically freed on return.

If we were passing results straight out, we don't swap the ends, just as 
before we don't swap the arenas. (This is why the caller decides whether 
to extract a child arena, not the callee.)

    thing *compute(..., arena a)
    {
        thing *r = new(a, thing);
        r->next = helper(a, ...);
        return r;
    }

If we wanted to scope some scratch allocations, make a copy of the parent 
arena then create a child arena from it:

    arena a = ...;
    {
        arena tmp = a;
        arena scratch = getscratch(&tmp);
        for (...) {
            *upsert(..., scratch) = ...;
        }
    }

As mentioned, parent allocation invalidates the child arena — but not its 
existing scratch allocations! — so it must be recreated in order to make 
further scratch allocations:

    thing *a = new(parent, thing);
    arena scratch = getscratch(&parent);
    thing *x = new(scratch, thing);
    thing *b = new(parent, thing);  // invalidates scratch!
    scratch = getscratch(&parent);  // recreate the child
    thing *y = new(scratch, thing);

These last two cases are probably the trickiest part when using this 
two-in-one arena.

I'm not convinced these are all the right terms, and perhaps "getscratch" 
should use the name "child" in some way. I also still need to test it out 
in some practical projects to see how it goes.
Details
Message ID
<ijfpvjwyv6et5r4z24cxfpc3akxi6etfi5qdbj4uajbm2o7dsc@ufjjmugl7txg>
In-Reply-To
<20231015233305.sssrgorhqu2qo5jr@nullprogram.com> (view parent)
DKIM signature
missing
Download raw message
>    typedef struct {
>        char **persist;
>        char  *scratch;
>    } arena;

Interesting. This way of representing an arena is completely new to me.
I'm also not entirely clear on the entire "alternating" business, I'll
probably need to try it out in practice in order to give any meaningful
feedback.

OTOH, having a "double ended" arena *is* something that I have some
previous experience with (although I represented it quite differently).

For some context, this was a couple months ago on a toy project where I
needed to "interleave" permanent and scratch allocations. Initially I
had an `arena_split()` function - much like your `newscratch()` function
in scratch::misc/treap.c. Some issues I had with it were:

a) It cannot make use of all the available arena space. If a sub-arena
runs out, then it's considered OOM.
b) Splitting based on the arena capacity (without taking remaining space
into account) seemed not too robust.
c) If you're frequently splitting the arena throughout the program's
lifetime, then splitting based on remaining space means that as the
remaining space changes, the splits are going to end up in different
places. In other words, it will end up touching (i.e physically mapping)
more pages than it otherwise would actually have needed.

A double ended arena sort of solves the above issues, which is why I was
investigating it. The way I had represented it was via explicit
`{front,back}_offset` and the caller had to indicate where he wanted to
allocate from. Already, this was a bad trade ergonomics/composability
wise.

Your representation seems to solve this problem, but there's also a
couple other issues that are inherent when allocating down. We usually
deal with memory "upwards" not "downwards". And so techniques like
"allocate max -> use -> shrink" wouldn't work anymore when the arena
works downwards. The "fast realloc" trick Dennis shared on the list also
wouldn't work for the same reason.

Though perhaps these are not that big of a problem in practice. Again,
I'll have to try it out in order to get some concrete feel for it.

P.S: I was also recently playing around with a "manually committed"
arena [0], which in theory is more portable in case the system has
overcommit turned off or whatever. In practice, it imposes some ugly
constrains by turning the arena into a more "stateful object" and also
doesn't play nice with splitting. But perhaps this would work better
with a double-ended arena.

[0]: https://codeberg.org/NRK/slashtmp/src/branch/master/puzzles/two-sum.c

- NRK
Details
Message ID
<20231017183445.lfv33l5v52c5i3yi@nullprogram.com>
In-Reply-To
<ijfpvjwyv6et5r4z24cxfpc3akxi6etfi5qdbj4uajbm2o7dsc@ufjjmugl7txg> (view parent)
DKIM signature
missing
Download raw message
> not entirely clear on the entire "alternating" business

Yeah, that's one of the major downsides. If, of all people, _you're_ not 
clear on it, then the typical person using it, or even just reading the 
code, will probably be unable to reason whether or not its use is correct. 
(Difficulty reasoning about code that should be straightforward and simple 
is one of my chief complaints about modern C++ after all.)

> The "fast realloc" trick Dennis shared on the list also wouldn't work 
> for the same reason.

Yup, definitely another cost.

Tangent: In preparation for a future article, I've revisited an old Go 
prototype toying with linear allocation in Go. This isn't the official 
arena experiment, but just some unsafe plus generics. Generics unlocked 
linear allocation as a viable, practical option in Go. Due to garbage 
collection, Go's unsafe pointer semantics are stricter. In particular, 
one-past-the-end is forbidden. Not until it's gone that one appreciates 
just how useful it is.

This makes the two-pointer beg/end form unviable: How do you place the 
"end" pointer? Even with a uintptr end, the "beg" pointer may eventually 
match it. (There must always be a real pointer pointing at the arena to 
keep it alive.) An easy fix is to allocate top-down, leaving "beg" in 
place at the beginning. But top-down loses the advantages you mentioned, 
especially for arena-backed slices, which are even more natural for Go.

> it imposes some ugly constrains by turning the arena into a more 
> "stateful object"

Well-stated. I've tried out gradual-commit arenas, too, and was never 
satisfied with the trade-offs, at least for my needs. That's why I never 
committed an implementation as part of a project.

(Along with your recent w64devkit issue comment, that "learning native 
windows interfaces" comment in two-sum.c is interesting to see!)
Details
Message ID
<bn4pd5cpb2svdx2j2hzfkdf5xcnkrq3xpvllqfn5h4nlld5ity@c3yuntxgxljm>
In-Reply-To
<20231017183445.lfv33l5v52c5i3yi@nullprogram.com> (view parent)
DKIM signature
missing
Download raw message
> > not entirely clear on the entire "alternating" business

After thinking about it a bit more, I'm now cleared up on this part. It
also makes sense why a "perm" allocation would invalidate a "scratch"
arena, since the bounds of the scratch didn't update.

Although I haven't tried it out, this seems like a cognitive overhead
that I'd want to avoid. The ease of memory management is one of the key
reason to use an arena after all.

But giving up on double-ended arena might be a bit premature. So here's
my 2nd attempt at it:

	typedef struct {
		u8 *beg, *end;
		enum { FORWARDS, BACKWARDS } perm_direction;
	} Arena;
	enum { NO_ZERO  =  1 << 0, SCRATCH  =  1 << 1, };

Inspired by your double pointer, the arena now keeps track of which way
the "permanent" end is via `perm_direction`. The caller simply specifies
`SCRATCH` in `alloc_flags` if he wants allocation from the "scratch"
end. This solves the composeability issue I had, since callers no longer
hard-code forwards/backwards direction.

In 90% of cases, this arena should behave the *exact* same way as a
regular arena. This includes the "scoped arena" trick.

But in cases where you need to interleave scratch and permanent
allocations, the "scoped arena" trick has to be given up. Instead you'd
revert back to the "checkpoint => restore" technique.

The gist of it:

	enum { KEEP_PERM  = 1 << 0, KEEP_SCRATCH = 1 << 1 };
	static void arena_restore(Arena *a, Arena restore, int flags)
	{
		ASSERT(a->perm_direction == restore.perm_direction);
		ASSERT((flags & ~(KEEP_PERM|KEEP_SCRATCH)) == 0);
		ASSERT(flags != (KEEP_PERM|KEEP_SCRATCH)); // doesn't make sense
		// ...
	}
	
	// usage
	Arena checkpoint = *a;
	... = arena_alloc(a, Type, cnt, 0x0);
	... = arena_alloc(a, Type, cnt, SCRATCH);
	arena_restore(a, checkpoint, KEEP_PERM); // keeps "perm" allocations

I'm not yet sure if KEEP_SCRATCH makes any sense or not. But I've
included it for completeness as of now.

Losing the implicit restoration of "scoped arena" is a bit of a shame, I
really liked that trick. But this seems much less of a footgun compared
to perm allocation invalidating the scratch arena - using both
simultaneously is the entire point of a double-ended arena after all.

I have yet to try this out in a real/semi-real project. But another
benefit of this approach might be that existing projects can trivially
convert from regular arenas to double-ended ones since the usage remains
the exact same in most situation.

> In particular, one-past-the-end is forbidden.

Perhaps going with the good old { u8 *buf; Size off, cap; } layout might
be more feasible in Go?

> (Along with your recent w64devkit issue comment, that "learning native
> windows interfaces" comment in two-sum.c is interesting to see!)

Had to setup a windows (10) qemu-kvm guest to interact with some
junkware (which usually tend to become industry standard™️). I figured
it's a good excuse to practice some cross-platform development.

- NRK
Details
Message ID
<m7ld23m2iun56i75os7qwi2zb3h4v4tqvesq2a4f7r7cpgxvy6@p5jqcl23iaoi>
In-Reply-To
<bn4pd5cpb2svdx2j2hzfkdf5xcnkrq3xpvllqfn5h4nlld5ity@c3yuntxgxljm> (view parent)
DKIM signature
missing
Download raw message
> 	typedef struct {
> 		u8 *beg, *end;
> 		enum { FORWARDS, BACKWARDS } perm_direction;
> 	} Arena;
> 	enum { NO_ZERO  =  1 << 0, SCRATCH  =  1 << 1, };

After playing around a bit more with this, it doesn't really seem very
ergonomic to use. For example, if I have a function that duplicates a
string: `str_dup(Arena *perm, Str s)` and the caller wants to tell it to
return on the "scratch" end, he has to flip the `perm_direction` switch,
call str_dup() and then unflip it.

	a->perm_direction = !a->perm_direction;
	str_dup(a, "test");
	a->perm_direction = !a->perm_direction;

Macros can hide some of the work:

	#define TEMP(A, ...) do { /* ... */ } while (0)
	TEMP(a, str_dup(a, "test"));

But I dislike macro magic like these. And it looks ugly too. A more
explicit `flip()` function has less magic and is less ugly:

	Arena *flip(Arena *a) { a->perm_direction = !a->perm_direction; return a; }
	str_dup(flip(a), "test");
	flip(a);

But the caller still needs to manually unflip it. Unnecessary manual
bookkeeping I'd rather avoid.

> But this seems much less of a footgun compared to perm allocation
> invalidating the scratch arena

Going back to your double pointer layout, it seems to be the most
ergonomic design for now. The footgun isn't great, but perhaps it won't
be too big of a deal with some discipline and getting used to.

- NRK
Reply to thread Export thread (mbox)