~skeeto/public-inbox

5 2

Re: Arenas and the almighty concatenation operator

Details
Message ID
<ane2ee7fpnyn3qxslygprmjw2yrvzppxuim25jvf7e6f5jgxbd@p7y6own2j3it>
DKIM signature
pass
Download raw message
Nice trick. Though my problem with this is that this effectively forces you to
have a "string builder" set of utilities that's separate from the buffered IO
layer - despite string building and buffered IO being the same thing - just
differing by the /destination/. E.g: I need to have both of the following:

	concat_int(Arena *a, Str s, intmax_t n);
	out_int(Out *out, intmax_t n);

And so on and so forth for float, hex etc. You can make it so that out*
and concat* both operate only on strings, and then have a T-to-str set
of functions. Though ergonomics suffers in this approach:

	concat(&arena, str, int_to_str(n));
	out(&stdout, int_to_str(n));

Here's an idea that I've had for a long time: it's flexible, but requires a
fair bit of boilerplate as a cost (which is mainly why I haven't used it in any
of my actual projects yet). Instead of defining a `out_flush()` function,
simply embed a flush function pointer into the Out object whos job is to either
make more space or set the err field failing that.

	typedef struct Out {
		u8 *buf, *off, *end; // buffer related vars
		intptr_t dest; // unix fd, windows HANDLE or an `Arena *` as a string-building base
		void (*flush)(struct Out *);
		int err;
	} Out;

In the case of usual buffered IO, the flush pointer would be set to some
function which just flushes the buffer through `os_write` and resets `out->off`
back to the beginning.

In case of string building, `dest` would be an arena pointer and the flush
function would grow the buffer. Most likely case is that you'd want to grow it
transparently - in-place if possible otherwise copy. But the interface is
flexible enough that you can do other stuff too:

* Lock up the entire arena and make the flush function unconditionally call
  abort() if reached.
* Make the flush function allocate with SOFT_FAIL flag set so that it can avoid
  aborting if arena runs out of space and instead set the err field.

... and probably a couple other niche use cases. And the cool part is
that you can have multiple of these flushing strategies in the same
program in parallel, just create different Out objects with different
flush pointer.

But since most of my string building usually happens in close
proximity...

	Out mem = out_memstream(&arena);
	out_str(&mem, a);
	out_int(&mem, n);
	out_str(&mem, S("\n"));
	Str s = out_memstream_finish(&mem, &arena);

... I'm usually not worried about other allocations being interleaved
and can afford to simply lock up the entire arena which is less
boilerplate than using flush function pointer.

- NRK

Re: Arenas and the almighty concatenation operator

Details
Message ID
<2qzyqky3jtv6w64vicwnkrwa7nb52uohuu625bc3zrkaoor6ml@v57pb72uozpy>
In-Reply-To
<ane2ee7fpnyn3qxslygprmjw2yrvzppxuim25jvf7e6f5jgxbd@p7y6own2j3it> (view parent)
DKIM signature
pass
Download raw message
Another thing that I realized is that doing in-place expansion is very
much UB if you're using `malloc + alloc_size` attribute in your
allocator.

	// uses arena_alloc() to allocate the string buffer
	static Str str_dup(Arena *a, Str s);
	
	static Str
	str_cat(Arena *a, Str head, Str tail)
	{
		u8 *hend = (head.len > 0) ? head.s + head.len : NULL;
		if (hend != a->beg) {
			head = str_dup(a, head);
		}
		ASSERT(head.s + head.len == a->beg);
		head.len += str_dup(a, tail).len;
		return head;
	}

As far as the compiler is concerned, the 1st and 2nd str_dup() are two
separate allocations. And C standard is fairly clear that going out of
bound - even if there's guarantee that a valid object exists there - is
UB. From C11 Annex J

| An array subscript is out of range, even if an object is apparently
| accessible with the given subscript (as in the lvalue expression a[1][7]
| given the declaration int a[4][5])

If you don't use malloc+alloc_size attribute in your allocator, then
it doesn't matter. But if you do, then I think you need another function
which has the alloc_size attribute but NOT the malloc attribute (since
it might do in-place expansion) which tells the compiler about the size
change. A very simple way to do it:

	__attribute(( alloc_size(3) ))
	static void *arena_newsize(Arena *a, void *ptr, Size newsize)
	{
		ASSERT((u8 *)ptr + newsize == a->beg);
		return ptr;
	}

The function doesn't do anything, except tell the compiler about the new
size of the object. So you'd stick it in after the expansion:

 	head.len += str_dup(a, tail).len;
+	head.s = arena_newsize(a, head.s, head.len);

I'm not sure if how much of this applies to your C++ code in the
article, but this might be worth mentioning for your C readers.

- NRK

Re: Arenas and the almighty concatenation operator

Details
Message ID
<20240525235643.5auuxy4sgi6hssqx@nullprogram.com>
In-Reply-To
<2qzyqky3jtv6w64vicwnkrwa7nb52uohuu625bc3zrkaoor6ml@v57pb72uozpy> (view parent)
DKIM signature
missing
Download raw message
> a "string builder" set of utilities that's separate from the buffered IO 
> layer

True, and I was sort of assuming there would be some way to reconcile 
these systems behind the scenes with minimal duplication, which is of 
course what you're aiming at in the rest of your message. Appending a 
string to buffered output — which involves accepting maybe a portion, 
flushing, accepting another portion, etc. — is a bit different than plain 
concatenation, so there's not much duplication, but duplicating individual 
type formatters would be a poor solution.

> embed a flush function pointer

You may recall u-config has a system like this. Rather than a full-blown 
function pointer, the "file descriptor" encodes the destination much like 
your idea, and the "virtual" dispatch is a switch on the value. It locks 
the whole arena, like your first option, though it could just as well grow 
the backing buffer on demand, allowing end allocations to proceed.

However, I've moved away from objects retaining arena references, which I 
did quite a bit in u-config. I haven't run into any specific problems, but 
when it's happening, it sticks in the back of my mind that an arena might 
change behind my back if I call another function despite not passing an 
arena. I also have to consider if an object might hold a reference to a 
scratch arena too long — a kind of leaky abstraction.

The situation is more complicated in C++. Consider the copy loop occurring 
during concat. What if T has a copy-assignment operator that modifies the 
arena while concat is in the middle of its own thing? If objects hold onto 
arena references, this could happen. That's the sort of behavior which 
(deservedly!) gives operator overloads a bad rap. Though I wouldn't do 
it, the possibility is a distraction.

If there's a hard rule, "never retain an arena reference after returning" 
and therefore callers must always pass arenas when they're needed, then 
there can be no behind-your-back arena mutation. That means you can't 
write operator overloads that need an arena — which I view as a good 
outcome. A bad outcome is poorer ergonomics for dynamic buffered output. 
The typical case, flush-to-write, doesn't require an arena, but to fit the 
interface you'd always need to pass one. If you "know" it's normal flushed 
output, then you could pass null for the arena, but then is it *really* 
dynamic?

More generally, this is probably a deep, fundamental conflict between 
polymorphism and arena allocation. If any implementation of an interface 
might need an arena, then they all must accept one. In practice today, the 
sorts of people using arenas also shun virtual functions as performance 
killers, so this hardly comes up. But input and output are probably *the* 
most legitimate case of polymorphism — consider how unix file descriptors 
have always been polymorphic — so here we are. I don't know the answer to 
this problem.

The proper solution might require arenas having special treatment at the 
language level. Maybe arenas have a dynamic scope (vs. lexical) whereby 
they're accessible implicitly to callees. Exploring such concepts requires 
language design and implementation — extremely time consuming. (Throw in 
relative pointers and now I've got two things that require core language 
and tooling support…)

> in-place expansion is very much UB if you're using `malloc + alloc_size` 
> attribute in your allocator

Wow, right! That's an excellent point I hadn't considered. What a tricky 
problem. I'll add a note to the article about it.

> I think you need another function which has the alloc_size attribute but 
> NOT the malloc attribute

Right, just like realloc, but are you sure your arena_newsize actually 
resolves it? Normally realloc is in a different translation unit, which 
(aside from LTO) neatly solves it. For us it's in the same translation 
unit, and GCC/Clang can "see" the pointer pass in and out of the function 
unchanged. Is alloc_size enough to sever pointer provenance? I expect it's 
enough for __builtin_object_size, etc., but perhaps insufficient for most 
other cases, like comparisons. I suspect it also needs to launder the 
pointer through an opaque interface, like an assembly block.

    asm ("" : "+r"(ptr));
    return ptr;

It does nothing and should cost nothing, but as far as a compiler is 
considered it's a new pointer, just as if it had passed through a function 
in another translation unit. This isn't a problem for normal allocation 
because the malloc attribute establishes fresh provenance regardless of 
the guts of the function.

Thanks for your feedback, NRK, this has been insightful!

Re: Arenas and the almighty concatenation operator

Details
Message ID
<sl24i5iz2kbx7eqmsbwv6gwplh2jh32k7qcqh6fhmwu25yjqst@oxldmout4ptz>
In-Reply-To
<20240525235643.5auuxy4sgi6hssqx@nullprogram.com> (view parent)
DKIM signature
pass
Download raw message
> but are you sure your arena_newsize actually resolves it?

AFAIK, __builtin_object_size() gets it's information from GCC's
optimization passes (hence why you need to enable optimization for
_FORTIFY_SOURCE to work). And so if it's enough for
__builtin_object_size() it should be good enough as far as GCC is
concerned at least.

And with that assumption, I built the following test:

	static int *arena;
	
	__attribute(( malloc, alloc_size(1) ))
	void *new(int size) { return arena++; }
	
	__attribute(( alloc_size(2) ))
	void *extend(void *oldptr, int newsize) { return oldptr; }
	
	int main(void)
	{
		int buf[4];
		arena = buf;
		__asm("" : "+r"(arena));
	
		volatile int *a = new(sizeof *a);
		volatile int *b = new(sizeof *b);
		a[0] = 4;
		/* a = extend(a, sizeof *a + sizeof *b); */
		a[1] = 8;
		return a[1];
	}

If it flags the OOB read and/or write of `a[1]` with the extend()
function commented out but doesn't flag it anymore with it enabled, then
we'd have some sort of an answer.

But unfortunately, the results are disappointing. Neither gcc nor clang
can catch the OOB read nor write, even with the `extend()` function
commented out. Making this test useless.

	[/tmp]~> gcc -g3 -fsanitize=address,undefined -o test test.c -O2 && ./test ; echo $?
	8
	[/tmp]~> gcc -g3 -fsanitize=address,undefined -o test test.c -O0 && ./test ; echo $?
	8
	[/tmp]~> clang -g3 -fsanitize=address,undefined -o test test.c -O2 && ./test ; echo $?
	8
	[/tmp]~> clang -g3 -fsanitize=address,undefined -o test test.c -O0 && ./test ; echo $?
	8

And so it's difficult to say for sure whether arena_newsize() is
effective or not. This probably requires a bit more research.

- NRK

Re: Arenas and the almighty concatenation operator

Details
Message ID
<20240604203552.2tmfbe5w7lgeimrr@nullprogram.com>
In-Reply-To
<sl24i5iz2kbx7eqmsbwv6gwplh2jh32k7qcqh6fhmwu25yjqst@oxldmout4ptz> (view parent)
DKIM signature
missing
Download raw message
I was thinking about this more and contrived a small test confirming that 
pointer laundering is required to avoid UB. However, the situation is 
worse than expected: I found serious alloc_size bugs in both GCC and Clang 
that makes me question using it at all. They would be especially tricky to 
track down when they occur.

I'll go over each of my test program's function one at a time:

__attribute((malloc, alloc_size(1)))
static void *alloc(int)
{
    static int mem[1<<8];
    void *p = mem;
    asm ("" : "+r"(p));  // reset provenance (Clang)
    return p;
}

I avoided malloc, whose attributes might contaminate the results. However, 
that's where I ran into a bug in Clang 17. Without the asm block, it "sees 
through" the interface, ignoring alloc_size, using the size of mem (1KiB) 
rather than the parameter. GCC 14 did not do this and used the alloc_size. 

Clang saw a size too large, which would deoptimize rather than produce a 
broken program, but imagine a more complex control flow. A small object is 
freed, reallocated as part of a larger object, but Clang continues using 
the previous, small size, resulting in a subtly broken program. The result 
also suggests it's ignoring the malloc attribute as well: It identifies 
the returned pointer as being the "mem" object, which of course is not a 
new, distinct object.

Next, a function to resize an object:

__attribute((always_inline, alloc_size(2)))
static void *resize(void *p, int)
{ 
    //asm ("" : "+r"(p));
    return p; 
}

Please note the always_inline. When GCC inlines a function, it discards 
the alloc_size attribute! For "malloc" that just means you lose your 
bounds checks, but inlining "realloc" hoses your whole program! I used 
always_inline so that it gets inlined in debug builds (-Og). The key to 
realizing this bug is that alloc is not inlined but resize is, so that 
only some of the alloc_size information is kept. We'll come back to that 
point in a moment.

As I originally suspected, the asm "fixes" it by breaking provenance. If 
resize is inlined, the alloc_size information is lost, but the asm at 
least makes it discard any previous alloc_size information.

Clang retains alloc_size information through inlining, so it doesn't have 
trouble with this function, and the asm block is superfluous. This is 
despite old size information overriding the attribute in alloc above. It 
seems maybe alloc_size can override previous alloc_size, but not size 
information obtained by other means.

Finally a little test:

int main(void)
{
    int *x = alloc(1*sizeof(int));
    printf("%d\n", (int)__builtin_object_size(x, 0));
    x[0] = 1;
    x = resize(x, 2*sizeof(int));
    printf("%d\n", (int)__builtin_object_size(x, 0));
    x[1] = 2;  // trips sanitizer (GCC)
}

We can peek the compiler's size knowledge, reporting -1 if it does not 
know. With UBSan we also get bounds checking at -Og. Defined exactly as I 
showed, GCC 14 UBSan traps on the indicated second assignment despite the 
updated size information. With the asm uncommented, the second size report 
is -1, disabling the bounds check.

With the first asm block commented out, Clang reports 1024 for the first 
printout, then 8 for the second.

It seems alloc_size, and probably other attributes, are only intended to 
work across translation unit boundaries. Otherwise GCC and Clang each at 
times peek into function definitions and ignore the declared attributes 
that ought to precedence. Only dividing knowledge across translation units 
stops it.

But what about LTO? After all, it's like putting everything back into one 
TU. I was able to reproduce the UBSan false positive with LTO despite 
splitting alloc and resize into their own TU:

$ gcc -O -flto -fsanitize=undefined main.c alloc.c

LTO hates always_inline, so remove it and add noinline to alloc. Starting 
at -O1, GCC inlines resize but not alloc, reproducing the UBSan false 
positive across TUs. In the big picture: If you LTO your runtime (libc, 
libstdc++, etc.), GCC will not reliably compile programs! For those of us 
compiling runtime and application as one TU, we need to be careful about 
GCC attributes. Given the alloc situation with Clang, I wouldn't trust it 
either, though it doesn't appear to be as badly broken.

(This situation nudges me slightly more towards anti-LTO, and I'm further 
convinced disabling it in w64devkit was the right move. There are probably 
tons of subtle LTO-related compiler bugs like this, and LTO users are not 
paying enough attention to notice them.)

Re: Arenas and the almighty concatenation operator

Details
Message ID
<6ycu5cltruheychll5qktvphixizglqch4sskpn6zihagnvmbn@vx27ycjsgsxk>
In-Reply-To
<20240604203552.2tmfbe5w7lgeimrr@nullprogram.com> (view parent)
DKIM signature
pass
Download raw message
Welp, that was a depressing read. But I can't say I'm surprised, given
previous experience with optimizers behaving in broken manner on
non-conventional programs (recall the string.h shenanigans).

> When GCC inlines a function, it discards the alloc_size attribute!

Yup, I knew about this and even had
`#ifdef DEBUG __attribute((noinline))` in some of my older projects to
keep the UBSan bounds checking alive in debug builds. But I eventually
stopped using noline and consequently forgot about the bug.

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503

This also explains why my test program didn't work. Going back to it:

* adding `noinline` on both alloc and extend. extend call commented out
  => trips UBSan (good)   test.c:19:7: runtime error: store [...] with insufficient space [...]
* same as above but this time with the extend call uncommented
  => works fine (good)
* `noinline` on alloc, but not extend. extend call uncommented
  => trips UBSan (BAD!!)
* removing all attributes from alloc, along with removing extend entirely
  => works fine (good, but also means no UBSan bounds check)

All the above was with `gcc -O2 -fsanitize=address,undefined`.

> The key to realizing this bug is that alloc is not inlined but resize
> is, so that only some of the alloc_size information is kept.

It didn't occur to me back then, but now that I think about it, this bug
is also probably an issue in your in-place extension part of the
dynamic array article as well.

> For those of us compiling runtime and application as one TU, we need to
> be careful about GCC attributes. Given the alloc situation with Clang, I
> wouldn't trust it

I'm thinking about what the path forward might be.

Simply using `noinline` (always, not only in debug builds) on both alloc
and extend seems to work for now. Clang is smart enough to optimize away
the extend call. gcc still emits a `extend.constprop.0` call that does
nothing.

    https://godbolt.org/z/qPY9adMhe

But it's not really a path that I want to go down on since

A) It can hurt performance. Especially in tight loops where otherwise
   the compiler might've been able to elide away the division in bounds
   check along with unnecessary memory zero-ing if the call was inlined.
B) I'm not even convinced that this is a reliable solution since
   `alloc_size` has shown itself to be a buggy mess.

Simply avoiding doing "in-place" extension could be another option. But
it feels an awful lot like destroying real tangible value from a product
in order to satisfy an unreasonable boss's idiotic demand.

I'm thinking just removing all the attributes might just be the right
way forwards. Downside is you don't get *legitimate* bounds-checking
from UBSan. But for me at least, buffer overflows have stopped being a
problem ever since I've started using sized-strings, so the UBSan bounds
check was providing little value.

- NRK
Reply to thread Export thread (mbox)