~skeeto/public-inbox

4 2

Re: Arena allocator tips and tricks

Details
Message ID
<20240126015726.znz6too5hxnuo74z@nullprogram.com>
DKIM signature
missing
Download raw message
Interesting, thanks! So the foundations are there 30 years ago. That goes 
along with virtual memory and plenty of RAM becoming common, after the era 
of cramped "microcomputers." Though perhaps memory wasn't _so_ plentiful 
since it's using a chained memory blocks design.

Last month I saw an early (1988) use of the word "arena" with a similar 
design, though it doesn't mention, as you highlighted in that book, how it 
simplifies the code:

Fast Allocation and Deallocation of Memory Based on Object Lifetimes
https://www.cs.princeton.edu/techreports/1988/191.pdf

> In the very next page it even includes a `NEW` macro

Fascinating how similar it is. It also avoids repeating the type, though 
it can't be used in initialization — which, of course, wasn't happening 
much in the C89 days anyway.

> Probably also illustrates just how much influence over the programmer 
> the defaults of a language has - very few look outside that box.

Great observation! I suspect educational materials play a role, too. They 
all follow the same, conventional route, teaching a tradition rather than 
effective practice derived from experience. Almost none mention debuggers, 
and none ever mention sanitizers. The famous CS50 course has a Makefile 
with "-Wall -Wextra", which puts it above average, but along with omitting 
sanitizers, they don't even enable debugging information.

The "untangling lifetimes" article touches on some of this.

> I was wondering if you had any other highly practical resources on 
> compilers (particularly modern optimizing compilers) like this.

Sorry, I don't. I'm definitely going to check out "A Retargetable C 
Compiler" though!

> After a bit of inspection, I'm currently experimenting with the 
> following two macros instead

"Zero single items and don't initialize arrays" fits my own observations, 
too. I've tried splitting the difference, keeping a single interface which 
always initializes the first element. Feels a little too magical, though.

My current thinking is that rarely can I measure performance cost of zero 
initialization. It's so cheap. I might as well simplify my program by 
eliminating init/noinit as a concern until it makes a real difference.

Re: Arena allocator tips and tricks

Details
Message ID
<csh4gfc5656xriqcxcdkiaq3zmf76c57wqxjpg3hvkh5tl5xdj@bf5a3vnq4vzr>
In-Reply-To
<20240126015726.znz6too5hxnuo74z@nullprogram.com> (view parent)
DKIM signature
pass
Download raw message
> Fast Allocation and Deallocation of Memory Based on Object Lifetimes
> https://www.cs.princeton.edu/techreports/1988/191.pdf

I noticed that paper pop up in the C subreddit but I hadn't looked at
it. Skimming through it now, it looks like a carbon copy of the scheme
used in `lcc`! At the end in the acknowledgement section I noticed:

| Chris Fraser's comments helped clarify several sections of this paper.

And then scrolling back up to the top I see the author of the paper is
David R. Hanson. If you now take a look at the authors of "Retargetable
C Compiler" it will become obvious that this wasn't an accident :)

> It also avoids repeating the type, though it can't be used in
> initialization

At first I thought so too, but as ridiculous it may look, the following
works:

	char *p = (p = malloc(4));

Since AFAIK there's a sequence point after an assignment statement, it
should be well defined behavior too. (If not, there's also comma
operator which can force a sequence point, although that would require
evaluating `p` twice).

I might experience around a bit with this macro. Even though I generally
avoid doing nasty things like control-flow or assignments inside of
macros.

- NRK

- - -

(My original mail got rejected by the mailing list due to the png
attachment. Copy pasting it below for archival purposes.)

- - -

I was recently skimming through the book "A Retargetable C
Compiler" and saw a couple interesting things that might
interest you. Instead of beginning with lexer like most other
compiler books, it starts off by introducing the memory
management scheme: different arenas for different lifetimes.
An excerpt from chapter 2 (emphasis added):

| Most implementations of malloc use memory-management algorithms
| that are necessarily based on the sizes of objects. Algorithms
| based on object lifetimes are more efficient - if all of the
| deallocations can be done at once. Indeed, stacklike allocation
| would be most efficient, but it can be used only if object
| lifetimes are nested, which is generally not the case in
| compilers and many other applications.
|
| This chapter describes lcc's storage management scheme, which is
| based on object lifetimes. In this scheme, allocation is more
| efficient than malloc, and the cost of deallocation is
| negligible.
| But the real benefit is that this scheme simplifies the code.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Allocation is so cheap that it encourages simple
| applicative algorithms in place of more space-efficient but
| complex ones. And allocation incurs no deallocation obligation,
| so deallocation can't be forgotten.

In the very next page it even includes a `NEW` macro, very
similar to the one you use (except it takes the pointer being
assigned as argument instead of type):

	#define NEW(p,a) ((p) = allocate(sizeof *(p), (a)))
	#define NEW0(p,a) memset(NEW((p),(a)), 0, sizeof *(p))

Interesting to see how better memory-management techniques have
been known for decades but never really "trickled-down" to the
masses. Probably also illustrates just how much influence over
the programmer the defaults of a language has - very few look
outside that box.

Another thing I find very interesting about this book is that -
unlike most other compiler books which are either academic (all
theory) or builds something toy and not suitable for real-world
usage - this one builds an actual production grade compiler. I
was wondering if you had any other highly practical resources on
compilers (particularly modern optimizing compilers) like this.

P.S: I was also recently experimenting with zero-initializing
allocations by default but noticed a good ton of cases where I was
manually opt-ing out of it. After a bit of inspection, I'm
currently experimenting with the following two macros instead:

	arena_item(Arena *, T)             // allocates a single zero-initialized object of type T
	arena_array(Arena *, T, Size cnt)  // allocates uninitialized array of type T with `cnt` elements

This may seem a bit inconsistent but it's actually consistent
with my usage code where zero-initializing single object
allocation is either preferred or harmless but zero-initializing
array is almost never wanted/required.

Re: Arena allocator tips and tricks

Details
Message ID
<20240127180550.7x2oc7afxhhupuma@nullprogram.com>
In-Reply-To
<csh4gfc5656xriqcxcdkiaq3zmf76c57wqxjpg3hvkh5tl5xdj@bf5a3vnq4vzr> (view parent)
DKIM signature
missing
Download raw message
> it will become obvious that this wasn't an accident :)

Ha, so these aren't so independent as I had thought!

> as ridiculous it may look, the following works

Ah, nice workaround. Your reasoning about its correctness sounds right to 
me. Though while it doesn't repeat the type, it repeats the variable name.

> My original mail got rejected by the mailing list due to the png 
> attachment.

I was trying to figure that out myself before I came to that realization. 
The recent major sourcehut outage was long enough to lose list messages. I 
was able to bounce some back into the list, where they now appear out of 
order, but at least they're there. I thought maybe this was related, so I 
tried to bounce yours, which were accepted by sourcehut SMTP but never 
appeared. Sorry if you got some backscatter from that!

Perhaps it would be nicer if sourcehut SMTP rejected messages with 
disallowed attachments if only because I'd get feedback when bouncing. On 
the other hand, could it explain the rejection well? So that might require 
an asynchronous rejection anyway. It would also be nicer if, on the 
website, messages were ordered by their date header, not by their arrival 
order at sourcehut. (That could be abused, but there are tools to deal 
with abuse.)

Re: Arena allocator tips and tricks

Details
Message ID
<o2cu72n647mxlsrnzhv74uy5ittlhkgmooyqkfuegr5735nly5@p3j4kif2pthf>
In-Reply-To
<csh4gfc5656xriqcxcdkiaq3zmf76c57wqxjpg3hvkh5tl5xdj@bf5a3vnq4vzr> (view parent)
DKIM signature
pass
Download raw message
> I might experiment around a bit with this macro.

Small update, I've tried this out on some scratch projects. Overall, I
don't like it. 2 major issues:

* It's difficult (for me, at least) to read code where assignments are
  being hidden behind macros.
* Doesn't quite work (without casting) when the pointer you want to
  assign to is a void pointer.

> 	char *p = (p = malloc(4));
>
> Since AFAIK there's a sequence point after an assignment statement

Also a small correction, assignments don't have sequence point,
*initializer* does. And so..

	char *p = (p = malloc(4));

..is correct. But

	char *p;
	p = (p = malloc(4));

is not, since it's not an initializer.

- NRK

Re: Arena allocator tips and tricks

Details
Message ID
<20240313235351.pgkqesgctkmakr4c@nullprogram.com>
In-Reply-To
<o2cu72n647mxlsrnzhv74uy5ittlhkgmooyqkfuegr5735nly5@p3j4kif2pthf> (view parent)
DKIM signature
missing
Download raw message
Thanks for the followup!

> It's difficult (for me, at least) to read code where assignments are 
> being hidden behind macros.

Same for me, so I'd probably dislike it for the same reason. Though at 
least it's not a worse case: when variable declarations are hidden inside 
macros.
Reply to thread Export thread (mbox)