~skeeto/public-inbox

2 2

Re: A simple, arena-backed, generic dynamic array for C

Details
Message ID
<t4rhnewm3ukxy2eno6t6h4iggfaeuy62i622m62pym5ajqgbin@q56m3ei27efm>
DKIM signature
missing
Download raw message
Hi again,

I'm wondering how you decide between this array and a linked-list. The memory waste seems to be quite significant and an arena based linked-list should be much more efficient?

Or could we use some realloc() method like I showed in my previous message? Then pushing new elements to an array could just grow the existing arena space if nothing else was allocated in between?

Thanks again,
Dennis

-- 
Dennis Schön
https://www.dennis-schoen.de

Re: A simple, arena-backed, generic dynamic array for C

Details
Message ID
<AM8P192MB0932AA1E5A328559CC0C10CEA4C9A@AM8P192MB0932.EURP192.PROD.OUTLOOK.COM>
In-Reply-To
<t4rhnewm3ukxy2eno6t6h4iggfaeuy62i622m62pym5ajqgbin@q56m3ei27efm> (view parent)
DKIM signature
missing
Download raw message
Hi,

>I'm wondering how you decide between this array and a linked-list. The
>memory waste seems to be quite significant and an arena based
>linked-list should be much more efficient?

In my opinion it depends on how exactly you use the data later which
data structure is more efficient, but just regarding the memory
overhead it's worth keeping in mind that linked lists always have a
per-node overhead.

With a dynamic array that grows by a factor of 2, the memory overhead
is also a factor of 2, because all previous sizes are still around. If
the elements are 32bit ints as in the example, then that's probably not
really a big deal.  If you have enough memory for 128 ints, you
probably have enough for 255.  And you can also reduce the overhead
introduced by growing very large arrays by reserving a lot of memory up
front if you know that they're going to be very large.

With a linked list, the memory overhead is 1 pointer per node if it's
singly linked, and 2 pointers per node if it's doubly linked. If you
store just one pointer and one 32bit int per list node that overhead is
actually larger than that of the growable array. (Though the
linked list would win if the element type was significantly larger).
To amortize that overhead you'd have to unroll the linked list and store
multiple elements per node.
So you'd do something like
struct IntNode {
  struct IntNode *next;
  int n;
  int ints[61]; // can choose some number of elements to make nodes
                // nicely aligned on cache lines or maybe even make
                // this a flexible member and size it dynamically.
};
struct IntList {
  struct IntNode *first, *last;
};
So to push you'd add the element in the last node and to iterate you'd
start at the first node.
This would reduce the relative overhead of the pointers and it would
make the access to the ints slightly more cache friendly too.

A big advantage of linked lists is also that they just need to push new
nodes in order to grow and never need to memcpy existing elements.

I still think simple growable arrays are more practical to use in many
cases though.  You give up some things by using a node based
structure.  Arrays are easier to iterate, to access by position, to
memcopy, to sort, to pass around as pointers, etc.

Cheers,
Leo

Re: A simple, arena-backed, generic dynamic array for C

Details
Message ID
<20231006152431.othtvfczcnkx5cv5@nullprogram.com>
In-Reply-To
<t4rhnewm3ukxy2eno6t6h4iggfaeuy62i622m62pym5ajqgbin@q56m3ei27efm> (view parent)
DKIM signature
missing
Download raw message
If I know the number of items ahead of time, I allocate an array at that 
size, which is what the "count" parameter is all about.

If I don't trivially know that number, and I merely want to accumulate 
items without preserving order, then I use a linked list with new items 
pushed to the front. The usual caveats about cache misses don't apply 
because items are are nicely ordered and localized in the arena, and the 
list is short-lived so it won't have time to accumulate entropy that would 
break this property. By "trivially" I mean that often it can be done with 
two passes, the first to count without collecting, and another to actually 
collect into an array, but that probably doesn't count as trivial.

If order matters, I use a tail double pointer to push at the end, as 
demonstrated in the first article. Nearly as simple as pushing to the 
front.

If I need random access, or an array representation is essential, then 
dynamic arrays come into play. OpenGL is realistic case which is why I 
picked it. Though otherwise I had trouble coming up with examples where 
it's actually necessary. Conventionally, dynamic arrays are the workhorse 
for value accumulation, but that's because it's backed by general purpose 
allocation. For arenas, linked lists are more natural. In my experience I 
need hash maps more often than dynamic arrays.

When they can't extend in place (your realloc), arena dynamic arrays 
average 100% overhead. Linked lists have one pointer of overhead per 
element. Break even is when elements are pointer-sized, larger (e.g. str) 
favors lists, and smaller (e.g. bytes, floats) favors dynamic arrays, 
though I wouldn't have used linked lists in the small case anyway.

Your realloc idea is the best of both when it applies, i.e. collected 
items don't have their own allocations. If I rolled this special case to 
my grow() function, that would probably push me towards dynamic arrays for 
accumulation by default. I'll need to try it out, so thanks!
Reply to thread Export thread (mbox)