I'm doing a performance test for my protocol parser, and I found that `alloc` or
`rt::malloc` seems to consume far more memory than expected: the program's
footprint starts at around 2MB. That's why I wrote the following test to check
my guess:
```hare
use rt;
use time;
export fn main() void = {
	// 1. Use `alloc`
	// for (let index = 0z; index < 1000000; index += 1) {
	// 	const temp_mem = alloc(123);
	// 	defer free(temp_mem);
	// 	time::sleep(time::MILLISECOND * 10);
	// };

	// 2. Use `rt::malloc`
	for (let index = 0z; index < 1000000; index += 1) {
		const temp_mem = rt::malloc(size(size));
		defer rt::free_(temp_mem);
		time::sleep(time::MILLISECOND * 10);
	};
};
```
No matter which one I choose, when built in release mode the program starts
from a 2MB footprint.
Then I tried this in C:
```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(void) {
    for (size_t index = 0; index < 1000000; index++) {
        void *temp_mem = malloc(sizeof(uint32_t));
        usleep(10);
        free(temp_mem);
    }
}
```
It consumes around 400KB.
Does anyone know something about this? :)
Also, as a side note, here is what `Hare` performance looks like based on my
test, shared for reference:

[ Condition ]:
- Same algorithm, run 1,000,000 times, using short-lived heap allocations
- Release mode: `-R` in Hare, `-O3` in C, `ReleaseFast` in Zig

[ Result ]:
- C    -> 19 sec, uses around 390KB
- Hare -> 24 sec, uses around 2.2MB
- Zig  -> 10 sec, uses around 156KB
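The memory numbers are approximate. One quick way to check a process's peak
footprint on Linux is `getrusage(2)`; a minimal C example (just an
illustration, not necessarily how the figures above were collected) looks
like this:

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* ... run the allocation loop from the test above here ... */

    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        /* On Linux, ru_maxrss reports the peak resident set size in kilobytes. */
        printf("peak RSS: %ld KB\n", ru.ru_maxrss);
    }
    return 0;
}
```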
The outcome is a little bit surprising, but I still love to use `Hare`
as my C replacement :)
Hare's malloc handles small allocations in batches: it obtains memory 2 MiB at
a time, then divides each 2 MiB chunk into small regions sized to the nearest
power of two for the requested allocation.
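As a toy illustration of that arithmetic only (hypothetical helper names, not
the real allocator code): a tiny request is rounded up to its power-of-two size
class, while the backing memory arrives 2 MiB at a time, so even an 8-byte
allocation keeps the process at roughly a 2 MiB floor.

```c
#include <stdio.h>
#include <stddef.h>

/* Round a request up to the nearest power of two (the size class). */
static size_t size_class(size_t n) {
    size_t c = 1;
    while (c < n)
        c <<= 1;
    return c;
}

int main(void) {
    const size_t batch = 2u << 20;      /* backing memory comes in 2 MiB batches */
    size_t request = sizeof(size_t);    /* like size(size) in the Hare test */
    size_t bucket = size_class(request);
    printf("request: %zu bytes -> class: %zu bytes -> %zu slots per batch\n",
           request, bucket, batch / bucket);
    /* The process footprint reflects the whole 2 MiB batch, not the few
     * bytes actually in use, hence the ~2MB floor seen in the test above. */
    return 0;
}
```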
Comparing memory allocators on this basis, without internal knowledge of
your C library's allocator, for instance, is comparing apples to
oranges. That said, I think I want to spend some time improving the
allocator in the foreseeable future anyway, and these metrics might
change. But they'll still be mostly meaningless imo.