~sircmpwn/hare-users

Why do `alloc` and `rt::malloc` consume so much memory?

Message ID
<CADC8Pp4v2TNvBGE_Jmn4HFq35iRAgdfqgyrv4QJ69wy7PX5OGQ@mail.gmail.com>
While performance-testing my protocol parser, I found that `alloc` and
`rt::malloc` consume an unexpected amount of memory: the program's
footprint starts at about 2MB. I wrote the following test to confirm my
guess:

```hare
use rt;
use time;

export fn main() void = {
    // 1. Use `alloc`
    // for (let index = 0z; index < 1000000; index += 1) {
    //     const temp_mem = alloc(123);
    //     defer free(temp_mem);
    //     time::sleep(time::MILLISECOND * 10);
    // };

    // 2. Use `rt::malloc`
    for (let index = 0z; index < 1000000; index += 1) {
        // Allocate size(size) bytes (8 on 64-bit targets).
        const temp_mem = rt::malloc(size(size));
        // Runs at the end of each loop iteration, after the sleep.
        defer rt::free_(temp_mem);
        time::sleep(time::MILLISECOND * 10);
    };
};
```

Whichever one I choose, when built in release mode, the program starts
with a 2MB footprint.

Then I tried the same thing in C:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void) {
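    for (size_t index = 0; index < 1000000; index++) {
        void *temp_mem = malloc(sizeof(uint32_t));
        // usleep() takes microseconds: this sleeps 10 µs, while the
        // Hare test above sleeps 10 ms per iteration.
        usleep(10);
        free(temp_mem);
    }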
}
```

It consumes around 400KB.

Does anyone know anything about this? :)



Also, as a side note, here is what `Hare`'s performance looks like in my
test, shared for reference:

[ Condition ]:

- Same algorithm, run 1,000,000 times, using short-lived heap allocations
- Release mode: `-R` in Hare, `-O3` in C, `ReleaseFast` in Zig

C    -> 19 sec, uses around 390KB
Hare -> 24 sec, uses around 2.2MB
Zig  -> 10 sec, uses around 156KB
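
The thread does not say how these footprints were measured. As a minimal sketch, assuming a POSIX system, a C program can report its own peak resident set size with `getrusage`, which makes figures like these easier to reproduce. The helper name `peak_rss` is hypothetical and not part of the original test.

```c
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of the calling process.
 * On Linux, ru_maxrss is reported in kilobytes; other systems may use
 * different units, so check the local getrusage documentation.
 * Returns -1 if getrusage fails. */
static long peak_rss(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}
```

Calling `peak_rss()` at the end of `main` and printing the result gives a number taken at the same point in every run, rather than one read from an external monitor at an arbitrary moment.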

The outcome is a little surprising, but I still love using `Hare`
as my C replacement :)
Message ID
<D6GUDMF6WWRD.31T8U2G12VJG8@cmpwn.com>
In-Reply-To
<CADC8Pp4v2TNvBGE_Jmn4HFq35iRAgdfqgyrv4QJ69wy7PX5OGQ@mail.gmail.com>
Hare's malloc allocates memory for small allocations in batches of
2 MiB at a time, then divides the 2 MiB chunks up into small regions
sized to the nearest power of two for the requested allocation.
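
The arithmetic behind that description can be sketched in C. This is not Hare's allocator code, only an illustration under stated assumptions: the chunk size is the 2 MiB figure above, requests are rounded up to the next power of two so the region always fits, and the names `CHUNK_SIZE`, `size_class`, and `regions_per_chunk` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size, taken from the 2 MiB figure in the message. */
#define CHUNK_SIZE ((size_t)2 * 1024 * 1024)

/* Hypothetical helper: round a request up to the next power of two.
 * Rounding up is an assumption made here so that the region is never
 * smaller than the request. */
static size_t size_class(size_t request)
{
    size_t region = 1;

    assert(request > 0 && request <= CHUNK_SIZE);
    while (region < request)
        region <<= 1;
    return region;
}

/* Hypothetical helper: how many regions of that size fit in one chunk. */
static size_t regions_per_chunk(size_t request)
{
    return CHUNK_SIZE / size_class(request);
}
```

Under these assumptions, the 8-byte request in the test program maps to an 8-byte region, and one chunk holds 262144 of them. The first small allocation would already reserve the whole 2 MiB chunk, which is consistent with the footprint reported in the original message.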

Comparing memory allocators on this basis, without internal knowledge of
your C library's allocator, for instance, is comparing apples to
oranges. That said, I think I want to spend some time improving the
allocator in the foreseeable future anyway, and these metrics might
change. But they'll still be mostly meaningless imo.