~sircmpwn/hare-users

5 2

What's the correct way to allocate `([][]int | void)` on the heap

Details
Message ID
<CADC8Pp7x_DDVQhgiFDRW-YUx_op4CDUQ8ZCiacQ60PxxHCK_dQ@mail.gmail.com>
DKIM signature
pass
Download raw message
I have the following code:

```hare
export fn test_slices() void = {
    //
    // Allocate slice on the heap: [10][]int
    //
    const capacity = 10z;
    let buckets: ([][]int | void) = alloc([], capacity);


    //
    // Create `[]int` instances for every `buckets[x]: []int`
    // `x`: 0 ~ 9
    //
    match (buckets) {
    case void => void;
    case let slice: [][]int =>
        // let s = &(buckets as [][]int): *types::slice;
        let s = &slice: *types::slice;
        fmt::printfln(">>> s (before)\t\t\tlen: {},\t\tcapacity: {},\tdata: {}",
            // len(slice),
            s.length,
            s.capacity,
            s.data)!;

        for(let index=0z; index < capacity; index +=1) {
            append(slice, []);
        };

        fmt::printfln(">>> s (after)\t\t\tlen: {},\tcapacity: {},\tdata: {}",
            // len(slice),
            s.length,
            s.capacity,
            s.data)!;

        // This works
        // fmt::printfln(">>> len(slice[0]): {}", len(slice[0]))!;
    };

    let s = &(buckets as [][]int): *types::slice;
    fmt::printfln(">>> s (outside match)\tlen: {},\t\tcapacity: {},\tdata: {}",
        // len((buckets as [][]int)),
        s.length,
        s.capacity,
        s.data)!;

    // This fail: slice or array access out of bounds
    // fmt::printfln(">>> len((buckets as [][]int)[0]): {}",
len((buckets as [][]int)[0]))!;
};
```

Output:

```bash
>>> s (before)        len: 0,  capacity: 10, data: 0x732ff1cff010
>>> s (after)         len: 10, capacity: 10, data: 0x732ff1cff010
>>> s (outside match) len: 0,  capacity: 10, data: 0x732ff1cff010
```

I suppose that `buckets as [][]int` should be the same slice as `case
let slice: [][]int` and the output proves that they point to the same
`data` (address).

But why `>>> s (outside match) len: 0`??? I suppose that it should be
10, as `append` on the same slice.
Details
Message ID
<D5QV9TAKG7A0.31U7C4XCFKUSP@cmpwn.com>
In-Reply-To
<CADC8Pp7x_DDVQhgiFDRW-YUx_op4CDUQ8ZCiacQ60PxxHCK_dQ@mail.gmail.com> (view parent)
DKIM signature
pass
Download raw message
On Wed Nov 20, 2024 at 9:08 AM CET, Ye Wison wrote:
>     match (buckets) {
>     case void => void;
>     case let slice: [][]int =>
>         // let s = &(buckets as [][]int): *types::slice;
>         let s = &slice: *types::slice;

This isn't about the heap, it's about references and values.

"as" copies the inner slice by value and then you take a pointer to the
copy, so you are not updating a reference to the slice stored in
"buckets" when you use append later in this branch.

Consider the following:

use fmt;

export fn main() void = {
	let x: (void | []int) = [];

	// Match with copy (similar to "as")
	match (x) {
	case void => void;
	case let x: []int =>
		append(x, 1);
		append(x, 2);
		append(x, 3);
	};

	// Original is not updated
	assert(len(x as []int) = 0);

	// Match with reference
	match (&x) {
	case void => void;
	case let x: *[]int =>
		append(x, 1);
		append(x, 2);
		append(x, 3);
	};

	// Original is updated
	assert(len(x as []int) = 3);
};
Details
Message ID
<CADC8Pp4vYd-6MEBicn4jYZx8EW-PosXqo4i9v28VMezBa7J0Aw@mail.gmail.com>
In-Reply-To
<D5QV9TAKG7A0.31U7C4XCFKUSP@cmpwn.com> (view parent)
DKIM signature
pass
Download raw message
Thanks for the quick reply, that `match(&buckets)` is what I was looking for:

```hare
match (&buckets) {
    case void => void;
    case let slice: *[][]int => ...;
};
```

It solves my current problem, but I still have questions about the hare `slice`:


1. This is what I found in hare source code (hare/stdlib/types/classes.ha):

```hare
type slice = struct {
    // The slice contents.
    data: nullable *opaque,
    // The number of members of the slice.
    length: size,
    // The allocated capacity (in members) of data.
    capacity: size,
};
```

It looks like the regular C dynamic array design, so here is my guess:

`let slice: []u32 = alloc([], 5);` should act like below (pseudo code):

```hare
use rt;
use types;

// Create a slice instance
let slice = types::slice {
    length = 0,
    capacity = 5,
    // Pointer to the heap allocation
    data = rt::malloc(5 * size(u32)),
}

// Copy it by value
let slice_copy = slice;

// This is true, as they're all point to the same heap-allocated memory location
assert((&slice: *types::slice).data == (&slice_copy: *types::slice).data);
```

But I tested with the following:

```hare
// Append to the `slice` not `slice_copy`
append(slice, 0xFF);

print_slice(slice);
print_slice(slice_copy);
```

Output is:

```hare
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255
>>> [ print_slice ] - capacity: 5, length: 0, data:
        Empty
```

That said, `append` only affects the `slice struct instance` itself
(even though both `slice.data` and `slice_copy.data` point to the same
memory)? I guess YES, as `slice.length == 5` after the `append`, but
`slice_copy.length == 0`.


That makes me think "copying slice by value is not safe (or should be
avoided)?", here is the example:

```hare
append(slice, 0xAA);
append(slice_copy, 0xFF); // This should override `0xAA` in `slice.data`

print_slice(slice);
print_slice(slice_copy);
```

The output is:

```bash
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255 // This has been overridden (original should be 0xAA (170))!
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255
```

I'm not complaining, just want to know more about how Hare `slice
(data type)` works, plz point out if my understanding is not correct,
many thanks:)

On Wed, Nov 20, 2024 at 9:35 PM Drew DeVault <sir@cmpwn.com> wrote:
>
> On Wed Nov 20, 2024 at 9:08 AM CET, Ye Wison wrote:
> >     match (buckets) {
> >     case void > void;
> >     case let slice: [][]int >
> >         // let s  &(buckets as [][]int): *types::slice;
> >         let s  &slice: *types::slice;
>
> This isn't about the heap, it's about references and values.
>
> "as" copies the inner slice by value and then you take a pointer to the
> copy, so you are not updating a reference to the slice stored in
> "buckets" when you use append later in this branch.
>
> Consider the following:
>
> use fmt;
>
> export fn main() void  {
>         let x: (void | []int)  [];
>
>         // Match with copy (similar to "as")
>         match (x) {
>         case void > void;
>         case let x: []int >
>                 append(x, 1);
>                 append(x, 2);
>                 append(x, 3);
>         };
>
>         // Original is not updated
>         assert(len(x as []int)  0);
>
>         // Match with reference
>         match (&x) {
>         case void > void;
>         case let x: *[]int >
>                 append(x, 1);
>                 append(x, 2);
>                 append(x, 3);
>         };
>
>         // Original is updated
>         assert(len(x as []int)  3);
> };
Details
Message ID
<D5RCF1501GNN.2BEY4CEB1GAE9@cmpwn.com>
In-Reply-To
<CADC8Pp4vYd-6MEBicn4jYZx8EW-PosXqo4i9v28VMezBa7J0Aw@mail.gmail.com> (view parent)
DKIM signature
pass
Download raw message
It might help to explain slices in C terms, since you're familiar with
C. The following Hare code:

	let x: []int = alloc([], 5);
	let y: []int = x;

Is similar to the following C code:

	struct int_slice {
		int *data;
		size_t length;
		size_t capacity;
	};

	struct int_slice x = {
		.data = malloc(size(int) * 5),
		.length = 0,
		.capacity = 5,
	};

	struct int_slice y = x;

Then the following Hare code:

	append(x, 1234);

Is equivalent to this C code:

	x.data[x.length++] = 1234;

Naturally this does not cause y.length to change, as y's value was
copied from x -- it's a value, not a reference.

It also follows that if you keep appending until length = capacity,
Hare will essentially call realloc() on the data field. Thus, copying a
slice by value and performing append, insert, or delete on one copy
causes the other copy to hold a stale pointer, which can lead to a
use-after-free bug.

You can take a reference to a slice, and then append/insert/delete the
reference, which will update the original:

	let x: []int = alloc([], 5);
	assert(len(x) = 0);
	let y = &x;
	append(y, 1234);
	assert(len(x) = 1 && x[0] = 1234);

You can also deep copy a slice:

	let x: []int = alloc([], 5);
	append(x, 0xAA);

	let y: []int = alloc(x...);
	append(y, 0xFF);

	assert(x[0] = 0xAA);
	assert(y[0] = 0xAA && y[1] = 0xFF);

But indeed there are several unsafe situations you can construct fairly
easily by misusing slices. Slices are probably Hare's biggest footgun.

Rules to live by:

1. Do not modify a slice with append, insert, or delete so long as there
   is a living reference to one or more of its values.

2. The following code takes a reference to a value in a slice:

   let x: []foo;
   let y = &x[0];

   x cannot be modified so long as y is alive.

   The following code does not take a reference:

   let x: []*foo;
   let y = x[0];

   x may be modified without invalidating y.

3. The following code takes a reference to the values of a slice:

   let x: []int;
   let y = x[3..5];

   x cannot be modified so long as y lives.

4. static append, insert, and delete will never free the data pointer of
   a slice, and that they may be used to work around these constraints
   if appropriate.

Note that by "modifying X", I mean any operation which would change one
or more of the three fields stored in a slice value: the data pointer,
length, and capacity. The operations that modify a slice are append,
insert, delete, and free. Modifying one of the values stored in a slice
is not the same thing as modifying THE slice; this is always safe.
Details
Message ID
<CADC8Pp5hxN3snAWhn=MSL5nFbg8wU5dkd2RKCKk3FPF5nwCh5Q@mail.gmail.com>
In-Reply-To
<D5RCF1501GNN.2BEY4CEB1GAE9@cmpwn.com> (view parent)
DKIM signature
pass
Download raw message
Thanks for the detailed explanation with examples, it's super clear, thumbs up:)

Basically, just always pay attention the slice shallow copy (copy
variable by value), as it's not easy spot the following bug (if don't
understand the C dynamic array implementations):

```hare
//
// This is fine, as pre-allocated 10 u32, then `append` won't cause realloc
//
let x: []u32 = alloc([0x11, 0x22, 0x33, 0x44, 0x55, 0x66], 10);
let y = x[3..5];
append(x, 0xFF); // modify `x`
print_slice(x);
print_slice(y);

// >>> [ print_slice ] - capacity: 10, length: 7, data:
//     slice[0]: 0x11
//     slice[1]: 0x22
//     slice[2]: 0x33
//     slice[3]: 0x44
//     slice[4]: 0x55
//     slice[5]: 0x66
//     slice[6]: 0xFF
// >>> [ print_slice ] - capacity: 7, length: 2, data:
//     slice[0]: 0x44
//     slice[1]: 0x55
```


```hare
//
// This is NOT fine, as `append` caused realloc, then `y.data` points
to freed memory
//
let x: []u32 = alloc([0x11, 0x22, 0x33, 0x44, 0x55, 0x66]);
let y = x[3..5];
append(x, 0xFF); // modify `x`
print_slice(x);
print_slice(y);

// >>> [ print_slice ] - capacity: 12, length: 7, data:
//     slice[0]: 0x11
//     slice[1]: 0x22
//     slice[2]: 0x33
//     slice[3]: 0x44
//     slice[4]: 0x55
//     slice[5]: 0x66
//     slice[6]: 0xFF
// >>> [ print_slice ] - capacity: 3, length: 2, data:
//     slice[0]: 0x69696969  // Garbage value!
//     slice[1]: 0x69696969  // ...
```


Also, should always pass a slice reference into a function if you want
to manipulate the original slice:

```hare
//
// Don't do `(ref: []u8)`
//
fn make_change_to_slice(s_ref: *[]u8) void = {
    append(s_ref, 0xFE);
    append(s_ref, 0xFF);
};
```

I think the `match (&x)` should be added to hare tutorial as an
example to explain a bit more about the slice trap, as `[]int` (slice
data type) looks like a primitive magic type in Hare (if `harec`
compiles then it looks ok but it's not run as expected:)

On Thu, Nov 21, 2024 at 11:01 AM Drew DeVault <sir@cmpwn.com> wrote:
>
> It might help to explain slices in C terms, since you're familiar with
> C. The following Hare code:
>
>         let x: []int  alloc([], 5);
>         let y: []int  x;
>
> Is similar to the following C code:
>
>         struct int_slice {
>                 int *data;
>                 size_t length;
>                 size_t capacity;
>         };
>
>         struct int_slice x  {
>                 .data  malloc(size(int) * 5),
>                 .length  0,
>                 .capacity  5,
>         };
>
>         struct int_slice y  x;
>
> Then the following Hare code:
>
>         append(x, 1234);
>
> Is equivalent to this C code:
>
>         x.data[x.length++]  1234;
>
> Naturally this does not cause y.length to change, as y's value was
> copied from x -- it's a value, not a reference.
>
> It also follows that if you keep appending until length  capacity,
> Hare will essentially call realloc() on the data field. Thus, copying a
> slice by value and performing append, insert, or delete on one copy
> causes the other copy to hold a stale pointer, which can lead to a
> use-after-free bug.
>
> You can take a reference to a slice, and then append/insert/delete the
> reference, which will update the original:
>
>         let x: []int  alloc([], 5);
>         assert(len(x)  0);
>         let y  &x;
>         append(y, 1234);
>         assert(len(x)  1 && x[0]  1234);
>
> You can also deep copy a slice:
>
>         let x: []int  alloc([], 5);
>         append(x, 0xAA);
>
>         let y: []int  alloc(x...);
>         append(y, 0xFF);
>
>         assert(x[0]  0xAA);
>         assert(y[0]  0xAA && y[1]  0xFF);
>
> But indeed there are several unsafe situations you can construct fairly
> easily by misusing slices. Slices are probably Hare's biggest footgun.
>
> Rules to live by:
>
> 1. Do not modify a slice with append, insert, or delete so long as there
>    is a living reference to one or more of its values.
>
> 2. The following code takes a reference to a value in a slice:
>
>    let x: []foo;
>    let y  &x[0];
>
>    x cannot be modified so long as y is alive.
>
>    The following code does not take a reference:
>
>    let x: []*foo;
>    let y  x[0];
>
>    x may be modified without invalidating y.
>
> 3. The following code takes a reference to the values of a slice:
>
>    let x: []int;
>    let y  x[3..5];
>
>    x cannot be modified so long as y lives.
>
> 4. static append, insert, and delete will never free the data pointer of
>    a slice, and that they may be used to work around these constraints
>    if appropriate.
>
> Note that by "modifying X", I mean any operation which would change one
> or more of the three fields stored in a slice value: the data pointer,
> length, and capacity. The operations that modify a slice are append,
> insert, delete, and free. Modifying one of the values stored in a slice
> is not the same thing as modifying THE slice; this is always safe.
Details
Message ID
<D5RQQVZ20JX8.3458ULRY5TIU2@cmpwn.com>
In-Reply-To
<CADC8Pp5hxN3snAWhn=MSL5nFbg8wU5dkd2RKCKk3FPF5nwCh5Q@mail.gmail.com> (view parent)
DKIM signature
pass
Download raw message
On Thu Nov 21, 2024 at 12:59 AM CET, Ye Wison wrote:
> I think the `match (&x)` should be added to hare tutorial as an
> example to explain a bit more about the slice trap, as `[]int` (slice
> data type) looks like a primitive magic type in Hare (if `harec`
> compiles then it looks ok but it's not run as expected:)

I agree that we should definitely explain slices better in the
tutorial/docs generally.
Reply to thread Export thread (mbox)