I have the following code:
```hare
export fn test_slices() void = {
//
// Allocate slice on the heap: [10][]int
//
const capacity = 10z;
let buckets: ([][]int | void) = alloc([], capacity);
//
// Create `[]int` instances for every `buckets[x]: []int`
// `x`: 0 ~ 9
//
match (buckets) {
case void => void;
case let slice: [][]int =>
// let s = &(buckets as [][]int): *types::slice;
let s = &slice: *types::slice;
fmt::printfln(">>> s (before)\t\t\tlen: {},\t\tcapacity: {},\tdata: {}",
// len(slice),
s.length,
s.capacity,
s.data)!;
for(let index=0z; index < capacity; index +=1) {
append(slice, []);
};
fmt::printfln(">>> s (after)\t\t\tlen: {},\tcapacity: {},\tdata: {}",
// len(slice),
s.length,
s.capacity,
s.data)!;
// This works
// fmt::printfln(">>> len(slice[0]): {}", len(slice[0]))!;
};
let s = &(buckets as [][]int): *types::slice;
fmt::printfln(">>> s (outside match)\tlen: {},\t\tcapacity: {},\tdata: {}",
// len((buckets as [][]int)),
s.length,
s.capacity,
s.data)!;
// This fail: slice or array access out of bounds
// fmt::printfln(">>> len((buckets as [][]int)[0]): {}",
len((buckets as [][]int)[0]))!;
};
```
Output:
```bash
>>> s (before) len: 0, capacity: 10, data: 0x732ff1cff010>>> s (after) len: 10, capacity: 10, data: 0x732ff1cff010>>> s (outside match) len: 0, capacity: 10, data: 0x732ff1cff010
```
I suppose that `buckets as [][]int` should be the same slice as `case
let slice: [][]int` and the output proves that they point to the same
`data` (address).
But why `>>> s (outside match) len: 0`??? I suppose that it should be
10, as `append` on the same slice.
On Wed Nov 20, 2024 at 9:08 AM CET, Ye Wison wrote:
> match (buckets) {> case void => void;> case let slice: [][]int =>> // let s = &(buckets as [][]int): *types::slice;> let s = &slice: *types::slice;
This isn't about the heap, it's about references and values.
"as" copies the inner slice by value and then you take a pointer to the
copy, so you are not updating a reference to the slice stored in
"buckets" when you use append later in this branch.
Consider the following:
use fmt;
export fn main() void = {
let x: (void | []int) = [];
// Match with copy (similar to "as")
match (x) {
case void => void;
case let x: []int =>
append(x, 1);
append(x, 2);
append(x, 3);
};
// Original is not updated
assert(len(x as []int) = 0);
// Match with reference
match (&x) {
case void => void;
case let x: *[]int =>
append(x, 1);
append(x, 2);
append(x, 3);
};
// Original is updated
assert(len(x as []int) = 3);
};
Thanks for the quick reply, that `match(&buckets)` is what I was looking for:
```hare
match (&buckets) {
case void => void;
case let slice: *[][]int => ...;
};
```
It solves my current problem, but I still have questions about the hare `slice`:
1. This is what I found in hare source code (hare/stdlib/types/classes.ha):
```hare
type slice = struct {
// The slice contents.
data: nullable *opaque,
// The number of members of the slice.
length: size,
// The allocated capacity (in members) of data.
capacity: size,
};
```
It looks like the regular C dynamic array design, so here is my guess:
`let slice: []u32 = alloc([], 5);` should act like below (pseudo code):
```hare
use rt;
use types;
// Create a slice instance
let slice = types::slice {
length = 0,
capacity = 5,
// Pointer to the heap allocation
data = rt::malloc(5 * size(u32)),
}
// Copy it by value
let slice_copy = slice;
// This is true, as they're all point to the same heap-allocated memory location
assert((&slice: *types::slice).data == (&slice_copy: *types::slice).data);
```
But I tested with the following:
```hare
// Append to the `slice` not `slice_copy`
append(slice, 0xFF);
print_slice(slice);
print_slice(slice_copy);
```
Output is:
```hare
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255
>>> [ print_slice ] - capacity: 5, length: 0, data:
Empty
```
That said, `append` only affects the `slice struct instance` itself
(even though both `slice.data` and `slice_copy.data` point to the same
memory)? I guess YES, as `slice.length == 5` after the `append`, but
`slice_copy.length == 0`.
That makes me think "copying slice by value is not safe (or should be
avoided)?", here is the example:
```hare
append(slice, 0xAA);
append(slice_copy, 0xFF); // This should override `0xAA` in `slice.data`
print_slice(slice);
print_slice(slice_copy);
```
The output is:
```bash
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255 // This has been overridden (original should be 0xAA (170))!
>>> [ print_slice ] - capacity: 5, length: 1, data:
slice[0]: 255
```
I'm not complaining, just want to know more about how Hare `slice
(data type)` works, plz point out if my understanding is not correct,
many thanks:)
On Wed, Nov 20, 2024 at 9:35 PM Drew DeVault <sir@cmpwn.com> wrote:
>> On Wed Nov 20, 2024 at 9:08 AM CET, Ye Wison wrote:> > match (buckets) {> > case void > void;> > case let slice: [][]int >> > // let s &(buckets as [][]int): *types::slice;> > let s &slice: *types::slice;>> This isn't about the heap, it's about references and values.>> "as" copies the inner slice by value and then you take a pointer to the> copy, so you are not updating a reference to the slice stored in> "buckets" when you use append later in this branch.>> Consider the following:>> use fmt;>> export fn main() void {> let x: (void | []int) [];>> // Match with copy (similar to "as")> match (x) {> case void > void;> case let x: []int >> append(x, 1);> append(x, 2);> append(x, 3);> };>> // Original is not updated> assert(len(x as []int) 0);>> // Match with reference> match (&x) {> case void > void;> case let x: *[]int >> append(x, 1);> append(x, 2);> append(x, 3);> };>> // Original is updated> assert(len(x as []int) 3);> };
It might help to explain slices in C terms, since you're familiar with
C. The following Hare code:
let x: []int = alloc([], 5);
let y: []int = x;
Is similar to the following C code:
struct int_slice {
int *data;
size_t length;
size_t capacity;
};
struct int_slice x = {
.data = malloc(size(int) * 5),
.length = 0,
.capacity = 5,
};
struct int_slice y = x;
Then the following Hare code:
append(x, 1234);
Is equivalent to this C code:
x.data[x.length++] = 1234;
Naturally this does not cause y.length to change, as y's value was
copied from x -- it's a value, not a reference.
It also follows that if you keep appending until length = capacity,
Hare will essentially call realloc() on the data field. Thus, copying a
slice by value and performing append, insert, or delete on one copy
causes the other copy to hold a stale pointer, which can lead to a
use-after-free bug.
You can take a reference to a slice, and then append/insert/delete the
reference, which will update the original:
let x: []int = alloc([], 5);
assert(len(x) = 0);
let y = &x;
append(y, 1234);
assert(len(x) = 1 && x[0] = 1234);
You can also deep copy a slice:
let x: []int = alloc([], 5);
append(x, 0xAA);
let y: []int = alloc(x...);
append(y, 0xFF);
assert(x[0] = 0xAA);
assert(y[0] = 0xAA && y[1] = 0xFF);
But indeed there are several unsafe situations you can construct fairly
easily by misusing slices. Slices are probably Hare's biggest footgun.
Rules to live by:
1. Do not modify a slice with append, insert, or delete so long as there
is a living reference to one or more of its values.
2. The following code takes a reference to a value in a slice:
let x: []foo;
let y = &x[0];
x cannot be modified so long as y is alive.
The following code does not take a reference:
let x: []*foo;
let y = x[0];
x may be modified without invalidating y.
3. The following code takes a reference to the values of a slice:
let x: []int;
let y = x[3..5];
x cannot be modified so long as y lives.
4. static append, insert, and delete will never free the data pointer of
a slice, and that they may be used to work around these constraints
if appropriate.
Note that by "modifying X", I mean any operation which would change one
or more of the three fields stored in a slice value: the data pointer,
length, and capacity. The operations that modify a slice are append,
insert, delete, and free. Modifying one of the values stored in a slice
is not the same thing as modifying THE slice; this is always safe.
Thanks for the detailed explanation with examples, it's super clear, thumbs up:)
Basically, just always pay attention the slice shallow copy (copy
variable by value), as it's not easy spot the following bug (if don't
understand the C dynamic array implementations):
```hare
//
// This is fine, as pre-allocated 10 u32, then `append` won't cause realloc
//
let x: []u32 = alloc([0x11, 0x22, 0x33, 0x44, 0x55, 0x66], 10);
let y = x[3..5];
append(x, 0xFF); // modify `x`
print_slice(x);
print_slice(y);
// >>> [ print_slice ] - capacity: 10, length: 7, data:
// slice[0]: 0x11
// slice[1]: 0x22
// slice[2]: 0x33
// slice[3]: 0x44
// slice[4]: 0x55
// slice[5]: 0x66
// slice[6]: 0xFF
// >>> [ print_slice ] - capacity: 7, length: 2, data:
// slice[0]: 0x44
// slice[1]: 0x55
```
```hare
//
// This is NOT fine, as `append` caused realloc, then `y.data` points
to freed memory
//
let x: []u32 = alloc([0x11, 0x22, 0x33, 0x44, 0x55, 0x66]);
let y = x[3..5];
append(x, 0xFF); // modify `x`
print_slice(x);
print_slice(y);
// >>> [ print_slice ] - capacity: 12, length: 7, data:
// slice[0]: 0x11
// slice[1]: 0x22
// slice[2]: 0x33
// slice[3]: 0x44
// slice[4]: 0x55
// slice[5]: 0x66
// slice[6]: 0xFF
// >>> [ print_slice ] - capacity: 3, length: 2, data:
// slice[0]: 0x69696969 // Garbage value!
// slice[1]: 0x69696969 // ...
```
Also, should always pass a slice reference into a function if you want
to manipulate the original slice:
```hare
//
// Don't do `(ref: []u8)`
//
fn make_change_to_slice(s_ref: *[]u8) void = {
append(s_ref, 0xFE);
append(s_ref, 0xFF);
};
```
I think the `match (&x)` should be added to hare tutorial as an
example to explain a bit more about the slice trap, as `[]int` (slice
data type) looks like a primitive magic type in Hare (if `harec`
compiles then it looks ok but it's not run as expected:)
On Thu, Nov 21, 2024 at 11:01 AM Drew DeVault <sir@cmpwn.com> wrote:
>> It might help to explain slices in C terms, since you're familiar with> C. The following Hare code:>> let x: []int alloc([], 5);> let y: []int x;>> Is similar to the following C code:>> struct int_slice {> int *data;> size_t length;> size_t capacity;> };>> struct int_slice x {> .data malloc(size(int) * 5),> .length 0,> .capacity 5,> };>> struct int_slice y x;>> Then the following Hare code:>> append(x, 1234);>> Is equivalent to this C code:>> x.data[x.length++] 1234;>> Naturally this does not cause y.length to change, as y's value was> copied from x -- it's a value, not a reference.>> It also follows that if you keep appending until length capacity,> Hare will essentially call realloc() on the data field. Thus, copying a> slice by value and performing append, insert, or delete on one copy> causes the other copy to hold a stale pointer, which can lead to a> use-after-free bug.>> You can take a reference to a slice, and then append/insert/delete the> reference, which will update the original:>> let x: []int alloc([], 5);> assert(len(x) 0);> let y &x;> append(y, 1234);> assert(len(x) 1 && x[0] 1234);>> You can also deep copy a slice:>> let x: []int alloc([], 5);> append(x, 0xAA);>> let y: []int alloc(x...);> append(y, 0xFF);>> assert(x[0] 0xAA);> assert(y[0] 0xAA && y[1] 0xFF);>> But indeed there are several unsafe situations you can construct fairly> easily by misusing slices. Slices are probably Hare's biggest footgun.>> Rules to live by:>> 1. Do not modify a slice with append, insert, or delete so long as there> is a living reference to one or more of its values.>> 2. The following code takes a reference to a value in a slice:>> let x: []foo;> let y &x[0];>> x cannot be modified so long as y is alive.>> The following code does not take a reference:>> let x: []*foo;> let y x[0];>> x may be modified without invalidating y.>> 3. The following code takes a reference to the values of a slice:>> let x: []int;> let y x[3..5];>> x cannot be modified so long as y lives.>> 4. static append, insert, and delete will never free the data pointer of> a slice, and that they may be used to work around these constraints> if appropriate.>> Note that by "modifying X", I mean any operation which would change one> or more of the three fields stored in a slice value: the data pointer,> length, and capacity. The operations that modify a slice are append,> insert, delete, and free. Modifying one of the values stored in a slice> is not the same thing as modifying THE slice; this is always safe.
On Thu Nov 21, 2024 at 12:59 AM CET, Ye Wison wrote:
> I think the `match (&x)` should be added to hare tutorial as an> example to explain a bit more about the slice trap, as `[]int` (slice> data type) looks like a primitive magic type in Hare (if `harec`> compiles then it looks ok but it's not run as expected:)
I agree that we should definitely explain slices better in the
tutorial/docs generally.