The capacity field in slices is rarely used. The only time it's ever
actually necessary is for append, insert, and non-static delete*.
Everywhere else, the field is a waste of space: eight additional
mostly-unused bytes (on 64-bit targets) per slice really adds up. The
situation is even worse for strings, which have a capacity field which
is literally never used.
*: Technically, the capacity field is only strictly necessary within
static append/insert, since for the non-static variants we could have
the compiler call rt::getmeta on the pointer to obtain the capacity.
However, this would require the implementation to know about
rt::getmeta, and would require the allocator to support it, which is
incompatible with my RFC "Support for user-provided allocators in
rt". Even besides that, calling a function everytime capacity needs
to be obtained is inefficient, given all the safety checks in place
in the current allocator implementation. For this reason, this
proposal assumes that a capacity field is required for all forms of
append, insert, and delete.
I propose we drop the capacity field from slices and strings, and add a
vector type which stores capacity (that is, the vector type would have
the same representation as slices currently do now).
let a: [8]int = [0...];
let b: []int = a[..0]; // slice: stores no data about capacity
let c: [;]int = a[;0]; // vector: length 0, capacity 8
static append(c, 1); // ok
static append(b, 1); // error: vector expected
Vectors are assignable to slices, but slices aren't assignable to
vectors. Vectors and slices are mutually castable.
append, insert, and delete operate on vectors. Also, alloc returns a
vector. Other than that, everything else still works with slices (which
can be assigned from vectors).
Slicing expressions work on arrays, slices, and vectors, and always
return a slice. A new kind of expression, a "vecting expression"
(official name) is added, which also works on arrays, slices, and
vectors, but returns a vector. The syntax is similar to slicing
expressions, but with `;` instead of `..`. Vecting expressions on
vectors set the capacity of the new vector to the capacity of the old
vector minus the lower bound. Vecting expressions on arrays and slices
are similar, but they substitute capacity for length when computing the
new capacity. (This means that `x[;]` is equivalent to `x: [;]t`.)
I'd also like to use this opportunity to add a notation for initializing
a vector to a certain capacity without zero-initializing everything
(i.e. a stack-allocated equivalent to alloc([], x)). A form of this has
been proposed in the past:
https://lists.sr.ht/~sircmpwn/hare-dev/<f15b150a-42bb-380c-e7c7-2849d197e3b1%40posteo.net>
The syntax used in the linked proposal would be unintuitive with vectors
as described in this proposal, since they'd have different behavior than
vecting expressions: in [0 ; 8], 8 is the capacity, whereas in x[0;8], 8
is the length and the capacity remains unchanged. I have two ideas for
syntax that could work better, though I'm open to other suggestions:
1. static alloc([], 8)
2. <8>
The static alloc form allows the initial elements of the vector to be
initialized (and for the length to be non-zero). This... works, but it's
not super great. It's yet another meaning for `static`. I don't hate it
though.
The second form is much more limited: the vector's length is always
initialized to zero. But I honestly kinda like it. The expression within
the angle brackets is used as the capacity of the returned vector.
Initializing the length to zero is almost always what you'd want to do
anyway, so I really don't think this form is that limited in practice.
And besides, you can always static append to it after creating it.
Tenative +1
I'm curious to see a proposal for mandatory error handling on allocation
failure before considering either this or the user-provided allocator
proposal expanded in more detail.
On Thu Aug 31, 2023 at 8:14 AM CEST, Sebastian wrote:
> The capacity field in slices is rarely used. The only time it's ever> actually necessary is for append, insert, and non-static delete*.> Everywhere else, the field is a waste of space: eight additional> mostly-unused bytes (on 64-bit targets) per slice really adds up. The> situation is even worse for strings, which have a capacity field which> is literally never used.
To add to this, the wasted additional 8 bytes are especially harmful when
passing slices by value to functions. A 16 bytes struct would be passed in
registers, while a 24 byte one gets copied on the stack.
And the worst of it is that when passing slices to functions, one absolutely
does not need the cap field and being able to modify the slice with
append/insert/delete just leads to horrible bugs 99% of the time. Currently
slices are views into some memory and also variable lenght containers at the
same time, in my opinion changing that is a good idea worth trying out.
So +1 to the general idea for this reason and for performance benefits, which
I expect to be huge. Figuring out the exact semantics is not going to be
simple however and there are a lot of proposed or at least discussed
changes to the language that are relevant here.
On Wed Sep 6, 2023 at 5:52 PM EDT, Bor Grošelj Simić wrote:
> And the worst of it is that when passing slices to functions, one absolutely> does not need the cap field and being able to modify the slice with> append/insert/delete just leads to horrible bugs 99% of the time.
Most of the bugs come from trying to append/insert/delete to/from a
borrowed slice, which vectors wouldn't solve. Linear types will solve
this.
> So +1 to the general idea for this reason and for performance benefits, which> I expect to be huge. Figuring out the exact semantics is not going to be> simple however and there are a lot of proposed or at least discussed> changes to the language that are relevant here.
Anything specific that hasn't been mentioned yet?
On Wed Sep 6, 2023 at 11:48 PM CEST, Sebastian wrote:
> On Wed Sep 6, 2023 at 5:52 PM EDT, Bor Grošelj Simić wrote:> > And the worst of it is that when passing slices to functions, one absolutely> > does not need the cap field and being able to modify the slice with> > append/insert/delete just leads to horrible bugs 99% of the time.>> Most of the bugs come from trying to append/insert/delete to/from a> borrowed slice, which vectors wouldn't solve. Linear types will solve> this.
It solves the specific case where the borrow happened by passing the
slice-the-variable-length-container into a function by value that took
slice-the-memory-view (which I argue is 99% of functions that take slices).
This is of course not half as good as linear types, but also a much, much
simpler fix and still a big improvement.
> > So +1 to the general idea for this reason and for performance benefits, which> > I expect to be huge. Figuring out the exact semantics is not going to be> > simple however and there are a lot of proposed or at least discussed> > changes to the language that are relevant here.>> Anything specific that hasn't been mentioned yet?
Not very concrete, but everything that needs some special semantics
for slices (because they are a pretty special case among hare's types) will
need to handle two special cases if this is accepted.
> let c: [;]int = a[;0]; // vector: length 0, capacity 8
perhaps better notation:
let c: [;]int = a[..0; 8];
let c: [;]int = a[..0; _]; // equivalent
just use a slicing expression with a capacity afterward, where using '_'
as the capacity sets it to the maximum, somewhat similar to [_] in array
sizes. this underscore syntax is maybe not entirely necessary though,
since here you could just write
let c: [;]int = a[..0; len(a)];
~Autumn
On Thu Sep 14, 2023 at 5:26 AM EDT, Autumn! wrote:
> > let c: [;]int = a[;0]; // vector: length 0, capacity 8> perhaps better notation:>> let c: [;]int = a[..0; 8];> let c: [;]int = a[..0; _]; // equivalent>> just use a slicing expression with a capacity afterward, where using '_' > as the capacity sets it to the maximum, somewhat similar to [_] in array > sizes. this underscore syntax is maybe not entirely necessary though, > since here you could just write>> let c: [;]int = a[..0; len(a)];
What behavior would slicing expressions have on vectors? Would it yield
another vector, or would you still need to use your proposed notation
everytime you want to vect something?
On 9/14/23 20:50, Sebastian wrote:
> On Thu Sep 14, 2023 at 5:26 AM EDT, Autumn! wrote:>>> let c: [;]int = a[;0]; // vector: length 0, capacity 8>> perhaps better notation:>>>> let c: [;]int = a[..0; 8];>> let c: [;]int = a[..0; _]; // equivalent>>>> just use a slicing expression with a capacity afterward, where using '_'>> as the capacity sets it to the maximum, somewhat similar to [_] in array>> sizes. this underscore syntax is maybe not entirely necessary though,>> since here you could just write>>>> let c: [;]int = a[..0; len(a)];> > What behavior would slicing expressions have on vectors? Would it yield> another vector, or would you still need to use your proposed notation> everytime you want to vect something?
i was thinking you'd use vecting notation to produce a vector and
slicing notation to produce a slice, in every case.