~sircmpwn/hare-rfc

11 7

[RFC v1] Addition of a vector type

Details
Message ID
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop>
DKIM signature
missing
Download raw message
The capacity field in slices is rarely used. The only time it's ever
actually necessary is for append, insert, and non-static delete*.
Everywhere else, the field is a waste of space: eight additional
mostly-unused bytes (on 64-bit targets) per slice really adds up. The
situation is even worse for strings, which have a capacity field which
is literally never used.

*: Technically, the capacity field is only strictly necessary within
   static append/insert, since for the non-static variants we could have
   the compiler call rt::getmeta on the pointer to obtain the capacity.
   However, this would require the implementation to know about
   rt::getmeta, and would require the allocator to support it, which is
   incompatible with my RFC "Support for user-provided allocators in
   rt". Even besides that, calling a function everytime capacity needs
   to be obtained is inefficient, given all the safety checks in place
   in the current allocator implementation. For this reason, this
   proposal assumes that a capacity field is required for all forms of
   append, insert, and delete.

I propose we drop the capacity field from slices and strings, and add a
vector type which stores capacity (that is, the vector type would have
the same representation as slices currently do now).

	let a: [8]int = [0...];
	let b: []int = a[..0]; // slice: stores no data about capacity
	let c: [;]int = a[;0]; // vector: length 0, capacity 8

	static append(c, 1); // ok
	static append(b, 1); // error: vector expected

Vectors are assignable to slices, but slices aren't assignable to
vectors. Vectors and slices are mutually castable.

append, insert, and delete operate on vectors. Also, alloc returns a
vector. Other than that, everything else still works with slices (which
can be assigned from vectors).

Slicing expressions work on arrays, slices, and vectors, and always
return a slice. A new kind of expression, a "vecting expression"
(official name) is added, which also works on arrays, slices, and
vectors, but returns a vector. The syntax is similar to slicing
expressions, but with `;` instead of `..`. Vecting expressions on
vectors set the capacity of the new vector to the capacity of the old
vector minus the lower bound. Vecting expressions on arrays and slices
are similar, but they substitute capacity for length when computing the
new capacity. (This means that `x[;]` is equivalent to `x: [;]t`.)

I'd also like to use this opportunity to add a notation for initializing
a vector to a certain capacity without zero-initializing everything
(i.e. a stack-allocated equivalent to alloc([], x)). A form of this has
been proposed in the past:
https://lists.sr.ht/~sircmpwn/hare-dev/<f15b150a-42bb-380c-e7c7-2849d197e3b1%40posteo.net>

The syntax used in the linked proposal would be unintuitive with vectors
as described in this proposal, since they'd have different behavior than
vecting expressions: in [0 ; 8], 8 is the capacity, whereas in x[0;8], 8
is the length and the capacity remains unchanged. I have two ideas for
syntax that could work better, though I'm open to other suggestions:

1. static alloc([], 8)
2. <8>

The static alloc form allows the initial elements of the vector to be
initialized (and for the length to be non-zero). This... works, but it's
not super great. It's yet another meaning for `static`. I don't hate it
though.

The second form is much more limited: the vector's length is always
initialized to zero. But I honestly kinda like it. The expression within
the angle brackets is used as the capacity of the returned vector.
Initializing the length to zero is almost always what you'd want to do
anyway, so I really don't think this form is that limited in practice.
And besides, you can always static append to it after creating it.
Details
Message ID
<CV6LLW4Y1FHB.1K5J9TVG9TX59@taiga>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
Tenative +1

I'm curious to see a proposal for mandatory error handling on allocation
failure before considering either this or the user-provided allocator
proposal expanded in more detail.
Details
Message ID
<CVACLW6DB1LE.9O9DSRPXG3OL@monch>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
not sure i love any of the syntax proposed here, but +1 to the semantics
Details
Message ID
<CVC61LZW3B1Y.206QQ7CNDKG0H@attila>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
On Thu Aug 31, 2023 at 8:14 AM CEST, Sebastian wrote:
> The capacity field in slices is rarely used. The only time it's ever
> actually necessary is for append, insert, and non-static delete*.
> Everywhere else, the field is a waste of space: eight additional
> mostly-unused bytes (on 64-bit targets) per slice really adds up. The
> situation is even worse for strings, which have a capacity field which
> is literally never used.

To add to this, the wasted additional 8 bytes are especially harmful when
passing slices by value to functions. A 16 bytes struct would be passed in
registers, while a 24 byte one gets copied on the stack.

And the worst of it is that when passing slices to functions, one absolutely
does not need the cap field and being able to modify the slice with
append/insert/delete just leads to horrible bugs 99% of the time. Currently
slices are views into some memory and also variable lenght containers at the
same time, in my opinion changing that is a good idea worth trying out.

So +1 to the general idea for this reason and for performance benefits, which
I expect to be huge. Figuring out the exact semantics is not going to be
simple however and there are a lot of proposed or at least discussed
changes to the language that are relevant here.
Details
Message ID
<CVC5Z3XC12HH.1W1GDE8O3LPV8@notmylaptop>
In-Reply-To
<CVC61LZW3B1Y.206QQ7CNDKG0H@attila> (view parent)
DKIM signature
missing
Download raw message
On Wed Sep 6, 2023 at 5:52 PM EDT, Bor Grošelj Simić wrote:
> And the worst of it is that when passing slices to functions, one absolutely
> does not need the cap field and being able to modify the slice with
> append/insert/delete just leads to horrible bugs 99% of the time.

Most of the bugs come from trying to append/insert/delete to/from a
borrowed slice, which vectors wouldn't solve. Linear types will solve
this.

> So +1 to the general idea for this reason and for performance benefits, which
> I expect to be huge. Figuring out the exact semantics is not going to be
> simple however and there are a lot of proposed or at least discussed
> changes to the language that are relevant here.

Anything specific that hasn't been mentioned yet?
Details
Message ID
<CVC6XPTH2O7P.1M98WC2DK2K8O@attila>
In-Reply-To
<CVC5Z3XC12HH.1W1GDE8O3LPV8@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
On Wed Sep 6, 2023 at 11:48 PM CEST, Sebastian wrote:
> On Wed Sep 6, 2023 at 5:52 PM EDT, Bor Grošelj Simić wrote:
> > And the worst of it is that when passing slices to functions, one absolutely
> > does not need the cap field and being able to modify the slice with
> > append/insert/delete just leads to horrible bugs 99% of the time.
>
> Most of the bugs come from trying to append/insert/delete to/from a
> borrowed slice, which vectors wouldn't solve. Linear types will solve
> this.

It solves the specific case where the borrow happened by passing the
slice-the-variable-length-container into a function by value that took
slice-the-memory-view (which I argue is 99% of functions that take slices).
This is of course not half as good as linear types, but also a much, much
simpler fix and still a big improvement.

> > So +1 to the general idea for this reason and for performance benefits, which
> > I expect to be huge. Figuring out the exact semantics is not going to be
> > simple however and there are a lot of proposed or at least discussed
> > changes to the language that are relevant here.
>
> Anything specific that hasn't been mentioned yet?

Not very concrete, but everything that needs some special semantics
for slices (because they are a pretty special case among hare's types) will
need to handle two special cases if this is accepted.
Details
Message ID
<881aa545-26cf-46b8-ae12-9652e43041b5@posteo.net>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
> 	let c: [;]int = a[;0]; // vector: length 0, capacity 8
perhaps better notation:

	let c: [;]int = a[..0; 8];
	let c: [;]int = a[..0; _]; // equivalent

just use a slicing expression with a capacity afterward, where using '_' 
as the capacity sets it to the maximum, somewhat similar to [_] in array 
sizes. this underscore syntax is maybe not entirely necessary though, 
since here you could just write

	let c: [;]int = a[..0; len(a)];

~Autumn
Details
Message ID
<CVIWGOWJDRRZ.2JFFI6PA85GUN@notmylaptop>
In-Reply-To
<881aa545-26cf-46b8-ae12-9652e43041b5@posteo.net> (view parent)
DKIM signature
missing
Download raw message
On Thu Sep 14, 2023 at 5:26 AM EDT, Autumn! wrote:
> > 	let c: [;]int = a[;0]; // vector: length 0, capacity 8
> perhaps better notation:
>
> 	let c: [;]int = a[..0; 8];
> 	let c: [;]int = a[..0; _]; // equivalent
>
> just use a slicing expression with a capacity afterward, where using '_' 
> as the capacity sets it to the maximum, somewhat similar to [_] in array 
> sizes. this underscore syntax is maybe not entirely necessary though, 
> since here you could just write
>
> 	let c: [;]int = a[..0; len(a)];

What behavior would slicing expressions have on vectors? Would it yield
another vector, or would you still need to use your proposed notation
everytime you want to vect something?
Details
Message ID
<736ba0f0-784c-4582-b4be-ea8ca4962005@posteo.net>
In-Reply-To
<CVIWGOWJDRRZ.2JFFI6PA85GUN@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
On 9/14/23 20:50, Sebastian wrote:
> On Thu Sep 14, 2023 at 5:26 AM EDT, Autumn! wrote:
>>> 	let c: [;]int = a[;0]; // vector: length 0, capacity 8
>> perhaps better notation:
>>
>> 	let c: [;]int = a[..0; 8];
>> 	let c: [;]int = a[..0; _]; // equivalent
>>
>> just use a slicing expression with a capacity afterward, where using '_'
>> as the capacity sets it to the maximum, somewhat similar to [_] in array
>> sizes. this underscore syntax is maybe not entirely necessary though,
>> since here you could just write
>>
>> 	let c: [;]int = a[..0; len(a)];
> 
> What behavior would slicing expressions have on vectors? Would it yield
> another vector, or would you still need to use your proposed notation
> everytime you want to vect something?
i was thinking you'd use vecting notation to produce a vector and 
slicing notation to produce a slice, in every case.
Details
Message ID
<CVSNPOR54FAZ.32RRYI95OOVPJ@ki>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
missing
Download raw message
This shouldn't be called “vector”. That name comes from bad C++
terminology that we shouldn't further perpetuate. A vector is tensor of
rank one, i.e. the vector one would use in linear algebra and 3D
graphics. This RFC is describing a dynamic array [0][1][2].

[0]: https://en.wikipedia.org/wiki/Dynamic_array
[1]: https://en.wikipedia.org/wiki/Dynamic_array#Language_support
[2]: https://odin-lang.org/docs/overview/#dynamic-arrays
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZmF0T5dXnkg8f1Wv@xha.li>
In-Reply-To
<CV6F1MP0FTN4.8TYFI8WQBTM6@notmylaptop> (view parent)
DKIM signature
pass
Download raw message
hi,

sorry for replying to this old thread. spxtr recently brought
it up in #hare-dev and i thought we could revive this thread,
because i think that the idea is very interesting.

here's my take: i think that we should not introduce a vector 
(dynamic array) type in hare. i think that the biggest problem with 
it is that we have to introduce a lot of special cases for them (as 
already brought up in this thread), that are not exactly helpful 
nor easy/intuitive to solve. furthermore, the syntax is a very 
challenging problem to solve, and i am not even sure if it's possible
within hare's style.

i think that the performance problem is also over-hated; sure, it's
not possible to pass the values in registers, but instructions like
pushq that put stuff on the stack are heavily optimized on CPU's
these days and let's be honest: if you have a performance problem
because of this, you might aswell re-write this piece of code in
assembly instead. something like "inline" or "#[inline]" might also
work here.

my point is that slices are nowhere near perfect and i don't think
that we should treat them as pefect or try to make them perfect;
they're inherently flawed because of their many uses. but i think
that they are still very much worth it.

instead, i propse the following:

- communicate way more clearly in tutorials and stuff that slices
  are really just "struct { ptr, sz, cap }" that are treated specially
  in the language. even though this doesn't fix the fact that you
  can still append() to non-heap allocated slices, i think this is
  fine: it will never work at runtime anyways, and when we communicate
  more clearly how slices work, it will make sense.

- remove the "cap" field from strings; i think that this is fine.

- somewhat overhaul "slice -> array" and "array -> slice" assignments
  in a different RFC. specifically, maby we can ensure that only pointers
  to arrays are assignable to slices, because that is also how it works
  internally. for example, only *[4]int would be assignable to []int,
  [4]int could not be assigned to []int. there are a few other issues
  that i'd like to fix aswell. ticket for `slices -> arrays` is here:
  https://todo.sr.ht/~sircmpwn/hare/719

thoughts?
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZmHA3R-gWydtShVB@xha.li>
In-Reply-To
<ZmF0T5dXnkg8f1Wv@xha.li> (view parent)
DKIM signature
pass
Download raw message
after a discussion with yyp in #hare-dev, i am still -1, but i would
be more interested if there was a good syntax (which, again, i don't
think there is). however, there are still lot's of problems with
dynamic arrays (+ the mental overhead), which make it not worth for
me.

i would like to add the following points though:

- improve the syntax for specifiying the capacity of a slice,
  i.e. beeing able to do [..<array>; <capacity>] or something
- maby we can improve the alloc() syntax?
Reply to thread Export thread (mbox)