~sircmpwn/hare-rfc

19 5

[RFC v1] @align for globals and pointers

Details
Message ID
<D40FCD20TA1I.26YR2XMYF58R0@sebsite.pw>
DKIM signature
pass
Download raw message
                              RFC SUMMARY

I propose we add an @align attribute, which will apply in two places:

- Globals: @align(32) let foo = bar { ... };
- Pointers: *@align(8) int

For globals, @align is analogous to alignas in C. The alignment must be
a power of two greater than or equal to the alignment of the type. The
maximum supported alignment is implementation-defined. For pointers,
@align acts as a form of type safety. The alignment can be any power of
two, and it defaults to the alignment of the secondary type if omitted.

                         LANGUAGE IMPLICATIONS

As mentioned above, @align is added for globals. When taking the address
of a global, the pointer alignment is set to that of the global:

	let x = 0u32;
	@align(8) let y = 0u32;
	let xptr = &x; // -> *u32, or equivalently, *@align(4) u32
	let yptr = &y; // -> *@align(8) u32

We could also allow @align on local bindings:

	static @align(8) let x = 0u32; // statically allocated
	@align(8) let y = 0u32; // stack allocated

To see why @align on pointers is desirable, take the following code:

	// use crypto::random;
	let buf: [size(u64)]u8 = [0...];
	random::buffer(buf);
	const n = *(&buf: *u64);

At a glance, this code looks perfectly fine. Code similar to this is
used multiple times in the standard library. But there's a subtle nasty
bug: buf may not meet the alignment requirements for u64. So when
casting to *u64, the pointer may be misaligned! This doesn't cause
issues on our currently supported targets (besides poor performance),
but on some targets this could cause a crash, or potentially worse. We
currently provide no safeguards against bugs caused by pointer
misalignment.

Introducing @align on pointer types would fix this issue. The
assignability and castability rules for pointers are adjusted as
follows:

- A pointer type is assignable to a pointer type of the same secondary
  type if the new pointer alignment is less than or equal to the old
  alignment
- A pointer cast can only increase the alignment if the secondary type
  doesn't change

For example:

- `*u64` is assignable to `*@align(4) u64`, but not to `*@align(16) u64`
- `*u8` is castable to `*@align(4) u8`, but not to `*@align(4) u16` or `*u32`
- `*@align(4) u8` is castable to `*u16` and `*u32`, but not to `*u64`
- `*u32` is still castable to `*u8` and `*u16`

So the random::buffer example now refuses to compile, because
`*[size(u64)]u8` isn't castable to `*u64`, because the alignment
increases from 1 to 8. If you wanted to keep the same behavior, you
could rewrite the cast like this:

	const n = *(&buf: *@align(8) [8]u8: *u64);

This is very verbose and cumbersome, but that's a good thing, because
the actual correct thing to do is to rewrite the code so the alignment
is correct to begin with:

	let n = 0u64;
	random::buffer(&n: *[size(u64)]u8);

There's a couple complications that arise which necessitate special
cases:

1. If @align is omitted, and the pointer's secondary type has undefined
   alignment, then the pointer also has undefined alignment. Such
   pointers can be cast/assigned to/from other pointers without any
   regard to pointer alignment. For example, `*opaque` is castable to
   any other pointer, and all pointers are assignable to `*opaque`. I
   don't think there's any way around this unfortunately.

2. When taking the address of a misaligned struct/union (either because
   of an unaligned @offset or a @packed struct), the alignment is always
   1. This also applies to nested access expressions. For example:

	type t = struct { @offset(6) x: struct { y: [2]u32 } };
	let foo = t { ... };
	let bar = &foo.x.y[1]; // -> *@align(1) u32

   Another option is to set the alignment to the largest power of two
   multiple of the offset, so in the above example the alignment would
   be 2. I prefer always having it be 1, but either-or is fine.

When `alloc` is provided a pointer type hint with a custom alignment,
the allocated storage has (at least) that alignment. The maximum
supported alignment is implementation-defined. No matter what, the
alignment is guaranteed to be greater than or equal to the largest
definite type alignment (currently 8, but possibly 16 in the future if
we add 128-bit types). The alignment of the returned pointer is always
at least this value. So the following code *is* correct, and will
continue to compile:

	let buf = alloc([0...]: [size(u64)]u8)!; // -> *@align(8) [size(u64)]u8
	random::buffer(buf);
	const n = *(buf: *u64);

I'm not bothering with @align on slice types in this RFC, since the
semantics are unintuitive: `[]@align(8) u32` would mean that the backing
array has the alignment 8 (i.e. its type is `*@align(8) [*]u32`), *not*
that the elements all do.

                     STANDARD LIBRARY IMPLICATIONS

None

                         ECOSYSTEM IMPLICATIONS

This is technically a breaking change, but I don't think there should be
any automated migration procedure, since if your code stops compiling
after this change, your code was likely incorrect to begin with.

                              EXAMPLE CODE

In addition to the random::buffer example given above, here's an example
from net/ip/+linux.ha:

	return rt::sockaddr {
		in = rt::sockaddr_in {
			sin_family = rt::AF_INET,
			sin_port = endian::htonu16(port),
			sin_addr = rt::in_addr { s_addr = *(&v4[0]: *opaque: *u32) },
			...
		},
		...
	};

v4's type is addr4, which is an alias of [4]u8, so the above code is
incorrect. If this RFC is implemented, the bug would be diagnosed, and
the code could be rewritten like this:

	const addr = rt::in_addr {
		s_addr = endian::host.getu32(v4),
	};
	return rt::sockaddr {
		in = rt::sockaddr_in {
			sin_family = rt::AF_INET,
			sin_port = endian::htonu16(port),
			sin_addr = addr,
			...
		},
		...
	};

                              RELATED RFCS

None

Prior art:
- C: alignas
- Zig: https://ziglang.org/documentation/master/#Alignment
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZuEDKAWvqM_Qoxo9@xha.li>
In-Reply-To
<D40FCD20TA1I.26YR2XMYF58R0@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
hmm, generally i think that some way of specifiying the alignment is a very
good idea. i also like your ideas.

but isn't this a bit too complex than it needs to be? what
if we make @align() an attribute for types, wouldn't that be much simpler
because it removes the special case in the syntax everywhere, although that
would be less flexible. however, i don't imagine you need it that often.

type int_align8 = @align(8) int;

seems fine for me. also given that a type already has an implicit
alignment, this would allow users to set it explicitly, not chagning
much really.

either way, i think that the assignability rules are Good.
Details
Message ID
<D438922YTYTC.2FU5T0S2MJ3HG@sebsite.pw>
In-Reply-To
<ZuEDKAWvqM_Qoxo9@xha.li> (view parent)
DKIM signature
pass
Download raw message
On Tue Sep 10, 2024 at 10:40 PM EDT, Lorenz (xha) wrote:
> but isn't this a bit too complex than it needs to be? what
> if we make @align() an attribute for types, wouldn't that be much simpler
> because it removes the special case in the syntax everywhere, although that
> would be less flexible. however, i don't imagine you need it that often.
>
> type int_align8 = @align(8) int;
>
> seems fine for me. also given that a type already has an implicit
> alignment, this would allow users to set it explicitly, not chagning
> much really.

This doesn't make sense IMO, for a few reasons:

- We assume the alignment of a type doesn't exceed its size. Breaking
  this assumption would cause weird behavior, like padding in arrays.
- Actually, in general, having padding for non-aggregate types is,
  really weird.
- Stuff is passed by value, so either alignment would be discarded on
  assignment, or it would be viral, and neither make sense IMO. It's
  similar to the issue we currently have with const (and type flags in
  general), and also similar to type qualifiers in C, which have weird
  semantics.
- Adding onto the last point: if `foo` has type `*@align(8) int`, what's
  the type of `*foo`? With my proposal, it's `int`, but if @align were
  generalized to all types then this would become unclear.
- We'd need to special case types with zero or undefined size. The
  latter special case (undefined size) is necessary for pointers as
  well, but the former (zero size) isn't.

I also can't think of any reason you'd use @align on a type, so, it
feels unnecessary to me.

I don't understand what you mean by my proposal being "more complex than
it needs to be". Your proposal is more complex than mine is. It moves
the @align syntax from pointers to types, which is a net-zero change in
"complexity", but it also necessitates implementing the semantics for
@align on types outside of pointers.

(As an aside, I don't think words like "complexity"/"simple" are very
descriptive, especially here. They read like buzzwords to me, especially
because everyone has a different idea on what "complex" means so it
leaves lots of room for miscommunication.)
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZuGkHfaI17E9Qa69@xha.li>
In-Reply-To
<D438922YTYTC.2FU5T0S2MJ3HG@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
what i was trying to say is that types already have an alignment
and that, when adding @align just for types, users can change the
already present, but implicit, alignment. in my opionion, alignment
is more of a property of types, and less of a property of the
pointer. more comments inline.

On Wed, Sep 11, 2024 at 02:05:42AM -0400, Sebastian wrote:
> On Tue Sep 10, 2024 at 10:40 PM EDT, Lorenz (xha) wrote:
> > but isn't this a bit too complex than it needs to be? what
> > if we make @align() an attribute for types, wouldn't that be much simpler
> > because it removes the special case in the syntax everywhere, although that
> > would be less flexible. however, i don't imagine you need it that often.
> >
> > type int_align8 = @align(8) int;
> >
> > seems fine for me. also given that a type already has an implicit
> > alignment, this would allow users to set it explicitly, not chagning
> > much really.
> 
> This doesn't make sense IMO, for a few reasons:
> 
> - We assume the alignment of a type doesn't exceed its size. Breaking
>   this assumption would cause weird behavior, like padding in arrays.

yes, but as a trade-off this would solve the issue of wanting an explicit
alignment of the items in a slice-or-array type, for example. aren't slices
and arrays the only places where this would cause "weird" behavior? it would
make the alignment thing more complete in the whole language.

> - Actually, in general, having padding for non-aggregate types is,
>   really weird.

hmm, well, i am not sure what is so weird about it. you also have "implcit
padding" in structs that people not expect, too. however, when compared with
the padding in structs, @align() on types is less weird imo.

> - Stuff is passed by value, so either alignment would be discarded on
>   assignment, or it would be viral, and neither make sense IMO. It's
>   similar to the issue we currently have with const (and type flags in
>   general), and also similar to type qualifiers in C, which have weird
>   semantics.

if you pass stuff by value, the alignment would just be discarded, i think
this makes sense, because it is the expected behavior, at least the one
that i would expect.

> - Adding onto the last point: if `foo` has type `*@align(8) int`, what's
>   the type of `*foo`? With my proposal, it's `int`, but if @align were
>   generalized to all types then this would become unclear.

hmm, sorry, i am not sure what you mean. if you deferefence foo,
you would get an int which is 8-byte aligned? is this the specific
case you are talking about?:

	type int_align8 = @align(8) int;

	let x: int_align8 = 2;

	let foo: *int_align8 = &x;

	let y: int_align8 = *foo;

> - We'd need to special case types with zero or undefined size. The
>   latter special case (undefined size) is necessary for pointers as
>   well, but the former (zero size) isn't.

hmm, i would not do a special case for opaque: if you have a
*@align(8) opaque, you'd just be able to assign pointers to 8-byte
aligned types.

> I also can't think of any reason you'd use @align on a type, so, it
> feels unnecessary to me.

the reason is that i think that the alignment is more of a property
of a type than it is a property of the pointer. 

> I don't understand what you mean by my proposal being "more complex than
> it needs to be". Your proposal is more complex than mine is. It moves
> the @align syntax from pointers to types, which is a net-zero change in
> "complexity", but it also necessitates implementing the semantics for
> @align on types outside of pointers.

hmm, in the case of @align for types, only all corner cases that will be
broken have to be checked, because the logic for alignment of types is
already there and has to exist. in the case of pointers and global object,
the same has to be done, however, i think it adds another logic thing to
pointer types? either way, not a very strong argument

> (As an aside, I don't think words like "complexity"/"simple" are very
> descriptive, especially here. They read like buzzwords to me, especially
> because everyone has a different idea on what "complex" means so it
> leaves lots of room for miscommunication.)

i should stop using these words, sorry. you're right.
Details
Message ID
<D441NV5WKFNR.2V2JLQNOWUSTI@d2evs.net>
In-Reply-To
<D40FCD20TA1I.26YR2XMYF58R0@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
+1, and i'd much rather stick @align on declarations and pointers

On Wed Sep 11, 2024 at 2:07 PM UTC, Lorenz (xha) wrote:
> what i was trying to say is that types already have an alignment
> and that, when adding @align just for types, users can change the
> already present, but implicit, alignment. in my opionion, alignment

the fundamental issue is that the currently-existing alignment system 
has the invariant that type.align <= type.size, and we don't want @align 
to have that invariant. we can pretend to get rid of that invariant by 
having @align implicitly increase a type's size if it's lower than the 
alignment, but that's extremely weird and not actually in line with the 
semantics that the user requested

for example, consider (in the global context)

	let a: @align(4) u8 = 0;

what the user wants here is a 1-byte u8, whose address is guaranteed to 
be a multiple of 4. in order to be able to preserve proper alignment 
semantics, if we're doing a generalized @align, a would need to allocate 
4 bytes here instead

another example:

	let b: [4]@align(4) u8 = [0...];

in order to do a generalized @align, we'd now need to 4-align &b[i] for 
all i, which requires 16 bytes instead of the 4 that the user probably 
wanted us to use. not only that, but now [4]@align(4) u8 is completely 
abi-incompatible with [4]u8, which means we probably can't have them be 
assignable

(now, in practice, you'd actually probably want @align(4) [4]u8 instead, 
but the existence of the latter is a problem that you need to deal with 
as soon as you're generalizing @align beyond declarations and pointers)

more generally (hah), generalizing @align introduces quite a few ways to 
get harec to generate bizarre code, but i don't think it actually allows 
you to do any useful new things

On Sat Sep 7, 2024 at 11:01 PM UTC, Sebastian wrote:
> 2. When taking the address of a misaligned struct/union (either because
>    of an unaligned @offset or a @packed struct), the alignment is always
>    1. This also applies to nested access expressions. For example:
>
> 	type t = struct { @offset(6) x: struct { y: [2]u32 } };
> 	let foo = t { ... };
> 	let bar = &foo.x.y[1]; // -> *@align(1) u32
>
>    Another option is to set the alignment to the largest power of two
>    multiple of the offset, so in the above example the alignment would
>    be 2. I prefer always having it be 1, but either-or is fine.

i think we should use pointer alignment to represent the minimum 
alignment that the compiler can guarantee will always be met, which 
implies that bar's alignment should be 2 here. is there a reason you'd 
prefer for it to be 1?

i can't think of any real-world contexts where you'd care about the 
difference between a 2-aligned and a 1-aligned u32, but consider:

	// as in your example
	type t = struct { @offset(6) x: struct { y: [2]u32 } };
	let foo = t { ... };
	let bar = &foo.x.y[1]; // -> *@align(2) u32

	let baz = bar: *[2]u16; // allowed, since the implicit alignment here is also 2
	baz[1]; // guaranteed to be aligned, even though bar wasn't

the cast to get baz is safe, even though under your proposal it'd 
require a double-cast in order to appease the alignment gods

i only suggest this because i don't think it'll be particularly harder 
to implement, and it seems slightly more Correct to me

> When `alloc` is provided a pointer type hint with a custom alignment,
> the allocated storage has (at least) that alignment. The maximum
> supported alignment is implementation-defined. No matter what, the
> alignment is guaranteed to be greater than or equal to the largest
> definite type alignment (currently 8, but possibly 16 in the future if
> we add 128-bit types). The alignment of the returned pointer is always
> at least this value. So the following code *is* correct, and will
> continue to compile:

this isn't gonna be trivial, i'd prefer not to implement aligned_alloc 
unless there's a particularly strong need for it

(or, put a different way: if we do this, i'd like for our 
implementation-defined limit to be 8/16)
Details
Message ID
<D44LHN3N2NWG.2RE5O2PURP41Q@sebsite.pw>
In-Reply-To
<D441NV5WKFNR.2V2JLQNOWUSTI@d2evs.net> (view parent)
DKIM signature
pass
Download raw message
On Thu Sep 12, 2024 at 1:08 AM EDT, Ember Sawady wrote:
> On Sat Sep 7, 2024 at 11:01 PM UTC, Sebastian wrote:
> > 2. When taking the address of a misaligned struct/union (either because
> >    of an unaligned @offset or a @packed struct), the alignment is always
> >    1. This also applies to nested access expressions. For example:
> >
> > 	type t = struct { @offset(6) x: struct { y: [2]u32 } };
> > 	let foo = t { ... };
> > 	let bar = &foo.x.y[1]; // -> *@align(1) u32
> >
> >    Another option is to set the alignment to the largest power of two
> >    multiple of the offset, so in the above example the alignment would
> >    be 2. I prefer always having it be 1, but either-or is fine.
>
> i think we should use pointer alignment to represent the minimum 
> alignment that the compiler can guarantee will always be met, which 
> implies that bar's alignment should be 2 here. is there a reason you'd 
> prefer for it to be 1?
>
> i can't think of any real-world contexts where you'd care about the 
> difference between a 2-aligned and a 1-aligned u32, but consider:
>
> 	// as in your example
> 	type t = struct { @offset(6) x: struct { y: [2]u32 } };
> 	let foo = t { ... };
> 	let bar = &foo.x.y[1]; // -> *@align(2) u32
>
> 	let baz = bar: *[2]u16; // allowed, since the implicit alignment here is also 2
> 	baz[1]; // guaranteed to be aligned, even though bar wasn't
>
> the cast to get baz is safe, even though under your proposal it'd 
> require a double-cast in order to appease the alignment gods
>
> i only suggest this because i don't think it'll be particularly harder 
> to implement, and it seems slightly more Correct to me

We already talked about most of this on IRC, but for completeness here's
a summary of stuff:

I think making the alignment 1 felt more "consistent" to me, and less
"arbitrary" (even though it isn't arbitrary at all). For example:

	type t = struct {
		x: u32,
		y: u64,
		z: u32,
	};
	let foo = t { ... };

foo.x and foo.z are always 8-byte aligned, but IMO it makes more sense
to treat them as 4-byte aligned. If we go with this behavior (plus
calculating the alignment of unaligned fields), then the behavior for
@offset and @packed would be to use the maximum guaranteed alignment,
unless that alignment is larger than the type's alignment, in which case
the type's alignment is used instead.

Another example is with arrays:

	type t = struct {
		x: u64,
		y: [5]u16,
	};
	let foo = t { ... };

foo.y[0] has alignment 8, foo.y[2] has alignment 4, and foo.y[4] has
alignment 8 again. foo.y[1] and foo.y[3] have alignment 2. harec can
figure this out, but it feels a bit more intuitive to me that foo.y[N]
always has alignment 2.

This also applies outside of structs, with pointers:

	let x = alloc([1u32, 2, 3])!;
	&x[0]; // -> *@align(8) u32
	&x[1]; // -> *u32
	&x[2]; // -> *@align(8) u32

This *works*, and I guess it's technically safer, but it still feels
weird to me. Especially because the binding could be written like this:

	let x: *[_]u32 = alloc([1, 2, 3]);

In which case &x[N] is always *u32, because the array's overalignment is
assigned away.

This also relies on information which isn't stored within the type,
resulting in strange behavior:

	let x = (1u64, 2u32, (3u32, 4u32));
	&x.2[1]; // -> *@align(8) u32

	let x = (1u64, 2u32, (3u32, 4u32));
	let y = &x.2;
	&y[1]; // -> *u32

	let x = (1u64, 2u32, (3u32, 4u32));
	&(&x.2)[1]; // -> probably *u32?

Another example, which doesn't use overalignment:

	type t = struct {
		@offset(6) x: struct @packed {
			x: u16,
			y: u32,
		}.
	};
	let foo = t { ... };

	&foo.x.y; // -> *u32

	let x = &foo.x;
	&x.y; // -> *@align(2) u32

	&(&foo.x).y; // -> probably *@align(2) u32?

So doing this would require that nested alignments were checked as a
single expression, pretty much. And that's ugly and a bad idea, so then
the two options are to compromise by having semantics for unaligned
fields that give them an alignment greater than 1 without deducing
alignment for other accesses, or just saying that unaligned stuff is
always given alignment 1, end of story. No matter which we choose, it
still leaves open the question of what to do with anonymous embedded
structs:

	type t = struct @packed {
		@offset(2) struct @packed {
			@offset(2) x: u32,
		},
	};
	let foo = t { ... };
	&foo.x; // *u32?

The actual offset of field x relative to t is known, but if we don't
compute alignment across nested accesses, then using this when deducing
alignment would mean that x's type changes depending on whether the
struct is named or not. But saying foo.x has alignment 2 is also
strange, since one would expect that the expression would give the same
result if foo's type were `struct @packed { offset(4) x: u32 }`, but it
wouldn't.

The conclusion we came to is that we should treat anonymous embedded
structs as a sort of implicit nested access, so &foo.x's type would be
`*@align(2) u32`, even though harec could easily deduce that the field
is aligned. It's the least ugly solution. I was also convinced that
taking the maximum alignment of a field not greater than the alignment
of its type (rather than always using alignment 1) is probably the best
option.

Finally, we talked about the special cases needed for nested
indexing/access expressions when a field is misaligned. If the operand
isn't a pointer, this information isn't preserved in the type, so check
will need to handle it some other way. With the object model proposed
alongside mutability overhaul, this could be achieved by storing the
alignment alongside the result of each expression, but with our current
semantics this would be incorrect, since we sometimes makes a copy but
sometimes doesn't (for example: &*x makes a copy, but &(*x)[0] doesn't,
since the latter is an object selector but the former isn't).

So I think the best way to handle this is to check the operand of an
indexing/access expression as though its address were being taken,
unless it's a unary * expression, in which case the * is discarded
(recursively). The special case with unary * will cease to be necessary
with the improved object model, but for now it's necessary.

> > When `alloc` is provided a pointer type hint with a custom alignment,
> > the allocated storage has (at least) that alignment. The maximum
> > supported alignment is implementation-defined. No matter what, the
> > alignment is guaranteed to be greater than or equal to the largest
> > definite type alignment (currently 8, but possibly 16 in the future if
> > we add 128-bit types). The alignment of the returned pointer is always
> > at least this value. So the following code *is* correct, and will
> > continue to compile:
>
> this isn't gonna be trivial, i'd prefer not to implement aligned_alloc 
> unless there's a particularly strong need for it
>
> (or, put a different way: if we do this, i'd like for our 
> implementation-defined limit to be 8/16)

Hm, I assumed it would be relatively simple to implement, but if it
isn't then I guess that's fine.
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZuOrPw5c2K-0Hddp@xha.li>
In-Reply-To
<D441NV5WKFNR.2V2JLQNOWUSTI@d2evs.net> (view parent)
DKIM signature
pass
Download raw message
On Thu, Sep 12, 2024 at 05:08:34AM +0000, Ember Sawady wrote:
> +1, and i'd much rather stick @align on declarations and pointers
> 
> On Wed Sep 11, 2024 at 2:07 PM UTC, Lorenz (xha) wrote:
> > what i was trying to say is that types already have an alignment
> > and that, when adding @align just for types, users can change the
> > already present, but implicit, alignment. in my opionion, alignment
> 
> the fundamental issue is that the currently-existing alignment system 
> has the invariant that type.align <= type.size, and we don't want @align 
> to have that invariant. we can pretend to get rid of that invariant by 
> having @align implicitly increase a type's size if it's lower than the 
> alignment, but that's extremely weird and not actually in line with the 
> semantics that the user requested
> 
> for example, consider (in the global context)
> 
> 	let a: @align(4) u8 = 0;
> 
> what the user wants here is a 1-byte u8, whose address is guaranteed to 
> be a multiple of 4. in order to be able to preserve proper alignment 
> semantics, if we're doing a generalized @align, a would need to allocate 
> 4 bytes here instead

exactly this is my selling point of having align for types lol

and also, i'd expect that an @align(4) u8 is allocating 4 bytes,
but i might be asking for too much knowledge from users how stuff
in the background works

> another example:
> 
> 	let b: [4]@align(4) u8 = [0...];
> 
> in order to do a generalized @align, we'd now need to 4-align &b[i] for 
> all i, which requires 16 bytes instead of the 4 that the user probably 
> wanted us to use. not only that, but now [4]@align(4) u8 is completely 
> abi-incompatible with [4]u8, which means we probably can't have them be 
> assignable

assignable by could value work, but by reference doesn't. that'd be
kinda intended though.

> (now, in practice, you'd actually probably want @align(4) [4]u8 instead, 
> but the existence of the latter is a problem that you need to deal with
> as soon as you're generalizing @align beyond declarations and pointers)
> 
> more generally (hah), generalizing @align introduces quite a few ways to 
> get harec to generate bizarre code, but i don't think it actually allows 
> you to do any useful new things

i mean, there are some cases where processors and stuff expect
specifically aligned memory (SIMD, for example, i belive), and for
that it is quite nice to not have to fall back to assembly.
Details
Message ID
<D44WLSI5AIO7.2X94QVO32173Z@d2evs.net>
In-Reply-To
<ZuOrPw5c2K-0Hddp@xha.li> (view parent)
DKIM signature
pass
Download raw message
On Fri Sep 13, 2024 at 3:02 AM UTC, Lorenz (xha) wrote:
> and also, i'd expect that an @align(4) u8 is allocating 4 bytes,
> but i might be asking for too much knowledge from users how stuff
> in the background works

doing the @align in the binding allows us to avoid allocating more than 
1 byte - the only thing we actually care about is that the global's 
address % 4 == 0

> > another example:
> > 
> > 	let b: [4]@align(4) u8 = [0...];
> > 
> > in order to do a generalized @align, we'd now need to 4-align &b[i] for 
> > all i, which requires 16 bytes instead of the 4 that the user probably 
> > wanted us to use. not only that, but now [4]@align(4) u8 is completely 
> > abi-incompatible with [4]u8, which means we probably can't have them be 
> > assignable
>
> assignable by could value work, but by reference doesn't. that'd be
> kinda intended though.

the code to implicitly convert between those is gonna be awful, and 
it'll require an implicit allocation + loop for slices. no thanks

and, bear in mind: the thing people almost certainly want is for the 
start of the array to be aligned, not each individual element

> i mean, there are some cases where processors and stuff expect
> specifically aligned memory (SIMD, for example, i belive), and for
> that it is quite nice to not have to fall back to assembly.

for example, here. simd cares about the alignment of the start of the 
vector, but it *definitely* isn't adding internal padding in order to 
over-align individual elements, because nobody wants to do that

a slightly different framing i got to on irc earlier today: alignment is 
a property of objects, similar to mutability, and the reason for 
restricting it to operating on objects is similar to the reasoning for 
restricting mutability to bindings and pointers in the mutability 
overhaul
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZuRXnY1mJuAyQY23@xha.li>
In-Reply-To
<D44WLSI5AIO7.2X94QVO32173Z@d2evs.net> (view parent)
DKIM signature
pass
Download raw message
ok, well, convinced. +1 to @align in pointers, globals and static
locals, thanks for discussing this with me :-)

don't think it is worth having for non-static locals.
Details
Message ID
<D45SQVRX8YQ0.U7ZGOUXVFQ42@sebsite.pw>
In-Reply-To
<D40FCD20TA1I.26YR2XMYF58R0@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
What are people's thoughts on allowing pointer alignment to be increased
via type assertion?

	type s = struct {
		@offset(2) struct {
			@offset(2) x: u32,
		},
	};
	let foo = s { ... };
	&foo.x as *@align(4) u32; // runtime check succeeds

	let bar = (0u32, 1u8, 2u8);
	&bar.2 as *@align(4) u8; // runtime check fails; abort

I like the idea of enabling runtime safety here, but this would imply
also allowing matching on pointer alignment, and I'm unsure exactly what
the semantics of that would be.

Prior art: https://ziglang.org/documentation/master/#Incorrect-Pointer-Alignment
Lorenz (xha) <me@xha.li>
Details
Message ID
<ZuUwT9vMZ_MF-TLr@xha.li>
In-Reply-To
<D45SQVRX8YQ0.U7ZGOUXVFQ42@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
On Sat, Sep 14, 2024 at 02:34:41AM -0400, Sebastian wrote:
> What are people's thoughts on allowing pointer alignment to be increased
> via type assertion?

i am not sure. that would mean that there is another use for "as"
(in addition to the ones that will probably come in the future) and
i don't know why you'd want to use a runtime-assertion that something
has a specific alignment? even in that case, you could use:

	assert(&foo.x: uintptr % 4 == 0);

are there more use-cases that justify adding this to "as"? also
with regards to weirdness, this is... somewhat weird and the
hand-written assertion is definitely easier to understand and read.

> 	type s = struct {
> 		@offset(2) struct {
> 			@offset(2) x: u32,
> 		},
> 	};
> 	let foo = s { ... };
> 	&foo.x as *@align(4) u32; // runtime check succeeds
> 
> 	let bar = (0u32, 1u8, 2u8);
> 	&bar.2 as *@align(4) u8; // runtime check fails; abort
> 
> I like the idea of enabling runtime safety here, but this would imply
> also allowing matching on pointer alignment, and I'm unsure exactly what
> the semantics of that would be.
> 
> Prior art: https://ziglang.org/documentation/master/#Incorrect-Pointer-Alignment
Details
Message ID
<D4C29PN8A544.2NDLFVRCP0QBS@d2evs.net>
In-Reply-To
<D45SQVRX8YQ0.U7ZGOUXVFQ42@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
-1 for now, might be open to reconsidering this in the context of a 
future reorganization of casts
Details
Message ID
<D6BEWMISZXJZ.3ML08ZFRDEVIW@turminal.net>
In-Reply-To
<D40FCD20TA1I.26YR2XMYF58R0@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
Hi,

First, apologies for raising these points 3 months into the proposal, this was
prompted by a discussion of @packed on irc yesterday and I missed this
whole proposal before.

Second, my thoughts are mostly about the aligned pointer side of this.
I'm not sure how I feel about @align on globals/static locals. If people want
it, I'm not strongly opposed, but I'd be interested in knowing some real world
examples where one would use this.

As for the aligned pointers part, I can see the theoretical appeal, but I
really don't think we should do this. The example you show does not convince
me:

> To see why @align on pointers is desirable, take the following code:
> 	// use crypto::random;
> 	let buf: [size(u64)]u8 = [0...];
> 	random::buffer(buf);
> 	const n = *(&buf: *u64);
>
> At a glance, this code looks perfectly fine. Code similar to this is
> used multiple times in the standard library. But there's a subtle nasty
> bug: buf may not meet the alignment requirements for u64. So when
> casting to *u64, the pointer may be misaligned! This doesn't cause
> issues on our currently supported targets (besides poor performance),
> but on some targets this could cause a crash, or potentially worse. We
> currently provide no safeguards against bugs caused by pointer
> misalignment.

Yeah, this is bad, and we should disallow it by checking alignment
requirements are met in pointer casts. Doing so does not require adding
anything to the language but a single sentence in the spec. So I cannot agree
with this claim:

> Introducing @align on pointer types would fix this issue.

The issue, at least as presented here, is easily fixable without it. More
importantly, this:

> 	let n = 0u64;
> 	random::buffer(&n: *[size(u64)]u8);

and this:

> 	let buf = alloc([0...]: [size(u64)]u8)!; // -> *@align(8) [size(u64)]u8
> 	random::buffer(buf);
> 	const n = *(buf: *u64);


are not correct ways to solve the problem above. The correct way to
solve the problem is to pass in an [8]u8 array and then use the
appropriate function from endian:: to get out an integer.


> - `*u64` is assignable to `*@align(4) u64`, but not to `*@align(16) u64`

Imo this is the biggest problem with current proposal. It assumes
concrete numbers make sense as alignment values, but unfortunately they
do not, especially in contexts where they can cause underalignment. The
same number can mean an overalignment in one implementation, permissible
underalignment in another implementation and a cause for processor fault
in a third one. All of this introduces nonportable behavior that we
should reduce to the minimum, not facilitate. Here's my mental model of
the desired behavior:

First I should note that I use "unaligned access" in the sense of not
matching Hare's implementation-defined requirement, and that when I say
alignment is implementation defined in Hare, I mean post #950 state.

Hare spec makes a guarantee that as long as all pointer accesses are
aligned, everything will work as expected within Hare. Of course we
can't just arbitrarily make this number up, because we want to interact
with the environment - call C functions, or be called from C functions,
or interact with WASM builtins or whatever - so this number has to match
whatever the environment wants it to be. But the important part here is
that Hare guarantees that as long as this (implementation-defined) number
is respected, the code is portable across Hare implementations.

So far so good. The problem arises when the implementation declares its
minimal alignment requirements based on assumptions that hold at some
level of abstraction, and we want to interact with lower levels from
within Hare. For example, SysV-amd64 declares alignment of pointers to
be 8 but this isn't actually required on amd64 and linux for example
declares epoll_event in such a way that makes some pointer in it
unaligned.

So now we need to account for such situations, and the solution is
@packed. It can be called @unaligned or whatever if we want to. @packed
structs disobey the implementation defined alignment requirements
of its members, may have additional weird requirements, and are
generally non-portable. Their alignment is not necessarily 1, because it
is implementation defined, and the actual number is irrelevant for their
interaction with the rest of the type system, because they are rendered
incompatible with all other types merely by being @packed anyway. The
address of their members cannot be taken.

They are basically an escape hatch, for when portable Hare code cannot
achieve what you want because it doesn't model reality closely enough.

My question here now is, what does this description miss? What can be
done with unaligned pointers that cannot be done with aligned ones? Do
we need unaligned pointers? I don't think we do, and I think the reason
"the required escape hatch is good enough without them" should suffice
for us to not implement it. I can also provide more specific arguments
if anyone is interested, but this is a long email even without those.

If we come to the conclusion that we absolutely cannot do without
pointers to unaligned values, they should be implemented without any
kind of arithmetic on alignment requirements. A pointer is either
aligned, or @unaligned, there is no compatibility between the two
beyond probably casting one to uintptr and uintptr to the other, and
@unaligned does not mean alignment = 1. Again I would prefer to not
extend this mail further with explaining why I believe giving unaligned
pointers concrete alignment numbers is not a good idea, but we can
discuss this further in some follow up if anyone wishes to.

So this was mostly an argument against underalignment. For overalignment,
one thing to keep in mind is that one implementation's overalignment is
another's underalignment. We could alleviate that by refusing to compile
code that specifies lower alignment than implementation's default, or we
could just define @align(x) to mean max(default, x), since technically,
being aligned to k*x implies being aligned to x anyway.

Other than that, I just think the use case for overaligned pointers is
too niche to justify separate syntax and the added complexity over not
having them. Like, we should totally provide a function that allocates
aligned memory, but beyond that, it's the programmer's job to take
advantage of alignment and to ensure they do so safely. But I'm not
nearly as strongly opposed to having them as I am to underalignment.


> I'm not bothering with @align on slice types in this RFC, since the
> semantics are unintuitive: `[]@align(8) u32` would mean that the backing
> array has the alignment 8 (i.e. its type is `*@align(8) [*]u32`), *not*
> that the elements all do.

But this is the same as with *@align(8)[8]i32, right? Just the first element is
aligned. Isn't this potentially confusing to the same degree as the slice
example?


>                               EXAMPLE CODE

I'd be interested in seeing some code that uses the new @align keyword in a
truly essential way.

> Prior art:
> - Zig: https://ziglang.org/documentation/master/#Alignment

Do they allow decreasing the alignment from the default alignment of a type?
The link doesn't really say afaict.
Details
Message ID
<D6EA7LL9MWYH.DGHIIJQSK6KU@sebsite.pw>
In-Reply-To
<D6BEWMISZXJZ.3ML08ZFRDEVIW@turminal.net> (view parent)
DKIM signature
pass
Download raw message
On Sat Dec 14, 2024 at 7:11 AM EST, Bor Grošelj Simić wrote:
> Hi,
>
> First, apologies for raising these points 3 months into the proposal, this was
> prompted by a discussion of @packed on irc yesterday and I missed this
> whole proposal before.

No worries :)

> Second, my thoughts are mostly about the aligned pointer side of this.
> I'm not sure how I feel about @align on globals/static locals. If people want
> it, I'm not strongly opposed, but I'd be interested in knowing some real world
> examples where one would use this.

To be completely honest, I don't know of any off the top of my head. I
figured it might be useful in some freestanding programs, and I figured
that since C has it there must be some good use case. I'm not sure what
that use case is though, so we may be able to skip it.

I think it might be possible to enforce alignment with the linker? If
so, the benefit of @align on globals in the language itself would be
that the alignment of the resulting pointer can be set accordingly when
taking its address. Of course, that depends on @align on pointers being
a thing.

> The issue, at least as presented here, is easily fixable without it. More
> importantly, this:
>
> > 	let n = 0u64;
> > 	random::buffer(&n: *[size(u64)]u8);
>
> and this:
>
> > 	let buf = alloc([0...]: [size(u64)]u8)!; // -> *@align(8) [size(u64)]u8
> > 	random::buffer(buf);
> > 	const n = *(buf: *u64);
>
>
> are not correct ways to solve the problem above. The correct way to
> solve the problem is to pass in an [8]u8 array and then use the
> appropriate function from endian:: to get out an integer.

There's some cases when you're working with low-level code where you
might want to cast to byte-addressable storage, and then later cast back
to something with larger alignment requirements. I agree that endian::
might be the best option for this particular example, but it isn't free.
A pointer cast has no runtime cost.

> > - `*u64` is assignable to `*@align(4) u64`, but not to `*@align(16) u64`
>
> Imo this is the biggest problem with current proposal. It assumes
> concrete numbers make sense as alignment values, but unfortunately they
> do not, especially in contexts where they can cause underalignment. The
> same number can mean an overalignment in one implementation, permissible
> underalignment in another implementation and a cause for processor fault
> in a third one. All of this introduces nonportable behavior that we
> should reduce to the minimum, not facilitate. Here's my mental model of
> the desired behavior:

This is a good point: back when I wrote this proposal, I didn't realize
that some platforms used 4-byte alignment for u64, for instance. So
maybe it would be better for @align's operand to be a type rather than a
number? When updating the stdlib for this, I found myself doing
@align(align(T)) a *lot*, so maybe that should just be the default
behavior? That would be less annoying/verbose, and also help alleviate
some of the portability issues you mentioned.

> First I should note that I use "unaligned access" in the sense of not
> matching Hare's implementation-defined requirement, and that when I say
> alignment is implementation defined in Hare, I mean post #950 state.
>
> Hare spec makes a guarantee that as long as all pointer accesses are
> aligned, everything will work as expected within Hare. Of course we
> can't just arbitrarily make this number up, because we want to interact
> with the environment - call C functions, or be called from C functions,
> or interact with WASM builtins or whatever - so this number has to match
> whatever the environment wants it to be. But the important part here is
> that Hare guarantees that as long as this (implementation-defined) number
> is respected, the code is portable across Hare implementations.
>
> So far so good. The problem arises when the implementation declares its
> minimal alignment requirements based on assumptions that hold at some
> level of abstraction, and we want to interact with lower levels from
> within Hare. For example, SysV-amd64 declares alignment of pointers to
> be 8 but this isn't actually required on amd64 and linux for example
> declares epoll_event in such a way that makes some pointer in it
> unaligned.
>
> So now we need to account for such situations, and the solution is
> @packed. It can be called @unaligned or whatever if we want to. @packed
> structs disobey the implementation defined alignment requirements
> of its members, may have additional weird requirements, and are
> generally non-portable. Their alignment is not necessarily 1, because it
> is implementation defined, and the actual number is irrelevant for their
> interaction with the rest of the type system, because they are rendered
> incompatible with all other types merely by being @packed anyway. The
> address of their members cannot be taken.

Just to note, we do also support unaligned @offsets in unpacked structs.
This keeps the struct's alignment the same, and possibly adds padding to
the end. Though I could see an argument that maybe we shouldn't allow
that?

I'm not sure what you mean by "their alignment is not necessarily 1".
Are you suggesting that the type's alignment is implementation-defined?
Or that the type's alignment within Hare is 1, but the *actual*
in-context alignment is some other unknown value?

I also don't think we should disallow taking the address of unaligned
members. Misalignment is already something you have to opt-in to; if you
want a pointer to such a member for whatever purpose then that should be
possible. As of right now it's dangerous, especially because there's no
way of knowing whether &foo.bar takes a misaligned address without
knowing the internals of foo's type. But this RFC helps a lot with that
issue, since the resulting pointer would no longer be usable in places
where an aligned pointer is expected.

> If we come to the conclusion that we absolutely cannot do without
> pointers to unaligned values, they should be implemented without any
> kind of arithmetic on alignment requirements. A pointer is either
> aligned, or @unaligned, there is no compatibility between the two
> beyond probably casting one to uintptr and uintptr to the other, and
> @unaligned does not mean alignment = 1. Again I would prefer to not
> extend this mail further with explaining why I believe giving unaligned
> pointers concrete alignment numbers is not a good idea, but we can
> discuss this further in some follow up if anyone wishes to.

I'd like to hear your reasoning for being against concrete values, yeah.
Either way, @unaligned would be better than what we have now, though I
think it should be possible to cast between *@unaligned T and *T, since
requiring a uintptr cast makes the cast more unsafe, since there's no
compiler check that the secondary type remains the same.

> Other than that, I just think the use case for overaligned pointers is
> too niche to justify separate syntax and the added complexity over not
> having them. Like, we should totally provide a function that allocates
> aligned memory, but beyond that, it's the programmer's job to take
> advantage of alignment and to ensure they do so safely. But I'm not
> nearly as strongly opposed to having them as I am to underalignment.

I agree that we should be weary about adding new features whose use
cases are too niche to justify, and so "this feature is simple to
implement" isn't a good enough reason to warrant its inclusion, but
FWIW, if we decide that the feature is useful, the implementation for
@align on pointers is pretty straightforward.

> > I'm not bothering with @align on slice types in this RFC, since the
> > semantics are unintuitive: `[]@align(8) u32` would mean that the backing
> > array has the alignment 8 (i.e. its type is `*@align(8) [*]u32`), *not*
> > that the elements all do.
>
> But this is the same as with *@align(8)[8]i32, right? Just the first element is
> aligned. Isn't this potentially confusing to the same degree as the slice
> example?

Not really, at least not to me. *@align(N) T indicates that the T which
is pointed to is at least N-byte aligned. []@align(N) T would appear to
indicate something similar; that each T in the slice is N-byte aligned,
but in reality it's the array which is aligned, not the members. With
*@align(8) [8]i32, it's clear that the object which is aligned 8 has
type [8]i32; the i32s themselves don't necessarily have the same
alignment.

It isn't a huge deal either way, since once you know what it means it's
pretty intuitive (Zig allows `align` with slices with this syntax), but
I'm not convinced it has many benefits over restricting it to pointers.

> I'd be interested in seeing some code that uses the new @align keyword in a
> truly essential way.

Looking through where I've used it, I don't think it's ever ""truly""
essential, but it's still very useful, and IMO a valid way to express
the logic in many cases (particularly in low-level code like malloc).

> > Prior art:
> > - Zig: https://ziglang.org/documentation/master/#Alignment
>
> Do they allow decreasing the alignment from the default alignment of a type?
> The link doesn't really say afaict.

Yeah, it has a very similar syntax to this proposal, except with an
`align` keyword rather than @align. See where it shows examples with
align(1).
Details
Message ID
<D6EACPNS4XTF.2YFDLWB9U2EML@cmpwn.com>
In-Reply-To
<D6EA7LL9MWYH.DGHIIJQSK6KU@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
On Tue Dec 17, 2024 at 10:09 PM CET, Sebastian wrote:
> > Second, my thoughts are mostly about the aligned pointer side of this.
> > I'm not sure how I feel about @align on globals/static locals. If people want
> > it, I'm not strongly opposed, but I'd be interested in knowing some real world
> > examples where one would use this.
>
> To be completely honest, I don't know of any off the top of my head. I
> figured it might be useful in some freestanding programs, and I figured
> that since C has it there must be some good use case. I'm not sure what
> that use case is though, so we may be able to skip it.

There are several real-world use-cases for @align on globals, indeed
many of them in freestanding. A lot of data structures have to have
specific unconventional (usually large) alignments in low-level systems
programming. For example, page tables on x86_64 have to be aligned on
4096. This also comes up in userspace, for example, if you want to make
sure your data structures are cache-aligned for optimization purposes.

> I think it might be possible to enforce alignment with the linker?

Yes, you can instruct the linker to arrange for a given alignment for a
symbol. This might involve patching qbe. Just emit the following
assembly code:

.align 4096
symbol:
	// ...

> There's some cases when you're working with low-level code where you
> might want to cast to byte-addressable storage, and then later cast back
> to something with larger alignment requirements. I agree that endian::
> might be the best option for this particular example, but it isn't free.
> A pointer cast has no runtime cost.

fwiw, drive-by comment I admit, but endian:: may someday be free as it's
an obvious candidate for inline functions whenever those come around.

> This is a good point: back when I wrote this proposal, I didn't realize
> that some platforms used 4-byte alignment for u64, for instance. So
> maybe it would be better for @align's operand to be a type rather than a
> number? When updating the stdlib for this, I found myself doing
> @align(align(T)) a *lot*, so maybe that should just be the default
> behavior? That would be less annoying/verbose, and also help alleviate
> some of the portability issues you mentioned.

Grammatically speaking, @align(type) and @align(numeric literal) are
unambiguously distinguishable. FYI.

> Just to note, we do also support unaligned @offsets in unpacked structs.
> This keeps the struct's alignment the same, and possibly adds padding to
> the end. Though I could see an argument that maybe we shouldn't allow
> that?

Another drive-by note, I mentioned this on IRC but to repeat it here:
I'm interested in removing @offset and keeping @packed, as well as
making it easier to define unnammed placeholder/padding fields in
structs by allowing any number of fields named _ to be silently
discarded.

Note that freestanding use-cases often require unaligned memory
accesses. We should not forbid them entirely.

Unaligned memory access is not actually fatal on most modern platforms,
including x86-64 and aarch64 (though both have optional features to
raise exceptions on unaligned accesses, disabled by most modern
operating systems in practice).

The main rationale for aligning everything is not only that some
platforms forbid unaligned memory access, but also that even on the
platforms that allow it, it often comes at a steep performance cost.

I'm not sure we should even make it opt-in with something like an
@unaligned keyword. I think we should not forbid it at all and just let
the CPU raise an exception if it doesn't work, same as a segfault or
divide by zero.
Details
Message ID
<D6EAIERF4MSP.IH12KQX6DS3Q@sebsite.pw>
In-Reply-To
<D6EACPNS4XTF.2YFDLWB9U2EML@cmpwn.com> (view parent)
DKIM signature
pass
Download raw message
On Tue Dec 17, 2024 at 4:15 PM EST, Drew DeVault wrote:
> Another drive-by note, I mentioned this on IRC but to repeat it here:
> I'm interested in removing @offset and keeping @packed, as well as
> making it easier to define unnammed placeholder/padding fields in
> structs by allowing any number of fields named _ to be silently
> discarded.

Last I remember you were hard NACK to removing @offset, but FWIW I'm
strong +1 to removing @offset and allowing unnamed fields with _.
Details
Message ID
<D6EAJMD5J5M5.CF7SHG2SBGH6@cmpwn.com>
In-Reply-To
<D6EAIERF4MSP.IH12KQX6DS3Q@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
On Tue Dec 17, 2024 at 10:23 PM CET, Sebastian wrote:
> Last I remember you were hard NACK to removing @offset, but FWIW I'm
> strong +1 to removing @offset and allowing unnamed fields with _.

I've since come around, yeah, because the combination of @packed and
unnamed fields (and, if necessary, unions) allows you to construct
anything that you could construct with @offset without being as much of
a pain in the ass as @offset is.

I am and have always been strong NACK to removing @packed, which Ember
has always objected to.
Details
Message ID
<D6EETO8XQSZO.3L65YLR08JBHQ@turminal.net>
In-Reply-To
<D6EA7LL9MWYH.DGHIIJQSK6KU@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
> > The issue, at least as presented here, is easily fixable without it. More
> > importantly, this:
> >
> > > 	let n = 0u64;
> > > 	random::buffer(&n: *[size(u64)]u8);
> >
> > and this:
> >
> > > 	let buf = alloc([0...]: [size(u64)]u8)!; // -> *@align(8) [size(u64)]u8
> > > 	random::buffer(buf);
> > > 	const n = *(buf: *u64);
> >
> >
> > are not correct ways to solve the problem above. The correct way to
> > solve the problem is to pass in an [8]u8 array and then use the
> > appropriate function from endian:: to get out an integer.
>
> There's some cases when you're working with low-level code where you
> might want to cast to byte-addressable storage, and then later cast back
> to something with larger alignment requirements. I agree that endian::
> might be the best option for this particular example, but it isn't free.
> A pointer cast has no runtime cost.

Yeah endian:: isn't free. But if you're sure that matters to you, then
you're also capable of sidestepping the issue correctly on your own. One
possible way is also by using a union. Those kind of sucked last time I
tried them out, and they probably still do, but we should fix that. Imo
this is ultimately a better way than using pointer casts, or at least
will be better once unions are properly supported.

>
> > > - `*u64` is assignable to `*@align(4) u64`, but not to `*@align(16) u64`
> >
> > Imo this is the biggest problem with current proposal. It assumes
> > concrete numbers make sense as alignment values, but unfortunately they
> > do not, especially in contexts where they can cause underalignment. The
> > same number can mean an overalignment in one implementation, permissible
> > underalignment in another implementation and a cause for processor fault
> > in a third one. All of this introduces nonportable behavior that we
> > should reduce to the minimum, not facilitate. Here's my mental model of
> > the desired behavior:
>
> This is a good point: back when I wrote this proposal, I didn't realize
> that some platforms used 4-byte alignment for u64, for instance. So
> maybe it would be better for @align's operand to be a type rather than a
> number? When updating the stdlib for this, I found myself doing
> @align(align(T)) a *lot*, so maybe that should just be the default
> behavior? That would be less annoying/verbose, and also help alleviate
> some of the portability issues you mentioned.

No, I don't think so, by doing this, you limit the maximal expressible
alignment to the biggest default alignment of any type and there's no
way to express cache line alignment or page size alignment, which I
suspect will be one of the most commonly used numbers. Getting a type's
alignment can also be achieved directly by making a union that has that
type as one of the members.

> > So now we need to account for such situations, and the solution is
> > @packed. It can be called @unaligned or whatever if we want to. @packed
> > structs disobey the implementation defined alignment requirements
> > of its members, may have additional weird requirements, and are
> > generally non-portable. Their alignment is not necessarily 1, because it
> > is implementation defined, and the actual number is irrelevant for their
> > interaction with the rest of the type system, because they are rendered
> > incompatible with all other types merely by being @packed anyway. The
> > address of their members cannot be taken.
>
> Just to note, we do also support unaligned @offsets in unpacked structs.
> This keeps the struct's alignment the same, and possibly adds padding to
> the end. Though I could see an argument that maybe we shouldn't allow
> that?

I was operating under the assumption @offset is going away. I don't
think it is needed, similar to how I think pointers to @packed members
are not needed - @packed is enough to express all that @offset can be
used for.

> I'm not sure what you mean by "their alignment is not necessarily 1".
> Are you suggesting that the type's alignment is implementation-defined?
> Or that the type's alignment within Hare is 1, but the *actual*
> in-context alignment is some other unknown value?

I'm not sure I understand your exact question. But I'm also not sure
what I was trying to say there :P. Let me try again:

In my vision of how these things should work in Hare, the alignment of
@packed structs is implementation defined, and may depend on the struct.
This number should be the actual alignment of objects of this type in
memory, and should be the type's alignment in Hare, normally inspectable
by using align(). So for example Hare on amd64-sysv-linux defines it to
be 1, but I think I was trying to point out in that paragraph, that
thinking of @packed as always having alignment 1 is a misconception.
This is because for example an implementation may define the alignment
of u64 to be 8, and the underlying hardware platform may support 4-byte
aligned access but not 1- or 2- aligned. So a @packed struct would
probably want to be 4-aligned on such a platform. I believe 32-bit JVM
does or used to do something like this? I'm not really sure.

> I also don't think we should disallow taking the address of unaligned
> members. Misalignment is already something you have to opt-in to; if you
> want a pointer to such a member for whatever purpose then that should be
> possible. As of right now it's dangerous, especially because there's no
> way of knowing whether &foo.bar takes a misaligned address without
> knowing the internals of foo's type. But this RFC helps a lot with that
> issue, since the resulting pointer would no longer be usable in places
> where an aligned pointer is expected.

Yes, they are no longer usable everywhere, but the problem is that they
are actually no longer usable anywhere. You can't pass them to a
standard library function, and you can't pass them to any other third
party library that wasn't made with this use case in mind, because they
don't meet alignment requirements. And in code that you wrote yourself
you can almost always easily pass around the pointer to the whole
@packed struct. In addition to mostly not being useful, my main argument
against this is that I do not think it is strictly needed, and if we
disallow this, we avoid basically the only real reason to have
@unaligned pointers at all.

> > If we come to the conclusion that we absolutely cannot do without
> > pointers to unaligned values, they should be implemented without any
> > kind of arithmetic on alignment requirements. A pointer is either
> > aligned, or @unaligned, there is no compatibility between the two
> > beyond probably casting one to uintptr and uintptr to the other, and
> > @unaligned does not mean alignment = 1. Again I would prefer to not
> > extend this mail further with explaining why I believe giving unaligned
> > pointers concrete alignment numbers is not a good idea, but we can
> > discuss this further in some follow up if anyone wishes to.
>
> I'd like to hear your reasoning for being against concrete values, yeah.
> Either way, @unaligned would be better than what we have now, though I
> think it should be possible to cast between *@unaligned T and *T, since
> requiring a uintptr cast makes the cast more unsafe, since there's no
> compiler check that the secondary type remains the same.

Casting *T -> @unaligned *T should maybe work yeah. Though in my head,
an implementation declaring the alignment of T to be 4 and alignment of
@unaligned T be 8 is plain stupid but permitted by the spec. There isn't
really a reason for this to be so, we can make the spec require
@unaligned to not increase alignment. And then we can have one way
casts. Not the other way though, there is no compatibility that way,
like trying to go from *i8 to *str, except that here we actually do have
a way to prohibit this. I totally hear your concern about ensuring
secondary type is the same, but in my opinion @unaligned *T -> *T is
equally incorrect as *T1 -> *T2 for some random T1 and T2 semantically
speaking, and the matching secondary type is not really relevant.

The reason I don't think concrete numbers are a good idea is that they
actally limit the implementation in doing what it considers best, and
the split between regular pointers and unaligned ones is less clear cut.
For example, casting *@align(4) u64: *64 is valid in some
implementations and invalid in others, because that depends on u64's
default alignment, while *@unaligned u64 is *always* incompatible with
*u64. Additionally, whether going the other way (*u64: *@align(4) u64)
is possible also depends on u64's default alignment and that also
introduces a bunch of problems.

I think the core of what I'm trying to communicate here is that @packed
structs and potentially @unaligned pointers are primarily an abi thing,
and for the sake of simplicity, it's much better to keep abi
compatibility a yes/no question, not "incompatible by how much", and
"with which abi" questions.

>
> > Other than that, I just think the use case for overaligned pointers is
> > I'd be interested in seeing some code that uses the new @align keyword in a
> > truly essential way.
>
> Looking through where I've used it, I don't think it's ever ""truly""
> essential, but it's still very useful, and IMO a valid way to express
> the logic in many cases (particularly in low-level code like malloc).

Actually this is kind of growing on me I have to admit.

I still don't think overaligning to another type's alignment is a good
use case, those places have better options to choose from, but yeah, now
that I thought about this a bit more I can see how I'd use this in
something like malloc.

>
> > > Prior art:
> > > - Zig: https://ziglang.org/documentation/master/#Alignment
> >
> > Do they allow decreasing the alignment from the default alignment of a type?
> > The link doesn't really say afaict.
>
> Yeah, it has a very similar syntax to this proposal, except with an
> `align` keyword rather than @align. See where it shows examples with
> align(1).

I see. I missed it because they only ever use align(1) on functions.
Details
Message ID
<D6EOY0MYXUPP.YMO6ANDM7I1D@turminal.net>
In-Reply-To
<D6EACPNS4XTF.2YFDLWB9U2EML@cmpwn.com> (view parent)
DKIM signature
pass
Download raw message
> > This is a good point: back when I wrote this proposal, I didn't realize
> > that some platforms used 4-byte alignment for u64, for instance. So
> > maybe it would be better for @align's operand to be a type rather than a
> > number? When updating the stdlib for this, I found myself doing
> > @align(align(T)) a *lot*, so maybe that should just be the default
> > behavior? That would be less annoying/verbose, and also help alleviate
> > some of the portability issues you mentioned.
>
> Grammatically speaking, @align(type) and @align(numeric literal) are
> unambiguously distinguishable. FYI.

We'd want more than just literals though, arithmetic should be permitted, and
evaluation of defines. In any case, I think @align(type) is not important
enough use case for us to deal with it.

>
> > Just to note, we do also support unaligned @offsets in unpacked structs.
> > This keeps the struct's alignment the same, and possibly adds padding to
> > the end. Though I could see an argument that maybe we shouldn't allow
> > that?
>
> Another drive-by note, I mentioned this on IRC but to repeat it here:
> I'm interested in removing @offset and keeping @packed, as well as
> making it easier to define unnammed placeholder/padding fields in
> structs by allowing any number of fields named _ to be silently
> discarded.
>
> Note that freestanding use-cases often require unaligned memory
> accesses. We should not forbid them entirely.

Yeah, absolutely.


> The main rationale for aligning everything is not only that some
> platforms forbid unaligned memory access, but also that even on the
> platforms that allow it, it often comes at a steep performance cost.
>
> I'm not sure we should even make it opt-in with something like an
> @unaligned keyword. I think we should not forbid it at all and just let
> the CPU raise an exception if it doesn't work, same as a segfault or
> divide by zero.

We don't really have a choice here. We don't just have to obey what the
hardware says is allowed, but also what the abi mandates. The abi of
all of our current implementations says that things should be aligned
such and such, and we have to honor that.
Details
Message ID
<D6ET5C8QF9QL.NMU4S92ZKCNE@turminal.net>
In-Reply-To
<D6EETO8XQSZO.3L65YLR08JBHQ@turminal.net> (view parent)
DKIM signature
pass
Download raw message
> > > If we come to the conclusion that we absolutely cannot do without
> > > pointers to unaligned values, they should be implemented without any
> > > kind of arithmetic on alignment requirements. A pointer is either
> > > aligned, or @unaligned, there is no compatibility between the two
> > > beyond probably casting one to uintptr and uintptr to the other, and
> > > @unaligned does not mean alignment = 1. Again I would prefer to not
> > > extend this mail further with explaining why I believe giving unaligned
> > > pointers concrete alignment numbers is not a good idea, but we can
> > > discuss this further in some follow up if anyone wishes to.
> >
> > I'd like to hear your reasoning for being against concrete values, yeah.
> > Either way, @unaligned would be better than what we have now, though I
> > think it should be possible to cast between *@unaligned T and *T, since
> > requiring a uintptr cast makes the cast more unsafe, since there's no
> > compiler check that the secondary type remains the same.
>
> Casting *T -> @unaligned *T should maybe work yeah. Though in my head,
> an implementation declaring the alignment of T to be 4 and alignment of
> @unaligned T be 8 is plain stupid but permitted by the spec. There isn't
> really a reason for this to be so, we can make the spec require
> @unaligned to not increase alignment. And then we can have one way
> casts. Not the other way though, there is no compatibility that way,
> like trying to go from *i8 to *str, except that here we actually do have
> a way to prohibit this. I totally hear your concern about ensuring
> secondary type is the same, but in my opinion @unaligned *T -> *T is
> equally incorrect as *T1 -> *T2 for some random T1 and T2 semantically
> speaking, and the matching secondary type is not really relevant.

Hm, actually, I didn't think this through yesterday well enough. I see
now why casting @unaligned *T to *T should be allowed - because the
programmer might know through other means that the alignment matches. I
don't know what to do with this case then. It's simply an unsafe cast,
like any other, and arguably a useful one. If we're going to expand 'as'
functionality to generally mean safe casts, I guess this is one more
place where this would make sense? Pointer casts that verify alignment
requirements are satisfied? Maybe. But I'm still unconvinced @unaligned
pointers are really a good idea at all.
Reply to thread Export thread (mbox)