~sircmpwn/hare-rfc

[RFC v1] removing nullable and the special case for null from the language

Lorenz (xha) <me@xha.li>
Message ID: <Zg4_wDydJz0BVuMw@xha.li>

                              RFC SUMMARY

in my opinion, nullable pointers and "null" are a very weird language
feature: it is constantly in my way when i am trying to interact
with C. they are never useful outside of interacting with C. i
actively avoid using nullable pointer types to not obscure my code.
having to check for null, *even though i know* that based on logic,
a pointer is never null, is annoying. it isn't helped by the fact
that there is no "non-null type assertion".

i think that it doesn't even prevent these kinds of logic bugs: here
is what would happen when i think that a C function, with a certain
combination of arguments, never returns null, but it does return
null:

	```
	match (x(...)) {
	case null => abort();
	};
	```
	hare run main.ha
	Abort: main.ha[...] assertion failed

without nullable pointers:

	```
	let val = x(...);
	*val;
	```
	hare run main.ha
	Illegal pointer access (address not mapped to object) at address 0x0
	Abort trap (core dumped)

it results in the same end result: we crash! in both cases, debug::
can print a backtrace or you can use the core dump to get a backtrace.
even though nullable pointers have been "promoted" as being a security
feature, null pointer dereferences are not security-relevant.

there is another argument to be had: nullable pointers are better for
performance than (*T | void). three things about this:

	(1) hare generally prefers simplicity over performance.
	(2) performance improvement is extremely small.
	(3) null is not removed from the language. nothing prevents you
	    from using it if you need its performance for some reason.

there is no such invariant in hare that normal pointers always point
to something - this is misleading: heap-allocated pointers can be
freed and even pointers to stack memory can be invalid. in fact,
this makes this problem even worse because when working with C
data structures, i cannot use something like (*T | void). i am either
forced by hare to use nullable or use a language hack. after using
rust for more than a year or so, i've become tired of languages
trying to enforce some """fucked-up""" invariants. one reason i
like hare so much is that it is actually not trying to do that but
help me, with stuff like defer or tagged unions. nullable and null
assignability are the only exception here.

also, null is not a type: null is a value, and we are obscuring
that too by having it be both. it's confusing for users. i've already
seen a bunch of people complain about it. nullable pointers are the
only special case in the language except tagged unions where you
can match on them and even use the "is" operator. i guess it would
even make sense to allow "as".

nullable pointers and the null "type and value" are also not very
nice to handle in implementations either. "nullable" appears 93
times in the specification. it gets even more complicated if we
would add the "as" operator to nullable pointers.

nullable pointers can mostly be replaced with (*T | void), when used in
hare code, to signal the absence of a pointer or its variable. it's
much better integrated with the rest of the language anyways.

                         LANGUAGE IMPLICATIONS

nullable is removed.

                     STANDARD LIBRARY IMPLICATIONS

not a whole lot of code is using nullable pointers anyways, so
converting isn't hard.

                         ECOSYSTEM IMPLICATIONS

probably breaks most code but not very hard to fix in most cases.

                              RELATED RFCS

will send an RFC about handling allocation failure in insert,
append and alloc, in the same way, that doesn't include nullable.

Message ID: <D0B48MOIT958.3DQEALPAP182K@sebsite.pw>
In-Reply-To: <Zg4_wDydJz0BVuMw@xha.li>

Very strong -1. I agree that nullable pointers are annoying to work
with, but I think they're worth keeping, possibly with some added
language features to make them easier to use.

On Thu Apr 4, 2024 at 1:50 AM EDT, Lorenz (xha) wrote:
>                               RFC SUMMARY
>
> in my opinion, nullable pointers and "null" are a very weird language
> feature: it is constantly in my way when i am trying to interact
> with C. they are never useful outside of interacting with C. i
> actively avoid using nullable pointer types to not obscure my code.
> having to check for null, *even though i know* that based on logic,
> a pointer is never null, is annoying. it isn't helped by the fact
> that there is no "non-null type assertion".

IMO we should add a non-null type assertion (I like the idea of reusing
postfix ! for this purpose).

> i think that it doesn't even prevent these kinds of logic bugs: here
> is what would happen when i think that a C function, with a certain
> combination of arguments, never returns null, but it does return
> null:
>
> 	```
> 	match (x(...)) {
> 	case null => abort();
> 	};
> 	```
> 	hare run main.ha
> 	Abort: main.ha[...] assertion failed
>
> without nullable pointers:
>
> 	```
> 	let val = x(...);
> 	*val;
> 	```
> 	hare run main.ha
> 	Illegal pointer access (address not mapped to object) at address 0x0
> 	Abort trap (core dumped)
>
> it results in the same end result: we crash! in both cases, debug::
> can print a backtrace or you can use the core dump to get a backtrace.
> even though nullable pointers have been "promoted" as being a security
> feature, null pointer dereferences are not security-relevant.

The spec doesn't mandate this; a null pointer dereference is undefined
behavior. This doesn't necessarily mean that implementations will
aggressively optimize them out, but with nullable pointers it's
*guaranteed* to abort.
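
As a rough illustration of that difference, the C sketch below shows the
kind of explicit check that a guaranteed abort amounts to. It is an
analogy in C rather than Hare, and `assert_nonnull` and `read_checked`
are hypothetical names invented for this sketch, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns p unchanged, or aborts if p is NULL. */
static int *assert_nonnull(int *p, const char *where)
{
	if (p == NULL) {
		/* The failure is reported and fatal on every platform. */
		fprintf(stderr, "Abort: %s: unexpected null\n", where);
		abort();
	}
	return p;
}

/* Reads through a pointer that the caller cannot prove to be non-null. */
static int read_checked(int *maybe)
{
	return *assert_nonnull(maybe, "read_checked");
}
```

Without the check, `*maybe` on a null pointer is undefined behavior in C
as well, so whether the program crashes is left to the platform.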

> there is another argument to be had: nullable pointers are better for
> performance than (*T | void). three things about this:
>
> 	(1) hare generally prefers simplicity over performance.

Nullable pointers are definitely a "simple" feature, so this argument
doesn't really hold up.

> 	(2) performance improvement is extremely small.

It varies based on the codebase and the context and all that, but I'm
less concerned about the performance impact of matching, and more about
the amount of extra space (*T | void) takes up. This means more cache
misses when reading from memory, which can have a measurable performance
impact.
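
The space argument can be sketched in C. This is an analogy, not Hare:
`nullable_ptr` and `struct tagged_ptr` are hypothetical stand-ins, and the
layout assumes a 32-bit tag stored ahead of an aligned payload, which is
an assumption about how a tagged union such as (*T | void) is represented.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for `nullable *int`: the null state lives in the pointer. */
typedef int *nullable_ptr;

/* Stand-in for `(*int | void)`, assuming a tag ahead of the payload. */
struct tagged_ptr {
	uint32_t tag;     /* which member is currently active */
	union {
		int *ptr; /* the `*int` case; `void` carries no data */
	} value;
};

/* Extra bytes each stored value costs compared to a bare pointer. */
static size_t tagged_overhead(void)
{
	return sizeof(struct tagged_ptr) - sizeof(nullable_ptr);
}
```

On a typical 64-bit target the tag plus padding doubles the footprint
(16 bytes instead of 8), so an array of such values touches twice as
many cache lines.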

> 	(3) null is not removed from the language. nothing prevents you
> 	    from using it if you need its performance for some reason.

> there is no such invariant in hare that normal pointers always point
> to something - this is misleading: heap-allocated pointers can be
> freed and even pointers to stack memory can be invalid.

This might not be the case in the future, if we manage to get linear
types (or some other memory safety mechanism) working. This is
hypothetical, of course, but if we do something like this then nullable
pointers fit very well into that model.

> this makes this problem even worse because when working with C
> data structures, i cannot use something like (*T | void). i am either
> forced by hare to use nullable or use a language hack. after using
> rust for more than a year or so, i've become tired of languages
> trying to enforce some """fucked-up""" invariants. one reason i
> like hare so much is that it is actually not trying to do that but
> help me, with stuff like defer or tagged unions. nullable and null
> assignability are the only exception here.

> also, null is not a type: null is a value, and we are obscuring
> that too by having it be both. it's confusing for users. i've already
> seen a bunch of people complain about it. nullable pointers are the
> only special case in the language except tagged unions where you
> can match on them and even use the "is" operator. i guess it would
> even make sense to allow "as".

Null is a type. This isn't unique to Hare; the majority of languages
which I'm aware of that have a concept of "null" treat it internally as
a distinct type. Even C as of C23 has nullptr_t. Having null be its own
type is just an elegant way to implement it without needing to special
case things.

I haven't seen anyone complain that null is(n't?) a type? Where have you
seen this?

> nullable pointers and the null "type and value" are also not very
> nice to handle in implementations either. "nullable" appears 93
> times in the specification. it gets even more complicated if we
> would add the "as" operator to nullable pointers.

As far as language features go, it's really not that difficult to
handle, especially given the safety benefits (which I think are worth
it, even if they're currently annoying to deal with).

> nullable pointers can mostly be replaced with (*T | void), when used in
> hare code, to signal the absence of a pointer or its variable. it's
> much better integrated with the rest of the language anyways.

For the reasons I mentioned above, (*T | void) is almost always the
wrong thing to do IMO. Given that null is a thing that exists (even if
we remove nullable pointers as you propose), it's very wasteful to not
use it.

>                               RELATED RFCS
>
> will send an RFC about handling allocation failure in insert,
> append and alloc, in the same way, that doesn't include nullable.

I don't think there's an actual RFC for this since this proposal
pre-dates the RFC process, but AFAIK the plan was to add a builtin nomem
error type, and then require using postfix ! with alloc to explicitly
abort on allocation failure (or use postfix ? to propagate the error,
which library functions should do).

Message ID: <D0B7TXOWJ9RU.357GK9XJ5AWZR@cmpwn.com>
In-Reply-To: <D0B48MOIT958.3DQEALPAP182K@sebsite.pw>

On Thu Apr 4, 2024 at 8:14 AM CEST, Sebastian wrote:
> Very strong -1. I agree that nullable pointers are annoying to work
> with, but I think they're worth keeping, possibly with some added
> language features to make them easier to use.

Agreed, strong NACK on removing nullable, but I'd love to see some QoL
improvements.

> IMO we should add a non-null type assertion (I like the idea of reusing
> postfix ! for this purpose)

+1 for ! being used to assert non-null.

Would it make sense for ? to cause the enclosing function to return null?

> It varies based on the codebase and the context and all that, but I'm
> less concerned about the performance impact of matching, and more about
> the amount of extra space (*T | void) takes up. This means more cache
> misses when reading from memory, which can have a measurable performance
> impact.

Quite so -- and also consider the ABI implications. Nullable pointers
only require one register, and (*T | void) always requires two.
Registers are a finite resource and that can have a big impact on
performance.
Lorenz (xha) <me@xha.li>
Message ID: <Zg5zycisFLW2YBwS@xha.li>
In-Reply-To: <D0B48MOIT958.3DQEALPAP182K@sebsite.pw>

as far as i can understand this, your arguments are essentially:

- nullable pointers are annoying to work with but worth keeping
- there should be more features in the language to make them easier to use
- the spec says that null pointer dereferences are undefined behavior
- nullable is guaranteed to abort
- nullable pointers are a simple feature
- linear types
- the majority of languages treat null as a type

----

in my personal opinion, i think that there is no way that nullable
pointers are not annoying. neither are they a simple feature. let me
define that:

- they define a lot of special cases; specifically, in:
	- pointer types (obviously)
	- indexing
	- slicing
	- appending
	- allocations
	- dereferencing
	- type assertions
	- match expressions
	- assignment
	- result type reduction
	- type promotion
	- function calls
	- measurements, i.e., align(), size(), len(), offset()
	- field access

to fix this, let's define even more special cases (features)?

i think that null pointer dereferences could be somehow made defined
behavior.  except for some embedded boards, probably every computer
today is capable of aborting on null pointer dereferences.

regarding linear types: as yyp already said on IRC (afaik), it's
unlikely that there is a design that fits hare. i am also very
much not happy about linear types. anyways, i don't think this
is an argument given that they don't exist.

i am also not saying that null should be removed! this is just
about nullable - i think that it's better to accept you have
to trust the programmer to make pointers point to valid memory
than having some weird stuff that prevents null pointers but
doesn't prevent all other cases.

as i already explained (at least tried), nullable pointers don't
prevent logic errors - the logic *will* be the same, regardless if
we have them or not: crashing. this will be true even moreso when
we add a non-null assertion.

i don't think that the benefits outweigh the problems that come
with nullable pointers:

	- more stuff to keep in your head as a programmer
	- annoying to work with
	- additional complexity for implementations
	- it's the only thing where hare doesn't trust the programmer
	  for some reason
Lorenz (xha) <me@xha.li>
Message ID: <Zg53DxzB29cdPL6R@xha.li>
In-Reply-To: <D0B48MOIT958.3DQEALPAP182K@sebsite.pw>

> > also, null is not a type: null is a value, and we are obscuring
> > that too by having it be both. it's confusing for users. i've already
> > seen a bunch of people complain about it. nullable pointers are the
> > only special case in the language except tagged unions where you
> > can match on them and even use the "is" operator. i guess it would
> > even make sense to allow "as".
> 
> Null is a type. This isn't unique to Hare; the majority of languages
> which I'm aware of that have a concept of "null" treat it internally as
> a distinct type. Even C as of C23 has nullptr_t. Having null be its own
> type is just an elegant way to implement it without needing to special
> case things.
> 
> I haven't seen anyone complain that null is(n't?) a type? Where have you
> seen this?

some time ago, but it was in either #hare or #hare-soc

> > will send an RFC about handling allocation failure in insert,
> > append and alloc, in the same way, that doesn't include nullable.
> 
> I don't think there's an actual RFC for this since this proposal
> pre-dates the RFC process, but AFAIK the plan was to add a builtin nomem
> error type, and then require using postfix ! with alloc to explicitly
> abort on allocation failure (or use postfix ? to propagate the error,
> which library functions should do).

yeah basically i was just going to pick up that stuff from autumnull@ again

Message ID: <D0B8T833SVPZ.4X52UTTSVLT6@cmpwn.com>
In-Reply-To: <Zg5zycisFLW2YBwS@xha.li>

The problem is that null pointers are (1) real and (2) invalid.
Dereferencing them is an error. Hare has mandatory error handling
enforced by the type system, therefore the type system should account
for invalid pointers.
Lorenz (xha) <me@xha.li>
Message ID: <Zg59QEq85gwMskYj@xha.li>
In-Reply-To: <D0B8T833SVPZ.4X52UTTSVLT6@cmpwn.com>

On Thu, Apr 04, 2024 at 11:49:28AM +0200, Drew DeVault wrote:
> The problem is that null pointers are (1) real and (2) invalid.
> Dereferencing them is an error. Hare has mandatory error handling
> enforced by the type system, therefore the type system should account
> for invalid pointers.

i understand your arguments but not your conclusion; i don't think
that nullable pointers are the right way to account for invalid
pointers. i think that the type system should rather not account
for them at all instead of some language feature that we can agree
is more or less annoying.

Message ID: <D0BD7AG4ZQWE.RVAJU0D7IAGW@attila>
In-Reply-To: <Zg4_wDydJz0BVuMw@xha.li>

Hi, thanks for writing this up. I'm mostly against the removal, but I think
it's important that we discuss it.

>                               RFC SUMMARY
>
> in my opinion, nullable pointers and "null" are a very weird language
> feature: it is constantly in my way when i am trying to interact
> with C. they are never useful outside of interacting with C. i

Can you give a concrete example where they are in your way?

> actively avoid using nullable pointer types to not obscure my code.
> having to check for null,
> *even though i know* that based on logic,
> a pointer is never null, is annoying. it isn't helped by the fact
> that there is no "non-null type assertion".

In contexts that are not relevant for C compatibility, is that really much
different than having to check for (void | *ptr) not being null?  Can you give
an example of this? At least in theory nullable pointers are not supposed to be
used where you're able to tell the pointer is not going to be null.

> i think that it doesn't even prevent these kinds of logic bugs: here
> is what would happen when i think that a C function, with a certain
> combination of arguments, never returns null, but it does return
> null:

"with a certain combination of arguments" is doing a lot of heavy lifting here.
Yes, there are functions whose contracts are complicated enough that ensuring
they're called correctly would require a type system strong enough to handle
mathematical theorems about the arguments. In that case nullable pointers can't
help much, but then again neither can anything else mainstream programming
languages do. This kind of function would then be a good candidate for a
hare-specific wrapper that makes working with it a bit easier or idiomatic.
(And that wrapper is inevitably going to be ugly and tedious and probably not
the best thing performance-wise).

A lot of functions are not like that, and a simple nullable/non-nullable
pointer distinction can help a lot there. The amount of type safety that is put
in the calling contract is never a binary choice.

> it results in the same end result: we crash! in both cases, debug::
> can print a backtrace or you can use the core dump to get a backtrace.
> even though nullable pointers have been "promoted" as being a security
> feature, null pointer dereferences are not security-relevant.

+1, we shouldn't talk about them as a security feature, because at least in
the narrow sense of security, they are not.

>
> there is another argument to be had: nullable pointers are better for
> performance than (*T | void). three things about this:
>
> 	(1) hare generally prefers simplicity over performance.
> 	(2) performance improvement is extremely small.
> 	(3) null is not removed from the language. nothing prevents you
> 	    from using it if you need its performance for some reason.

This last point is not good at all, it goes directly against hare's stance
towards safety/usability/performance tradeoffs.

> there is no such invariant in hare that normal pointers always point
> to something - this is misleading: heap-allocated pointers can be
> freed and even pointers to stack memory can be invalid. in fact,
> this makes this problem even worse because when working with C
> data structures, i cannot use something like (*T | void). i am either
> forced by hare to use nullable or use a language hack. after using
> rust for more than a year or so, i've become tired of languages
> trying to enforce some """fucked-up""" invariants. one reason i
> like hare so much is that it is actually not trying to do that but
> help me, with stuff like defer or tagged unions. nullable and null
> assignability are the only exception here.

Every invariant any programming language has ever attempted to enforce is
fundamentally fucked up in a sense, and that's not just because humans are bad
at designing languages, it's a deep mathematical fact. The only thing that we
can do about it is finding better tradeoffs between performance, usability and
safety. C found a nice spot on those axes, but new systems languages are made
regularly, because people believe they can do better. I guess here the question
is where on these axes do we want Hare to be. I'd personally yeet defer out of
hare much sooner than nullable pointers, and it's definitely possible to
construct examples where tagged unions are a nuisance rather than a tool, but
ultimately I think those are all features suitable for the language I want Hare
to be.

> also, null is not a type: null is a value, and we are obscuring
> that too by having it be both. it's confusing for users. i've already
> seen a bunch of people complain about it. nullable pointers are the
> only special case in the language except tagged unions where you
> can match on them and even use the "is" operator. i guess it would
> even make sense to allow "as".

But we already do allow 'as' on nullable pointers.

> nullable pointers and the null "type and value" are also not very
> nice to handle in implementations either. "nullable" appears 93
> times in the specification. it gets even more complicated if we
> would add the "as" operator to nullable pointers.

The issue of null as a type is somewhat complicated to resolve formally
(and I personally don't agree with seb and ecs and also don't think C23 is a
good precedent), but this doesn't have practical implications for this
RFC in my view, so I'd prefer to discuss it elsewhere.

> nullable pointers can mostly be replaced with (*T | void), when used in
> hare code, to signal the absence of a pointer or its variable. it's
> much better integrated with the rest of the language anyways.

I think this should be resolved by integrating nullable pointers better, not by
removing them.

Message ID: <D0BGT9KPIJRL.1J85E65WQWGQU@DebianDreams>
In-Reply-To: <Zg5zycisFLW2YBwS@xha.li>

On Thu Apr 4, 2024 at 4:32 AM CDT, Lorenz (xha) wrote:
> i think that null pointer dereferences could be somehow made defined
> behavior.  except for some embedded boards, probably every computer
> today is capable of aborting on null pointer dereferences.

through indexing or field access, dereferencing a null pointer can
result in a read/write at theoretically any memory address. whether or
not this is likely to be a problem depends on the architecture, but the
only way to provide this as a guarantee... is to generate assertions.
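
A small C sketch of the field-access case, offered as an analogy rather
than Hare code. `struct big` and `tail_address` are hypothetical names
for illustration; the function only computes the address that a field
access would touch and never dereferences anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct whose last field sits far from the start. */
struct big {
	unsigned char pad[1 << 20];
	int tail;
};

/* Address that `base->tail` would touch, computed without a dereference. */
static uintptr_t tail_address(uintptr_t base)
{
	return base + offsetof(struct big, tail);
}
```

With a null base the access lands at `offsetof(struct big, tail)`, at
least 1 MiB past address zero, which may well be mapped memory. That is
why a trap at address 0 alone cannot provide the guarantee.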

Message ID: <e30d0a87-3575-474d-b318-75234ff3f7f7@yahoo.com>
In-Reply-To: <D0BD7AG4ZQWE.RVAJU0D7IAGW@attila>

if I understood everything correctly, the main complaints are:

> in my opinion, nullable pointers and "null" are a very weird language
> feature: it is constantly in my way when i am trying to interact with
> C. they are never useful outside of interacting with C. i actively
> avoid using nullable pointer types to not obscure my code. having to
> check for null, *even though i know* that based on logic, a pointer
> is never null, is annoying. it isn't helped by the fact that there is
> no "non-null type assertion".

null/nullable is said to be only useful when dealing with C, and even in
these cases it's annoying. outside of the C FFI, it's *just* annoying.

> also, null is not a type: null is a value, and we are obscuring that 
> too by having it be both. it's confusing for users.

null is the only valid value of a type whose sole purpose is to
indicate an invalid pointer, which is pointless since "this is
misleading" and "confusing for users."

> nullable pointers can mostly be replaced with (*T | void), when used 
> in hare code, to signal the absence of a pointer or its variable. 
> it's much better integrated with the rest of the language anyways.

no practical need for a nullable type since we can express this kind of
behavior using tagged unions.

ok, all this stated, I would like to break this into parts:

hare is not quite an experimental language. as far as I can tell, it
does not try to "reinvent the wheel" or be a "big agenda language."
it avoids linking against libc not to cut ties with C, but to stay
portable. nevertheless, it does support FFI with C almost natively.
just by that, we already have a good reason to keep null as part of
the primitive types, because it avoids over-complicated solutions to
something that is not that big of a deal and keeps the ABI compatibility.

keeping it as its very own type, in my opinion, keeps two things clear and
linked: pointers can fail and lead to errors, and this error is not
code-dependent. by that I mean that null pointers are naturally
"undefined-behaviory" in the sense that they can occur in many situations
for a lot of reasons. this special kind of failure, using a special kind
of type, naturally leads to the idea of native error assertion/propagation
tied directly to the type. in contrast with a web socket that can fail only
on the request, it is very possible for a pointer to become null during
runtime due to a memory leak, corruption or hardware failure. enforcing
the safety in an "annoying way", for me, is just perfect for an annoying
kind of error. it forces you to take these failures into account
semantically.

again, simplicity; we already have null in the language because it's
already a thing and part of the ABI. trying to replace it with tagged
unions implies not only a noticeable overhead in development (and
maintenance after a hypothetical removal) but also a cost in performance
and to the simplicity rule. no need to over-complicate an already
complicated issue with programming itself.
Lorenz (xha) <me@xha.li>
Message ID: <ZhE4FVW_7bMiquX-@xha.li>
In-Reply-To: <Zg4_wDydJz0BVuMw@xha.li>

i've thought about this a lot more the last couple of days and came
to the conclusion that this is a bad idea. i was not able to come
up with any better proposal and i think this is unfortunately the
best way to handle references that can be null. i am not sure if
we should even have the "!" operator, given that "as" is a thing.
but for completeness and ease of use we should probably have it.

thanks to Seb, Drew, Bor and Mikaela for taking the time and
convincing me that this is a bad idea :)