~sircmpwn/hare-rfc

3 2

[RFC v1] Tagged union overhaul

Details
Message ID
<D6BSFDST952H.1YFDBK4I9QEYG@disroot.org>
DKIM signature
pass
Download raw message
                              RFC SUMMARY

Tagged unions are the flagship feature of Hare, however their
implementation isn't perfect. In their current state, they have a lot
of issues from both user's and compiler developer's perspective.

On the user side, tagged union assignability semantics can be confusing
and create unexpected results. There are no well written rules for how
they work.

On the compiler side, parts of harec that work with tagged unions are
quite complex and interconnected, making even small fixes difficult
without breaking something else. The existing infrastructure does not
have the tools to implement much needed features such as exhaustivity
checking in match expressions. Not to mention that there are lots of
corner cases that can cause harec to crash in certain configurations.

This proposal aims to provide a specification for tagged unions that
addresses these issues and can serve as a base for the Hare language
specification and a new implementation in harec.

                      TAGGED UNION REPRESENTATION

Tagged union types are written as (m[0] | m[1] | ... | m[N]) where each
member m[i] is (["..."] type).

The tagged union type can be represented internally by a list of its
member types.

Tagged unions no longer have implicit squishing, so using an unnamed
tagged union as a member of a different union doesn't merge members into
the parent. Instead, the (...type) syntax is used to unwrap tagged
unions, replacing the alias unwrapping operator. The type following the
ellipsis must dealias to a tagged union.

All member types must be distinct. Duplicates that are directly
mentioned in the union or appear due to the unwrapping operator must
raise a compile error. In my opinion, it never makes sense to have
duplicate members and no real-world code does outside of the example to
show this feature. It could be potentially useful for auto-generated
code but I'm not convinced.

Algebraic properties of tagged unions:

- They are commutative: the order of members doesn't affect the union.
- They are *not* associative: since there's no implicit squishing,
  (a | (b | c)) != ((a | b) | c)
- They are *not* reductive: duplicate members are no longer allowed.

Examples:

Note: set notation is used here since the internal order of member types
is implementation-defined.

- (int | bool)           => {int, bool}
- (int | int)            => fail - duplicate member
- (int | (u32 | u64))    => {int, (u32 | u64)}
- (int | (int | u64))    => {int, (int | u64)}
- (int | ...(u32 | u64)) => {int, u32, u64}
- (int | ...(u32 | int)) => fail - duplicate member due to unwrapping
- (int | ...bool)        => fail - bool is not a tagged union

Pseudocode for tagged union construction:

    fn insert_member(members: *[]type, next: type) void = {
    	if members contains next {
    		error("Duplicate member in a tagged union");
    	};
    	append(members, next);
    };
    fn create_union(union: [](unwrapping | type)) []type = {
    	let collected_members: []type = [];
    	for (let member .. union) {
    		match (member) {
    		case let unwrap_type: unwrapping =>
    			let source = dealias(unwrap_type);
    			match (source) {
    			case let members: tagged_union =>
    				for (let soure_member .. members) {
    					insert_member(
    						&collected_members,
    						source_member,
    					);
    				};
    			case =>
    				error("Cannot unwrap this type");
    			};
    		case let member_type: type =>
    			insert_member(&collected_members, member_type);
    		};
    	};
    	return collected_members;
    };

A more interesting and useful representation of tagged unions is a tree
where leaves are subtypes and inner nodes are tagged unions.

For example, (int | (bool | void)) will be represented by the following
tree:

(int | (bool | void))
- int
- (bool | void)
  - bool
  - void

Note that leaf nodes are never tagged unions or aliases of tagged
unions.

There is a unique sequence of edges from the root to each subtype,
called the subtype's canonical path. It can be written as a chain
of cast operations that result in the construction of that subtype, for
example: (bool: (bool | void): (int | (bool | void))).

Direct descendants of a tagged union are child nodes of the tagged union
that are directly connected by an edge in the graph. All descendants of
a tagged union are the entire subtree of nodes starting at the tagged
union.

For example, direct descendants of (int | (bool | void)) are int and
(bool | void), however all descendants also include bool and void.

                           SUBTYPE SELECTION

Subtype selection is the process which allows to resolve a type into
a tag that would be stored into the tagged union value.

Subtype selection is primarily used in two places that have different
requirements - tagged union assignability and match.

Tagged union assignability should allow assignability-based matches, for
example {let x: ([]int | void) = [1, 2, 3];} should work since an array
can be assigned to a slice. However, match should not allow those since
they are difficult to recover from and essentially introduce pattern
matching, which is not the goal of this proposal.

Despite not using assignability-based selection, match can still
select subtypes deep inside the tagged union tree, as long as they
match exactly.

No matter whether assignability-based matches are allowed, subtype
selection should follow a few requirements:

1. All tagged union members must have a valid canonical path.
2. An exact type match is always preferred over a non-exact match.
3. Non-exact matches are only allowed if it's not ambiguous what member
   is to be selected. For example, {let x: (bool | u16) = 1u8;} is
   allowed because u8 may only convert to u16 and not bool, but
   {let x = (u16 | u32) = 1u8;} is disallowed because it's unclear what
   subtype u8 converts to.

Besides single subtypes, subsets can also be selected. A subtype may
only be selected from direct descendants of one tagged union. So for
example, {x: (int | void): (int | (void | bool))} is not allowed. This
way constructing or extracting a subtype only requires copying the tag
and data of one tagged union into the other (or even without a copy
when using a pointer-based match). Otherwise, it might be required to
completely reconstruct the tagged union, which is expensive and
infeasible for pointer matches.

The result of subtype selection is a set of matched subtypes as well
as the canonical path to their common parent. For example, selecting
void from (int | (bool | void)) should produce ((bool | void), {void}).
Concatenating two paths will give the canonical path to the member.

The algorithm for subtype selection that accepts a tagged union
("the union") and a type to be selected ("the request") as well as a
boolean flag that enables assignability-based matches ("fuzzy mode"):

1. If the request shows up in direct descendants of the union, return
   that subtype.
2. If the request shows up exactly once in all descendants of the union,
   return that subtype and a path to its parent.
3. If the request shows up more than once in all descendants of the
   union, report an ambiguity error.
4. If fuzzy mode is enabled:
   4.1. If the request is assignable to exactly one subtype in all
        descendant leaf nodes of the union, return that subtype and a
        path to its parent.
   4.2. If the request is assignable to more than one subtype in all
        descendant leaf nodes of the union, report an ambiguity error.
6. If the request is a tagged union, try to match a subtype:
   6.1. If direct descendants of the union contain all direct
        descendants of the request, return all members of the request.
   6.2. If the request is a subtype of exactly one inner nodes in the
        tagged union tree (same check as 6.1), return all members of
        the request and a path to that inner node.
   6.3. If the request is a subtype of more than one inner nodes in the
        tagged union tree (same check as 6.1), report an ambiguity
        error.
7. None of the previous steps found anything, report an invalid subtype
   error.

The first rule ensures that an exact match is always preferred, and also
makes the cast notation of canonical paths be valid expressions that
can construct a corresponding subtype. This passes the first
requirement.

Looking at the order, it's also obvious that exact matches are preferred
over non-exact matches. This passes the second requirement.

The algorithm also explicitly checks for ambiguity in all cases which
passes the third requirement.

An important note is that 4.1 and 4.2 only check leaf nodes. Removing
that restriction will not break the algorithm, however it will cause
assignment to be performed in multiple iterations of subtype selection,
which is not ideal.

                           MATCH EXHAUSTIVITY

Match expressions in Hare currently lack a major feature - exhaustivity
checking, or in other words, don't verify that the programmer has
handled all possible tags. Instead, the program crashes if an unhandled
tag is encountered.

The current system in harec isn't structured in a way that will make
this feature easy to implement.

With the system proposed here, it's very easy to implement exhaustivity
checking.

The main data structure at play is a "match tree". A match tree has the
same structure as a tree representation of a tagged union except that
each node also contains a flag that indicates whether the node has been
covered by a branch of a match expression.

Adding a branch to a match tree is quite straightforward:
1. Perform subtype selection with fuzzy mode disabled
2. For all candidate results:
   2.1. Match the corresponding tagged union tree mode to the match
        tree node.
   2.2. If the match tree node is already marked as covered, report a
        duplicate branch error.
   2.3. Otherwise, mark the node and all its descendants as covered.
3. Fix-up flags of upper nodes: if all direct descendants of a node are
   marked as covered, the node itself gets marked as covered.

After adding all the branches through this method, the root's node
coverage status can be checked to see if the match expression is
exhaustive. The compiler can also look at statuses of other nodes to
list out all cases which were not handled.

This tree structure also allows to make code generation for match
expressions a lot simpler and more efficient. Instead of generating
each branch separately with a corresponding tag check (which can require
multiple comparisons for nested tagged unions), tag comparisons can be
performed only once per edge in the match tree.

Another important milestone is that the match tree provides the tools
necessary to implement the catch-all binding branch ("case let x =>").
The correct semantics for it are surprisingly complicated so it warrants
a separate discussion in a different RFC.

The new match tree-based approach has a major difference from the
current implementation: it's no longer possible to match one node from
different branches. This can be useful to match on a specific error and
then have a generic error handler later, such as:

    match (os::open(...)) {
    case let file: io::file => ...;
    case errors::noentry =>
    	// do something when file doesn't exist...
    	void;
    case let err: fs::error =>
    	return err;
    };

This will stop working since errors::noentry and fs::error cases share
a subtype, which can't be properly represented in the match tree. An
alternative way of accomplishing this is using a seondary match or
an "is"-check:

    match (os::open(...)) {
    case let file: io::file => ...;
    case let err: fs::error =>
    	if (err is errors::noentry) {
    		// do something when file doesn't exist...
    	} else {
    		return err;
    	};
    };

Ember also proposed an operator that could exclude certain subtypes from
a subset match, such as "case let err: fs::error - errors::noentry".
In my opinion, this feature doesn't provide much benefit over a nested
match since the pattern is relatively rare. However, the discussion for
adding it would be welcome in a separate RFC if others disagree. The
implementation should be relatively straightforward.

                         LANGUAGE IMPLICATIONS

The alias unwrapping operator is moved into the tagged union grammar:

 storage-class:
-	unwrapped-alias
-unwrapped-alias:
-	"..." identifier
 tagged-types:
-	type "|" type ["|"]
-	type "|" tagged-types
+	tagged-member "|" tagged-member ["|"]
+	tagged-member "|" tagged-types
+tagged-member:
+	type
+	"..." type

"Assignment", "Casts, type assertions, and type tests",
"Match expressions" and "Tagged union types" sections of the
specification are updated to accomodate new tagged union semantics. A
proper algorithm for subtype selection is also added.

                     STANDARD LIBRARY IMPLICATIONS

None. To my knowledge, the standard library doesn't rely on current
squishing behaviour or any other specific parts of tagged union
semantics in public APIs.

                         ECOSYSTEM IMPLICATIONS

The overhaul of tagged union semantics is unlikely to have a serious
effect on most Hare code in the wild.

However, match statements that have overlapping branch subsets have to
be updated (notable mentions include hare::module and cmd/hare), and so
do inexhaustive match statements. It's also possible that code dependent
on current assignability semantics will fail to compile after new
stricter rules are in place.

                              RELATED RFCS

None
Details
Message ID
<D6BTLY6VOFYU.A5WQRKX0ZJ25@sebsite.pw>
In-Reply-To
<D6BSFDST952H.1YFDBK4I9QEYG@disroot.org> (view parent)
DKIM signature
pass
Download raw message
Thanks for writing this up! Generally +1 with some revisions. Lots of
comments inline:

On Sat Dec 14, 2024 at 5:47 PM EST, Alexey Yerin wrote:
> Tagged unions no longer have implicit squishing, so using an unnamed
> tagged union as a member of a different union doesn't merge members into
> the parent. Instead, the (...type) syntax is used to unwrap tagged
> unions, replacing the alias unwrapping operator. The type following the
> ellipsis must dealias to a tagged union.

+1

> All member types must be distinct. Duplicates that are directly
> mentioned in the union or appear due to the unwrapping operator must
> raise a compile error. In my opinion, it never makes sense to have
> duplicate members and no real-world code does outside of the example to
> show this feature. It could be potentially useful for auto-generated
> code but I'm not convinced.

Eh, sure. I guess that makes sense.

> A more interesting and useful representation of tagged unions is a tree
> where leaves are subtypes and inner nodes are tagged unions.

Inner nodes can also be aliases of tagged unions. This is sorta a
nitpick, but I think it's important, since we're nailing down alias
semantics here.

Also, what about error types? (Working under the assumption that we get
rid of type flags and make error types distinct types, since it makes
some stuff simpler.) The type !tagged_union, as well as aliases of that
type, should be valid inner nodes.

> Tagged union assignability should allow assignability-based matches, for
> example {let x: ([]int | void) = [1, 2, 3];} should work since an array
> can be assigned to a slice. However, match should not allow those since
> they are difficult to recover from and essentially introduce pattern
> matching, which is not the goal of this proposal.

+1. I know some people (notably Ember) want to look in to pattern
matching, but I don't think that should be a blocker for this proposal.

> 1. All tagged union members must have a valid canonical path.
> 2. An exact type match is always preferred over a non-exact match.
> 3. Non-exact matches are only allowed if it's not ambiguous what member
>    is to be selected. For example, {let x: (bool | u16) = 1u8;} is
>    allowed because u8 may only convert to u16 and not bool, but
>    {let x = (u16 | u32) = 1u8;} is disallowed because it's unclear what
>    subtype u8 converts to.

+1, with the caveat that the third point should disallow stuff like u16
-> (u32 | alias_of_u32). Allowing assignability to aliases would
disallow this. If we disallow such assignability, then the third point
will need to be modified.

> Besides single subtypes, subsets can also be selected. A subtype may
> only be selected from direct descendants of one tagged union. So for
> example, {x: (int | void): (int | (void | bool))} is not allowed. This
> way constructing or extracting a subtype only requires copying the tag
> and data of one tagged union into the other (or even without a copy
> when using a pointer-based match). Otherwise, it might be required to
> completely reconstruct the tagged union, which is expensive and
> infeasible for pointer matches.

So, this is kinda meh, but maybe it's the best option. I have a few
thoughts though:

- Should it be possible for a match case to select direct descendents
  from different tagged unions, if a binding isn't created? That feels
  like it would be useful, and I don't think there's any technical
  reason it wouldn't be possible, though I'm not sure what the syntax
  would be. Something like `case type_a, type_b, type_c =>` would make
  sense (and be consistent with switch case syntax), but it might be
  confusing since it exists alongside the subset selection semantics.

- It would also be useful for match cases to be able to specify a
  *specific* descendent, if just the type alone would be ambiguous.
  Take, for example:

	type a = (int | void);
	type b = (int | str);
	let u: (a | b) = 0: a;
	match (u) {
	case let x: a->int =>
		// ...
	case => // ...
	};

  ...but with a better syntax. It also should work with type assertions
  and type tests.

> The algorithm for subtype selection that accepts a tagged union
> ("the union") and a type to be selected ("the request") as well as a
> boolean flag that enables assignability-based matches ("fuzzy mode"):
> [...]

> 1. If the request shows up in direct descendants of the union, return
>    that subtype.
> 2. If the request shows up exactly once in all descendants of the union,
>    return that subtype and a path to its parent.
> 3. If the request shows up more than once in all descendants of the
>    union, report an ambiguity error.
> 4. If fuzzy mode is enabled:
>    4.1. If the request is assignable to exactly one subtype in all
>         descendant leaf nodes of the union, return that subtype and a
>         path to its parent.
>    4.2. If the request is assignable to more than one subtype in all
>         descendant leaf nodes of the union, report an ambiguity error.
> 6. If the request is a tagged union, try to match a subtype:
>    6.1. If direct descendants of the union contain all direct
>         descendants of the request, return all members of the request.
>    6.2. If the request is a subtype of exactly one inner nodes in the
>         tagged union tree (same check as 6.1), return all members of
>         the request and a path to that inner node.
>    6.3. If the request is a subtype of more than one inner nodes in the
>         tagged union tree (same check as 6.1), report an ambiguity
>         error.
> 7. None of the previous steps found anything, report an invalid subtype
>    error.

This all mostly LGTM, except for a couple things.

For one, maybe we want to disallow the following assignment?:

	type a = (int | void);
	type b = (a | int);
	let x = if (true) 0 else void; // (int | void)
	let y: b = x; // ?

This is even moreso an issue in match. If I understand your proposed
algorithm correctly, a match case of `(int | void)` would be allowed,
and would return the members of `a`, *not* the direct `int` descendent
of `b`. And that feels sorta like an issue? Not sure how important of an
issue it is to fix though.

Additionally, error types need to be taken into consideration:

	type a = !(int | void | str);
	type b = (a | rune);
	let x: b = 0; // should be allowed, probably
	let y: b = 0: (int | void | str); // should be allowed, probably
	let z: b = 0: (int | void); // should be allowed, probably

A real world example of this would be assigning io::underread to
fs::error. io::underread is a direct descendent of io::error, which is a
direct descendent of fs::error, but io::error is an alias of an error
type.

We want to disallow e.g. assigning size to a tagged union containing
io::underread as a descendent, but that can be fixed by disallowing the
size -> io::underread assignment itself (either by disallowing
assignment to aliases, or, my preferred choice, disallowing conversion
to an error type unless the non-error type exactly matches).

> Another important milestone is that the match tree provides the tools
> necessary to implement the catch-all binding branch ("case let x =>").
> The correct semantics for it are surprisingly complicated so it warrants
> a separate discussion in a different RFC.

Yeah, after some thought I'm not even sure it's a good idea, because of
how complicated and unintuitive the semantics are.

> The new match tree-based approach has a major difference from the
> current implementation: it's no longer possible to match one node from
> different branches. This can be useful to match on a specific error and
> then have a generic error handler later, such as:
>
>     match (os::open(...)) {
>     case let file: io::file => ...;
>     case errors::noentry =>
>     	// do something when file doesn't exist...
>     	void;
>     case let err: fs::error =>
>     	return err;
>     };
>
> This will stop working since errors::noentry and fs::error cases share
> a subtype, which can't be properly represented in the match tree. An
> alternative way of accomplishing this is using a seondary match or
> an "is"-check:
>
>     match (os::open(...)) {
>     case let file: io::file => ...;
>     case let err: fs::error =>
>     	if (err is errors::noentry) {
>     		// do something when file doesn't exist...
>     	} else {
>     		return err;
>     	};
>     };

*Very strong* -1 to disallowing this. It's very useful to be able to
handle one error case specially, and other error cases separately,
without needing an extra level of indentation. And a default case
shouldn't be required for this.

This is one of the biggest reasons match exhaustivity is unsolved. If
it's just disallowed, I imagine exhaustivity could be implemented
without any of the changes in this RFC (albeit it would likely be more
complicated to implement).

This is a difficult problem to solve, but I do think we should actually
solve it. Not allowing this makes Correct error handling much more
difficult, and error handling is one place we want to minimize friction
as much as possible.

> Ember also proposed an operator that could exclude certain subtypes from
> a subset match, such as "case let err: fs::error - errors::noentry".
> In my opinion, this feature doesn't provide much benefit over a nested
> match since the pattern is relatively rare. However, the discussion for
> adding it would be welcome in a separate RFC if others disagree. The
> implementation should be relatively straightforward.

I'm pretty sure I was the one who originally proposed this lol, not that
it matters that much. I already brought up a benefit for something like
this earlier in this email (matching subtypes which would otherwise be
ambiguous), but this sounds like another good reason. Though I don't
think it's strictly required to implement match exhaustivity. The
subtype selection algorithm you proposed should be suitable; we just
need to allow a case to select a superset of subtypes when at least one
of them hasn't already been selected. The semantics might be a bit more
complicated, like, if we want to restrict this to aliases, and stuff
like that, but I do think it's solvable.
Details
Message ID
<D6C37FJ0BK8N.68Z17BZW5Q4C@turminal.net>
In-Reply-To
<D6BTLY6VOFYU.A5WQRKX0ZJ25@sebsite.pw> (view parent)
DKIM signature
pass
Download raw message
Hi,

thanks for writing this up, it looks like a huge improvement over the current
situation! I'll reply more thoroughly once I think through all
the details. For now I'd like to raise these points:

> > Tagged unions no longer have implicit squishing, so using an unnamed
> > tagged union as a member of a different union doesn't merge members into
> > the parent. Instead, the (...type) syntax is used to unwrap tagged
> > unions, replacing the alias unwrapping operator. The type following the
> > ellipsis must dealias to a tagged union.
>
> +1
>
> > All member types must be distinct. Duplicates that are directly
> > mentioned in the union or appear due to the unwrapping operator must
> > raise a compile error. In my opinion, it never makes sense to have
> > duplicate members and no real-world code does outside of the example to
> > show this feature. It could be potentially useful for auto-generated
> > code but I'm not convinced.
>
> Eh, sure. I guess that makes sense.

This isn't completely harmless. (...a | ...b) breaks with this
new rule if a and b share any member types, and at least at some point we did
have at least one such case in the stdlib.

> - It would also be useful for match cases to be able to specify a
>   *specific* descendent, if just the type alone would be ambiguous.
>   Take, for example:
>
> 	type a = (int | void);
> 	type b = (int | str);
> 	let u: (a | b) = 0: a;
> 	match (u) {
> 	case let x: a->int =>
> 		// ...
> 	case => // ...
> 	};
>
>   ...but with a better syntax. It also should work with type assertions
>   and type tests.

Provided we do come up with such a syntax, why bother with all the
complicated implicit rules if we can simply require every nested case to
have a full tree path written out? Sure it would be slightly more
verbose but that's a low price to pay compared to the complexity of
implicit rules (which are complex both from the perspective of a
language implementer and a language user). And if the verbosity becomes
too bad you can always just match on a subtree in a nested match (which
also has its own issues, but the point is, there is a choice).

Imo investing some thought in coming up with such a syntax would be well
worth it, though it may require significant changes to match.

For type assertions/tests this is already possible, you can just chain
as/is clauses.


> > The new match tree-based approach has a major difference from the
> > current implementation: it's no longer possible to match one node from
> > different branches. This can be useful to match on a specific error and
> > then have a generic error handler later, such as:
> >
> >     match (os::open(...)) {
> >     case let file: io::file => ...;
> >     case errors::noentry =>
> >     	// do something when file doesn't exist...
> >     	void;
> >     case let err: fs::error =>
> >     	return err;
> >     };
> >
> > This will stop working since errors::noentry and fs::error cases share
> > a subtype, which can't be properly represented in the match tree. An
> > alternative way of accomplishing this is using a seondary match or
> > an "is"-check:
> >
> >     match (os::open(...)) {
> >     case let file: io::file => ...;
> >     case let err: fs::error =>
> >     	if (err is errors::noentry) {
> >     		// do something when file doesn't exist...
> >     	} else {
> >     		return err;
> >     	};
> >     };
>
> *Very strong* -1 to disallowing this. It's very useful to be able to
> handle one error case specially, and other error cases separately,
> without needing an extra level of indentation. And a default case
> shouldn't be required for this.

In the currently proposed form allowing this is a bad idea. If I can't
tell at a glance which cases are "subcases" of one another, then such
subcases and dependence on their ordering should not exist and the cases
should be disjoint. With your explicit tree path idea however, this
would be very nice to have imo, and probably also less of a burden to
implement.
Details
Message ID
<D6EA9Q9NE8D7.LECLH368VJD3@sebsite.pw>
In-Reply-To
<D6C37FJ0BK8N.68Z17BZW5Q4C@turminal.net> (view parent)
DKIM signature
pass
Download raw message
On Sun Dec 15, 2024 at 2:14 AM EST, Bor Grošelj Simić wrote:
> > > All member types must be distinct. Duplicates that are directly
> > > mentioned in the union or appear due to the unwrapping operator must
> > > raise a compile error. In my opinion, it never makes sense to have
> > > duplicate members and no real-world code does outside of the example to
> > > show this feature. It could be potentially useful for auto-generated
> > > code but I'm not convinced.
> >
> > Eh, sure. I guess that makes sense.
>
> This isn't completely harmless. (...a | ...b) breaks with this
> new rule if a and b share any member types, and at least at some point we did
> have at least one such case in the stdlib.

When did it break? I couldn't think of any instance where this would be
an issue, and if there is a conflict, I think that it makes sense to
error out, since it may have been unintentional.

> > - It would also be useful for match cases to be able to specify a
> >   *specific* descendent, if just the type alone would be ambiguous.
> >   Take, for example:
> >
> > 	type a = (int | void);
> > 	type b = (int | str);
> > 	let u: (a | b) = 0: a;
> > 	match (u) {
> > 	case let x: a->int =>
> > 		// ...
> > 	case => // ...
> > 	};
> >
> >   ...but with a better syntax. It also should work with type assertions
> >   and type tests.
>
> Provided we do come up with such a syntax, why bother with all the
> complicated implicit rules if we can simply require every nested case to
> have a full tree path written out? Sure it would be slightly more
> verbose but that's a low price to pay compared to the complexity of
> implicit rules (which are complex both from the perspective of a
> language implementer and a language user). And if the verbosity becomes
> too bad you can always just match on a subtree in a nested match (which
> also has its own issues, but the point is, there is a choice).
>
> Imo investing some thought in coming up with such a syntax would be well
> worth it, though it may require significant changes to match.

Yeah, the more that I think about it, the more I think this is the best
option.

> For type assertions/tests this is already possible, you can just chain
> as/is clauses.

It isn't possible with `is`, unless you do something like
`foo is bar && foo as bar is baz`.

> > > The new match tree-based approach has a major difference from the
> > > current implementation: it's no longer possible to match one node from
> > > different branches. This can be useful to match on a specific error and
> > > then have a generic error handler later, such as:
> > >
> > >     match (os::open(...)) {
> > >     case let file: io::file => ...;
> > >     case errors::noentry =>
> > >     	// do something when file doesn't exist...
> > >     	void;
> > >     case let err: fs::error =>
> > >     	return err;
> > >     };
> > >
> > > This will stop working since errors::noentry and fs::error cases share
> > > a subtype, which can't be properly represented in the match tree. An
> > > alternative way of accomplishing this is using a seondary match or
> > > an "is"-check:
> > >
> > >     match (os::open(...)) {
> > >     case let file: io::file => ...;
> > >     case let err: fs::error =>
> > >     	if (err is errors::noentry) {
> > >     		// do something when file doesn't exist...
> > >     	} else {
> > >     		return err;
> > >     	};
> > >     };
> >
> > *Very strong* -1 to disallowing this. It's very useful to be able to
> > handle one error case specially, and other error cases separately,
> > without needing an extra level of indentation. And a default case
> > shouldn't be required for this.
>
> In the currently proposed form allowing this is a bad idea. If I can't
> tell at a glance which cases are "subcases" of one another, then such
> subcases and dependence on their ordering should not exist and the cases
> should be disjoint. With your explicit tree path idea however, this
> would be very nice to have imo, and probably also less of a burden to
> implement.

Yeah, good point. +1 to not having any implicit "subcases" in match, and
requiring an explicit tree path.
Reply to thread Export thread (mbox)