In the tutorial, in the section about error handling
https://harelang.org/tutorials/introduction/#a-few-words-about-error-handling
I see that os::create may return an fs::error. The example code only has a case
for errors::noaccess, and then a case for fs::error.
I'm trying to better understand how to think about types of errors. Here's
the doc for fs::error:
~~~
$ haredoc fs::error
// All possible fs error types.
type error = !(errors::noentry | errors::noaccess | errors::exists |
errors::busy | errors::invalid | errors::unsupported |
wrongtype | cannotrename | io::error);
~~~
Regarding the tutorial code, in IRC a couple days ago I asked, "Is that last case a
catch-all for any other "sub-error-types" within fs::error? (That is, will an errors::invalid
be caught in that `case let err: fs::error`?)"
And the answer that came back was something like yes but that it's kinda strange
and maybe redundant. I don't understand. Can anyone tell me why that is?
I mean, I see that an fs::error,
being a tagged union, is an errors::noentry, OR an errors::noaccess OR an
errors::exists, OR etc. So, if a match/case is checking for an fs::error, and it sees an
errors::noentry, then ... that errors::noentry is an fs::error too, correct?
Reminds me of OOP: if class Car inherits from class Vehicle, then an instance of
Car, say, friends_car is indeed a Vehicle. Is that the correct way to think about
tagged unions?
I think there's probably a distinction here I'm not seeing. I notice that, in the source for
fs::strerror, it checks for each kind of its "subtypes", yielding a different str for each one...
-- John
On Sun Aug 20, 2023 at 2:11 AM EDT, John Gabriele wrote:
> In the tutorial, in the section about error handling
> https://harelang.org/tutorials/introduction/#a-few-words-about-error-handling
> I see that os::create may return an fs::error. The example code only has a case
> for errors::noaccess, and then a case for fs::error.
>
> I'm trying to better understand how to think about types of errors. Here's
> the doc for fs::error:
>
> ~~~
> $ haredoc fs::error
> // All possible fs error types.
> type error = !(errors::noentry | errors::noaccess | errors::exists |
> errors::busy | errors::invalid | errors::unsupported |
> wrongtype | cannotrename | io::error);
> ~~~
>
> Regarding the tutorial code, in IRC a couple days ago I asked, "Is that last case a
> catch-all for any other "sub-error-types" within fs::error? (That is, will an errors::invalid
> be caught in that `case let err: fs::error`?)"
>
> And the answer that came back was something like yes but that it's kinda strange
> and maybe redundant. I don't understand. Can anyone tell me why that is?
It may help to first use a simpler example:
type foo = !void;
type bar = !void;
type baz = !(foo | bar);

fn f() (void | foo | baz) = {
	// code code code
};

export fn main() void = {
	match (f()) {
	case void => void;
	case foo =>
		// Reached when foo is returned from f().
		void;
	case baz =>
		// Reached when baz is returned from f(). Note that baz
		// is a tagged union which contains foo. But if foo is
		// returned, this case *won't* be run, since the return
		// type is just plain foo, not a foo within a baz. If
		// the function returned `foo: baz`, then this case
		// would run, and you'd be able to match on the baz
		// value to see if it's foo or bar.
		void;
	};
};
So thinking about it in an OOP way isn't entirely correct: it's not true
that foo is a baz, but it is true that baz contains foo as one of its
members. foo is just foo, but a function may return a value of type baz,
whose tag is foo.
Also note that the above example works identically even if you swap the
order of the foo and baz cases.
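
To make the distinction concrete, here is a minimal sketch of the same example filled in so that it can be compiled and run. The `wrapped` parameter, the `describe` helper and the printed strings are hypothetical additions for illustration only; the cast `foo: baz` is what produces a baz whose tag is foo.

```hare
use fmt;

type foo = !void;
type bar = !void;
type baz = !(foo | bar);

// Hypothetical stand-in for f(): returns a plain foo, or a foo wrapped
// in a baz, depending on the (made-up) wrapped parameter.
fn f(wrapped: bool) (void | foo | baz) = {
	if (wrapped) {
		return foo: baz;
	};
	return foo;
};

// Hypothetical helper that reports which case was selected.
fn describe(wrapped: bool) void = {
	match (f(wrapped)) {
	case void =>
		fmt::println("void")!;
	case foo =>
		fmt::println("plain foo")!;
	case baz =>
		fmt::println("baz")!;
	};
};

export fn main() void = {
	describe(false);
	describe(true);
};
```

If the match behaves as described above, the first call takes the foo case and the second takes the baz case, regardless of the order in which those two cases are written.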
But this poses a problem, and this is why the response on IRC said it's
kinda strange and redundant:
match (f()) {
case bar =>
	// This is legal (by design). bar isn't directly returned from
	// f(), but it's one of the members of baz so you can use it as
	// a case.
	void;
case baz =>
	// But this is also legal, since baz is directly returned from
	// the function. The problem is that bar is a member of baz, so
	// technically this case is also matched. This is the problem,
	// and it's what we currently haven't solved, since we're still
	// trying to nail down specific tagged union semantics.
	void;
case => void;
};
As of now, the above code does what you'd expect: if f() returns bar, it
runs the first case, otherwise, if it returns any other value of baz, it
runs the second case (you can think of it as an if-else statement, where
it checks each case sequentially). But this means that if you were to
swap the order of the cases, the baz case would be matched (note that
this behavior is different from the first example).
We intend to add match exhaustivity in the future (every possible type
that could be returned is handled in the match, and is handled exactly
once), but the current semantics are incompatible with this, since it's
possible (and very easy) to match a type twice. This is what the
response on IRC was trying to get at.
So back to the example in the tutorial:
// os::create returns (io::file | fs::error)
const file = match (os::create(path, 0o644, oflags)) {
case let file: io::file =>
	yield file;
case errors::noaccess =>
	// errors::noaccess is a member of fs::error
	fmt::fatalf("Error opening {}: Access denied.", path);
case let err: fs::error =>
	// Matches any other fs::error
	fmt::fatalf("Error opening {}: {}", path, fs::strerror(err));
};
The specific logic governing the cases is: it checks if the case type is
a member of the tagged type being matched on, and if so, checks if
that's the tag (this is what governs the behavior in the first example).
Otherwise, it recursively expands nested tagged unions, and checks if
that results in a match.
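
As a rough sketch of the second half of that rule (the recursive expansion), the following uses a hypothetical function `g` whose result type has no direct bar member. It assumes the match semantics described in this message, which are explicitly still subject to change.

```hare
use fmt;

type foo = !void;
type bar = !void;
type baz = !(foo | bar);

// Hypothetical function: bar is only ever returned inside a baz.
fn g() (void | baz) = {
	return bar: baz;
};

export fn main() void = {
	match (g()) {
	case void =>
		fmt::println("void")!;
	case bar =>
		// bar is not a direct member of (void | baz), so the
		// match looks inside baz and selects on its tag.
		fmt::println("bar, found inside baz")!;
	case baz =>
		// Any other baz (here, only foo) ends up in this case.
		fmt::println("some other baz")!;
	};
};
```

Writing the bar case before the baz case matters here, for the same reason the errors::noaccess case comes before the fs::error case in the tutorial.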
If this all sounds complicated to you, I'd advise you to not worry too
much about it. The specific semantics surrounding match exhaustivity are
still undecided, but generally things currently work the way you'd
intuitively expect them to. The only reason I went into such detail here
is because I figured it might help you understand why you got the
response you did when you asked the question on IRC, and may help you
better understand how tagged unions work in general. Technically, in the
example in the tutorial, errors::noaccess is being matched twice, but so
long as you write the errors::noaccess case first (which is good style
anyways; putting the least specific cases last), the match will behave
as you expect, where the fs::error case is a "catch-all" for all
fs::errors that aren't handled in their own cases.
Let me know if you're still confused or you have any other questions :)
Thanks so much, Sebastian! I'm still digesting and
rereading your response (and relevant parts of the
tutorial), and have a few questions.
In your first example, in the comment:
~~~
case baz =>
	// Reached when baz is returned from f(). Note that baz
	// is a tagged union which contains foo. But if foo is
	// returned, this case *won't* be run, since the return
	// type is just plain foo, not a foo within a baz. If
	// the function returned `foo: baz`, then this case
	// would run, and you'd be able to match on the baz
	// value to see if it's foo or bar.
	void;
~~~
you write, "If the function returned `foo: baz`", but
since both foo and baz are types, I don't know what
that means (I read that as, "object foo, which is of
type baz"). I understand though, that if f() returns
a baz, then that case would run.
Also, you write, in that baz case, I'd be able to then
match on the baz value to see if it's a foo or a bar;
would I do that using another match within the current
match?
> So thinking about it in an OOP way isn't entirely
> correct: it's not true that foo is a baz, but it is
> true that baz contains foo as one of its members.
> foo is just foo, but a function may return a value
> of type baz, whose tag is foo.
What does it mean for a value to be of type baz,
but have a tag of type foo? I think I'm misunderstanding
something basic about tagged unions here.
On Sun Aug 20, 2023 at 2:32 PM EDT, John Gabriele wrote:
> In your first example, in the comment:
>
> ~~~
> case baz =>
> 	// Reached when baz is returned from f(). Note that baz
> 	// is a tagged union which contains foo. But if foo is
> 	// returned, this case *won't* be run, since the return
> 	// type is just plain foo, not a foo within a baz. If
> 	// the function returned `foo: baz`, then this case
> 	// would run, and you'd be able to match on the baz
> 	// value to see if it's foo or bar.
> 	void;
> ~~~
>
> you write, "If the function returned `foo: baz`", but
> since both foo and baz are types, I don't know what
> that means (I read that as, "object foo, which is of
> type baz"). I understand though, that if f() returns
> a baz, then that case would run.
`foo: baz` is a cast expression. `void` (and aliases thereof, like
`foo`) can also be used as expressions, denoting "the value whose type
is void (or the alias of void)". This works since void can only store
one value, that being void itself. (The terminology may be confusing,
since void can refer to either a type or an expression depending on the
context.) So `foo: baz` is saying to take the value whose type is foo,
and cast it to baz, so you end up with a tagged union whose tag is that
of the type foo.
> Also, you write, in that baz case, I'd be able to then
> match on the baz value to see if it's a foo or a bar;
> would I do that using another match within the current
> match?
Yes:
match (f()) {
case foo => void;
case let b: baz =>
	match (b) {
	case foo => void;
	case bar => void;
	};
};
> > So thinking about it in an OOP way isn't entirely
> > correct: it's not true that foo is a baz, but it is
> > true that baz contains foo as one of its members.
> > foo is just foo, but a function may return a value
> > of type baz, whose tag is foo.
>
> What does it mean for a value to be of type baz,
> but have a tag of type foo? I think I'm misunderstanding
> something basic about tagged unions here.
Let's start again with a more basic concept: regular unions. If
you're already familiar with these, feel free to skip over this:
type u = union {
	i: int,
	s: str,
	sl: []u8,
};
All of the fields share the same storage. This is used in C, usually
within structs, when some data needs to be stored, but the kind of data
being stored is known from some other context. You access the fields in
the same way you would for a struct, but unlike a struct, all fields
occupy the same storage in memory.
The most common way to carry context for which field is active is to use
a tag:
type u_tag = enum u32 {
	INT,
	STR,
	SLICE,
};

type u = struct {
	tag: u_tag,
	data: union {
		i: int,
		s: str,
		sl: []u8,
	},
};
So now you can switch on the tag, and decide which field to access based
on the tag's value. This is called a tagged union.
Hare's built-in tagged unions are very similar to this (in fact,
internally they share the same representation), however, they're
type-safe, meaning your work is checked at compile-time when possible
(such as when using an invalid case in a match expression), and
otherwise at runtime:
type u = (int | str | []u8);
export fn main() void = {
	let x: u = 12;
	let i = u as int; // succeeds: i is an int whose value is 12
	let s = u as str; // aborts at runtime: u's tag isn't str
};
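
For comparison, here is a small sketch of inspecting the tag without risking an abort. It uses the `is` operator, which is not mentioned elsewhere in this thread, and a match with bindings; the variable `x` and the printed strings are illustrative.

```hare
use fmt;

type u = (int | str | []u8);

export fn main() void = {
	let x: u = 12;

	// `is` tests the tag and yields a bool; it never aborts.
	if (x is str) {
		fmt::println("x holds a str")!;
	} else {
		fmt::println("x does not hold a str")!;
	};

	// match selects on the tag and binds the value with the
	// member's own type, so no `as` is needed afterwards.
	match (x) {
	case let i: int =>
		fmt::printfln("int: {}", i)!;
	case let s: str =>
		fmt::printfln("str: {}", s)!;
	case let sl: []u8 =>
		fmt::printfln("slice of {} bytes", len(sl))!;
	};
};
```

`as` is best kept for places where the tag is already known from context; otherwise `is` or match lets the program decide what to do with each member.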
The "tag" here is a unique u32 generated by the compiler, called the
type ID. Every type (including alias types) has a unique type ID.
Types which are aliases of void (or void itself) may also be used in
tagged unions: they simply don't store any additional data besides the
tag itself.
So in the example of foo and baz:
type foo = !void;
type bar = !void;
type baz = !(foo | bar);
If a function returns `foo: baz`, then it is returning a value whose
type is baz, where the tag indicates that the data for the tagged union
is to be interpreted as foo (in this case there is no "data", since
there's nothing else to be stored, but the same concept still applies).
So, this is still different from OOP: foo itself isn't a baz, baz simply
contains foo as a member (i.e. the tag of baz may be the type ID of
foo). However, foo may still be used wherever baz is expected, since
it's a member of the tagged union (this is more applicable when using
non-void aliases, such as the (int | str | []u8) example above: int is
assignable to the tagged union, ditto for str and []u8). But when a
function has a return type of (foo | baz), returning a baz whose tag is
foo is different from returning foo itself.
Here's another example which is common in actual Hare code:
// Note that both error1 and error2 have io::error as a member.
type error1 = !(io::error | regex::error);
type error2 = !(io::error | path::error);

// If f() errors, it will return either error1 or error2. Both types
// contain io::error, but each io::error carries different context (i.e.
// it's returned within different types).
fn f() (void | error1 | error2) = //...

export fn main() void = {
	match (f()) {
	case void => void;
	case io::error =>
		// This case isn't legal; the compiler will error out
		// here. Since io::error is a member of both error1 and
		// error2, it's impossible to know which io::error this
		// should match.
		void;
	case let e: error1 =>
		match (e) {
		case io::error =>
			// For the error1 and error2 cases, you can
			// match on the error (which itself is a tagged
			// union) and see whether the tagged union
			// contains an io::error.
			void;
		case regex::error => void;
		};
	case let e: error2 =>
		match (e) {
		case io::error =>
			// This is what I meant when I said "each
			// io::error carries different context". They're
			// returned within different tagged unions, so
			// in that sense they carry additional
			// information (the context in which the
			// io::error happened).
			void;
		case path::error => void;
		};
	};
};
Note that if f()'s prototype were modified like so:
fn f() (void | error1 | error2 | io::error);
Then using io::error as a case in the outermost match *would* be
allowed, since you'd be matching on the io::error directly returned. But
if either error1 or error2 was returned, even if they stored io::error,
that case wouldn't be matched.
Furthermore, if f()'s prototype looked like this:
fn f() (void | error1);
Then using io::error as a case would still be valid, since it's
unambiguous what io::error refers to (io::error could only ever be
returned as an error1).
This is all to say that, conceptually, tagged unions themselves are
pretty simple: they store a tag, and possibly some data; such data may
be another nested tagged union. Match expressions have additional
semantics which are designed to make it easier to select a specific tag,
even if said tag is from a nested tagged union, but these semantics are
more complicated, and are likely what's confusing you.
On Sun, Aug 20, 2023, at 3:18 PM, Sebastian wrote:
> On Sun Aug 20, 2023 at 2:32 PM EDT, John Gabriele wrote:
>>
>> What does it mean for a value to be of type baz,
>> but have a tag of type foo? I think I'm misunderstanding
>> something basic about tagged unions here.
>
> Starting again with a more basic concept, that being regular unions. If
> you're already familiar with these then feel free to skip over this:
> {snip}
Sebastian,
Thanks so much for the review of the basic concepts. It's been
quite a while since I've used C, and had forgotten that the fields
of a union all share the same storage! And thank you for the info
and example of what a tagged union is.
> Hare's built-in tagged unions are very similar to this (in fact,
> internally they share the same representation), however, they're
> type-safe, meaning your work is checked at compile-time when possible
> (such as when using an invalid case in a match expression), and
> otherwise at runtime:
>
> type u = (int | str | []u8);
> export fn main() void = {
> 	let x: u = 12;
> 	let i = u as int; // succeeds: i is an int whose value is 12
> 	let s = u as str; // aborts at runtime: u's tag isn't str
> };
Ah, now I understand better what `x as int` and `x as str` are for!
(I think that's a typo, with `u` instead of `x` in the last 2 lines.)
> Here's another example which is common in actual Hare code:
> {snip}
Thank you again for the detailed example. Much clearer to me now,
and much appreciated! (Time for me to do some re-reading of docs,
and update my own notes.)
-- John