~sircmpwn/hare-dev

working on hare::unit

Message ID
<CXUIESOZDKGH.3UPGKFIYQ93OX@rosiesworkshop.net>

TLDR: I'm leisurely working on the self-hosted type checker, which is
shaping up to be somewhere between a fleshing-out of hare::unit and a
complete re-write. No proposals/RFCs at this point - just having fun
experimenting, though I'm happy to receive feedback from more seasoned
eyes.

Hi all,

I've been having a lot of fun for the past few months working on
hare::unit, mostly for my own learning, so thanks for such a wonderfully
hackable language :) I'm writing this email after some conversation on
IRC to document parts of that conversation, give an overview of the
progress I've made, and announce my intentions for further work. If
anyone sees any problems with the approaches I mention in this email,
please feel free to say so. However, the primary goal of this email isn't
to solicit feedback per se, but to make a record of where the
conversation has gone and what design ideas I intend to try out.

The last recorded discussion of hare::unit:: and hare::types:: was on
Seb's ticket [1].

I started by fixing a few bugs in hare::types and then implemented some
parts of the type promotion and result reduction algorithms. I set up
some basic rigging to use harec's testing rt:: module as test input, to
guide which features to work on. As I wrote more unit::process_*() and
unit::scan_*() functions, I encountered some memory corruption bugs in
both types::store and unit::scope, which I addressed by pre-allocating
the two types' storage slices. We will see if this approach scales to
checking units with lots of types and very large functions.

The first change of mine which I expect to need complete reworking is
the introduction of types::flexible and types::wrapped. As Seb said in
the ticket:
> What's the deal with flexible type hashes? The behavior is kinda
> weird, both in harec and in current hare::types, and we should decide
> exactly how we want these to work.

As far as I can tell, types::hash() does not actually distinguish
between variants of flexible literals (as they're now called), contrary
to spec section 6.5.11 (circa 2023-Dec-08). So in order to implement
flexible literal handling without taking types:: and harec out of sync,
I made two additional types:
```
flexible = struct {
	*_type,
	union {
		iconst,
		rconst,
	}, // fconst does not need any extra data tied to its _type
}
```
and `wrapped = (*_type | flexible)`, which surely is not what we want
long term. It let me keep working without taking types::hash() and harec
out of sync, but it broke the hare type <==> *types::_type mapping and
(even worse imo) cluttered the code base with lots of s/ty/unwrap(ty)/
calls. I have yet to read harec's type hash implementation, and I assume
reading over that will be necessary before I can productively seek
input on how we /actually/ want to represent the types of flexible
literals. Look forward to a future email on that topic!

The curious can find my work on the hare_unit branch of my tree [2].

On IRC, Seb gave some immediate feedback on the code I've written:
> unit::decl shouldn't store slices; each slice of declarations in the
> ast should be split up into separate declarations in the checked unit
> (necessary for declaration resolution and stuff).

and cleared up some confusion I had about scope_class::SUBUNIT, which,
as Seb pointed out, is needed for subunit-scoped imports. This led me to
the realization that with the current hare::context, I'll often need to
jump over the subunit scope to insert declarations into the (init) scope
above it.

I'll fix these issues next, before moving on to some more exciting bits.

I asked Seb which current architectural issues/refactorings might make
sense to prioritize to put types:: and unit:: on a better footing. Seb's
general advice was:
> i think a lot of these design questions will really only be answered
> by writing code and seeing what works
but also:
> if you want to try to draft some new designs, then i'd say good places
> to start are experimenting with merging hare::types and hare::unit,
> better error handling, and figuring out a suitable interface - which
> handles all existing uses of hare::types but also doesn't expose too
> much.

So first I'll document some ideas from Seb on better error handling:
> types::deferred should be removed, in favor of just reusing the same
> declaration resolution logic for resolving types (this is necessary
> for handling circular dependencies).
and
> hare::unit/hare::check/whatever shouldn't use regular error
> propagation, since we want to continue checking even if an error is
> encountered, so multiple errors can be collected.

with implementation details:
> I think a good way to go about it is to store an errors slice within
> the context, where each error is a string and a location, and then
> some internal function can be used for reporting errors (which appends
> to the slice and possibly maybe does other stuff too) for error
> reporting; i do something similar in hare-c:
> https://git.sr.ht/~sebsite/hare-c/tree/main/item/c/check/util.ha#L23
> (the context stores an array rather than a slice, so allocation isn't
> required for errors, but either works)
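
For concreteness, here is a rough Hare sketch of that idea, using the
fixed-array variant Seb mentions so that reporting an error never
allocates. All of the names (location, diagnostic, context, report,
MAXERRORS) are hypothetical; this is not the code in hare-c or
hare::unit, just an illustration of collecting errors instead of
propagating them:

```hare
use fmt;

// Hypothetical source location; hare::lex has its own lex::location.
type location = struct {
	line: uint,
	col: uint,
};

// One collected error: a message plus where it was found.
type diagnostic = struct {
	loc: location,
	msg: str,
};

// Capacity of the fixed error buffer (an arbitrary choice for this sketch).
def MAXERRORS: size = 16;

// Hypothetical checker context. Errors live in a fixed-size array, so
// reporting one never allocates.
type context = struct {
	errors: [MAXERRORS]diagnostic,
	nerror: size,
};

// Records an error and returns normally, so checking can continue.
// Errors past the buffer's capacity are counted but not stored.
fn report(ctx: *context, loc: location, msg: str) void = {
	if (ctx.nerror < len(ctx.errors)) {
		ctx.errors[ctx.nerror] = diagnostic { loc = loc, msg = msg };
	};
	ctx.nerror += 1;
};

export fn main() void = {
	let ctx = context {
		errors = [diagnostic { loc = location { line = 0, col = 0 }, msg = "" }...],
		nerror = 0,
	};
	report(&ctx, location { line = 3, col = 5 }, "unknown identifier");
	report(&ctx, location { line = 9, col = 1 }, "type mismatch");
	// Print every stored error once checking is done.
	for (let i = 0z; i < ctx.nerror && i < len(ctx.errors); i += 1) {
		const e = ctx.errors[i];
		fmt::errorfln("{}:{}: {}", e.loc.line, e.loc.col, e.msg)!;
	};
};
```

The point is that report() returns normally, so the checker keeps going
and the caller inspects the collected errors at the end. Storing a slice
instead would lift the cap, at the cost of allocating as errors are
appended.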

I've started preparing to remove types::deferred by grouping all of the
current error types together and eliminating unit::error. Next I'm
going to experiment with using the scan phase of unit::check() to
resolve type/decl dependencies and generate a feasible plan/job queue
for the validate phase to execute. I suspect that work will involve
tweaking unit::object, so it may also be a good chance to implement
another of Seb's ideas:
> unit::object_kind should be removed since unit::object can just use a
> tagged union instead
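
To illustrate the difference, a hypothetical Hare sketch: rather than an
object_kind enum next to a struct that carries every variant's fields,
each variant gets its own struct and the object is a tagged union of
them. The variant names and fields below are invented and do not match
hare::unit's real objects:

```hare
use fmt;

// Hypothetical object variants; hare::unit's real objects carry
// different data.
type const_object = struct {
	name: str,
	value: i64,
};

type global_object = struct {
	name: str,
	exported: bool,
};

type type_object = struct {
	name: str,
	sz: size,
};

// Instead of an object_kind enum plus one struct holding every variant's
// fields, the object itself is a tagged union of the variants.
type object = (const_object | global_object | type_object);

// Matching on the variant replaces switching on a separate kind field.
fn object_name(obj: object) str = {
	match (obj) {
	case let c: const_object =>
		return c.name;
	case let g: global_object =>
		return g.name;
	case let t: type_object =>
		return t.name;
	};
};

export fn main() void = {
	const objs: [2]object = [
		const_object { name = "LIMIT", value = 10 },
		type_object { name = "point", sz = 8 },
	];
	for (let i = 0z; i < len(objs); i += 1) {
		fmt::println(object_name(objs[i]))!;
	};
};
```

A switch over an enum is also checked for exhaustiveness, so the main
win is that each variant only carries the fields that make sense for it.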

If that experiment pans out, I'll probably focus on the overall division
of types and responsibilities across the types:: and unit:: modules. Seb
had said:
> there isn't, like, a definitive consensus on whether ::unit and
> ::types should be separate, but i feel pretty confident that things
> would be simpler if they were combined, for things like resolving type
> aliases and checking expressions within types (like array length). at
> the very least the actual type checking stuff should probably go
> alongside stuff in ::unit (at which point renaming it to ::check makes
> sense imo)
and
> if you want it'd be nice if you tried combining the two modules to see
> how that works out (and possibly experimenting with different designs,
> like having separate modules for type/expr types vs putting everything
> into one module)

I have already taken a brief stab at combining the two modules, with
mixed results; see the intotypes branch. I plan on setting that branch
aside and focusing on the points I've highlighted above, but here are
some tentative thoughts on the merger of the types:: and unit::
modules:

One minor and accidental bonus is that the checked expr types now look
more like their ast:: counterparts, with _expr suffixes necessary to
disambiguate them from type types. I think grouping the type checking
and typestore logic together is an improvement, if only for the types
and functions that no longer need to be exported. However, types:: is
now much larger than any other hare::*:: module.

Lastly I'll document one final point from Seb which expands on a small
note from the ticket:
> i have no idea what the best approach is for handling imports lol, and
> tbh that can probably be safely put off until other stuff is in place
> and we possibly have a better idea on the design of things

I have some of my own ideas as to what handling imports may look like,
having already played around with that part a little bit on hare_unit,
but I agree it is not something worth prioritizing at this point.

Looking forward to working with y'all, and thanks again for creating
such a fun and sensible language!

Cheers,
Rosie Keith Languet

[1]: https://todo.sr.ht/~sircmpwn/hare/913
[2]: https://git.sr.ht/~roselandgoose/hare

Message ID
<CXUKQ1K3DU36.3TMXER4D0QEUV@localhost>
In-Reply-To
<CXUIESOZDKGH.3UPGKFIYQ93OX@rosiesworkshop.net>

On Thu Dec 21, 2023 at 9:33 PM EST, Rosie Keith Languet wrote:
> So first I'll document some ideas from Seb on better error handling:
> > types::deferred should be removed, in favor of just reusing the same
> > declaration resolution logic for resolving types (this is necessary
> > for handling circular dependencies).
> and
> > hare::unit/hare::check/whatever shouldn't use regular error
> > propagation, since we want to continue checking even if an error is
> > encountered, so multiple errors can be collected.
>
> with implementation details:
> > I think a good way to go about it is to store an errors slice within
> > the context, where each error is a string and a location, and then
> > some internal function can be used for reporting errors (which appends
> > to the slice and possibly maybe does other stuff too) for error
> > reporting; i do something similar in hare-c:
> > https://git.sr.ht/~sebsite/hare-c/tree/main/item/c/check/util.ha#L23
> > (the context stores an array rather than a slice, so allocation isn't
> > required for errors, but either works)

I thought I'd share something tangentially related about a program I'm
working on, because I need to build a system for reporting more complex
error messages and I'm trying to figure out how to structure it.

I'm working on a sort of web-server/framework thing, and the bulk of it
is being implemented as a library so that a small program can be
wrapped around it to form something that can be executed. Part of that
library's job is to scan a source directory and compile a tree of
relevant files/info about the content being served. That includes
configuration files, which are allowed to refer to resources defined in
each other, which makes resolving everything within them a bit
difficult. If there's an error somewhere, which could be anything from a
filesystem access error to a config-file-parsing error to an error in
the relationship between the different configuration files, a decent
slab of information needs to be compiled for the user if the error
message is going to be at all useful to them. The most complex of those
errors would probably be telling a user about an unresolvable dependency
cycle, along with all of the resources that depend on each other in that
cycle and which configuration file each was defined in.
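
A standard way to find such a cycle, together with every resource in
it, is a depth-first search that marks each resource as unvisited, in
progress, or done: reaching an in-progress resource again means the
chain from its first appearance back to itself is a cycle. A minimal
Hare sketch, assuming resources are numbered and each one lists its
dependencies by index; solver, visit, state, and MAXRES are hypothetical
names, not part of the library described here:

```hare
use fmt;

// Hypothetical per-resource resolution state.
type state = enum {
	UNVISITED,
	VISITING,
	DONE,
};

// Upper bound on resources for this sketch; keeps it allocation-free.
def MAXRES: size = 8;

// Hypothetical solver: resources are numbered 0..len(deps)-1, and
// deps[i] lists the resources that resource i refers to.
type solver = struct {
	deps: [][]size,
	states: [MAXRES]state,
	chain: [MAXRES]size, // resources currently being resolved, in order
	depth: size,
};

// Depth-first visit of resource i. Returns the resources forming a cycle
// (borrowed from s.chain) if one is reachable from i, or void otherwise.
fn visit(s: *solver, i: size) ([]size | void) = {
	switch (s.states[i]) {
	case state::DONE =>
		return;
	case state::VISITING =>
		// i is already on the chain, so the cycle runs from i's
		// position to the top of the chain.
		for (let start = 0z; start < s.depth; start += 1) {
			if (s.chain[start] == i) {
				return s.chain[start..s.depth];
			};
		};
		abort("resource in VISITING state is not on the chain");
	case state::UNVISITED => void;
	};
	s.states[i] = state::VISITING;
	s.chain[s.depth] = i;
	s.depth += 1;
	for (let j = 0z; j < len(s.deps[i]); j += 1) {
		match (visit(s, s.deps[i][j])) {
		case void => void;
		case let cycle: []size =>
			return cycle;
		};
	};
	s.depth -= 1;
	s.states[i] = state::DONE;
};

export fn main() void = {
	// Three resources referring to each other in a loop: 0 -> 1 -> 2 -> 0.
	let d0: []size = [1];
	let d1: []size = [2];
	let d2: []size = [0];
	let s = solver {
		deps = [d0, d1, d2],
		states = [state::UNVISITED...],
		chain = [0...],
		depth = 0,
	};
	match (visit(&s, 0)) {
	case void =>
		fmt::println("no cycle")!;
	case let cycle: []size =>
		// prints: 0 -> 1 -> 2 -> 0
		for (let k = 0z; k < len(cycle); k += 1) {
			fmt::printf("{} -> ", cycle[k])!;
		};
		fmt::println(cycle[0])!;
	};
};
```

The returned slice is the list of resources in the cycle, in dependency
order, which could then be paired with the configuration files they came
from to build the error message.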

Because this is a library, I'm not going to print the errors when I
encounter them; I'm going to make sure they're properly propagated to
the caller so that they can deal with them. And because the information
is so complex, I can't reasonably fit everything in a single structure:
there are going to be lists of things and things linked to other things.

So anyway, I want to handle that in the simplest possible way, while
prioritizing how easy it is to write both the code propagating the
errors and the code receiving them. What I'm leaning towards so far is,
first, giving the code that propagates errors access to a context
variable that it can update with context for any error messages
propagated afterwards. For example, if I have a function that processes
a single file on my computer, doing things like checking if it's a
directory, reading it if it's a configuration file, and processing its
contents, I can have that function simply update the error context at
the beginning with the path of the file being processed, and then any
errors encountered can be propagated easily and non-verbosely.

When those errors are propagated (which at this point don't include all
the context they need to be useful), they can at some point be wrapped
by another error type that includes the error context defined earlier,
and that's what the library user would get. Additionally, I currently
have it set up so that any error information that cannot be statically
allocated is stored in a global variable and freed when the program
ends. If a function that generates one of these errors is called when a
previous error has already filled the global variable, the old error is
overwritten. That behavior is consistent with the lifetimes of data
returned by a lot of functions in the standard library, like the
strerror functions, and removes the burden on the user of having to
deinitialize error info when they're done with it (or if they don't
want to touch it at all).
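
A rough Hare sketch of that design, under assumptions: the low-level
errors (notfound, syntax), the wrapper type contextual, the global
errctx, and the functions process_file and load are all hypothetical,
and the global storage for dynamically allocated error details is left
out. process_file records its path once, so its own error returns stay
short, and load wraps whatever comes back with that context before
handing it to the caller:

```hare
use fmt;

// Hypothetical low-level errors; a real program would likely reuse the
// standard library's error types where they fit.
type notfound = !void;
type syntax = !str;
type error = !(notfound | syntax);

// Hypothetical error handed to the library's caller: the low-level
// error plus the context recorded before it was raised.
type contextual = !struct {
	path: str,
	err: error,
};

// Error context, updated by propagating code before it can fail.
let errctx: str = "";

// Processes one file. It records its path once, up front, so the error
// returns below need no extra context of their own.
fn process_file(path: str) (void | error) = {
	errctx = path;
	if (path == "missing.conf") {
		return notfound;
	};
	if (path == "broken.conf") {
		return "unexpected token": syntax;
	};
};

// Library entry point: wraps any error with the recorded context.
fn load(paths: []str) (void | contextual) = {
	for (let i = 0z; i < len(paths); i += 1) {
		match (process_file(paths[i])) {
		case void => void;
		case let err: error =>
			return contextual { path = errctx, err = err };
		};
	};
};

export fn main() void = {
	match (load(["site.conf", "broken.conf"])) {
	case void =>
		fmt::println("ok")!;
	case let c: contextual =>
		match (c.err) {
		case notfound =>
			fmt::errorfln("{}: file not found", c.path)!;
		case let s: syntax =>
			fmt::errorfln("{}: {}", c.path, s: str)!;
		};
	};
};
```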

Message ID
<CXUNLCA1TL2N.2V67ZPG0VHWIG@rosiesworkshop.net>
In-Reply-To
<CXUIESOZDKGH.3UPGKFIYQ93OX@rosiesworkshop.net>

Oops,

Please disregard this bit:
> Next I'm going to experiment with using the scan phase of
> unit::check() to resolve type/decl dependencies and generate a
> feasible plan/job queue for the validate phase to execute.

Rereading harec/docs/declaration_solver.txt I can already see that my
description above is wrong. I think the rest of that part will remain
correct tho:
> I suspect that work will involve tweaking unit::object, so may also
> mark a good chance to implement another of Seb's ideas:
> > unit::object_kind should be removed since unit::object can just use a
> > tagged union instead

I'll work now on implementing the declaration solver as described by
that design doc.

-RKL

Message ID
<CY2I8WLQ8E3T.3PKKKH0DY7NZA@d2evs.net>
In-Reply-To
<CXUIESOZDKGH.3UPGKFIYQ93OX@rosiesworkshop.net>

nice work :)

On Fri Dec 22, 2023 at 2:33 AM UTC, Rosie Keith Languet wrote:
> The first change of mine which I expect to need complete reworking is
> the introduction of types::flexible and types::wrapped. As Seb said in
> the ticket:
> > What's the deal with flexible type hashes? The behavior is kinda
> > weird, both in harec and in current hare::types, and we should decide
> > exactly how we want these to work.
>
> As far as I can tell, types::hash() does not actually distinguish
> between varients of flexible literals (as they're now called), contrary
> to spec section 6.5.11 (circa 2023-Dec-08). So in order to implement
> flexible literal handling without taking types:: and harec out of sync,
> I made two additional types:

flexible type hashes are just based on the storage/flags (this is where
the distinguishing between the variants happens) along with a unique id
that's incremented each time a new flexible literal type is created.
this hash can never make it into the abi, so the only requirement is
that each call to type_create_flexible creates a type with a unique
hash - having it be different in harec vs hare::types shouldn't cause
any issues
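
A minimal Hare sketch of the scheme described above, assuming 32-bit
FNV-1a as the mixing function: the variant stands in for the
storage/flags, and a global counter supplies the unique id. flexkind,
next_flexible_id, fnv1a_u32, and flexible_hash are hypothetical names,
not harec's or hare::types' actual code:

```hare
use fmt;

// Hypothetical flexible-literal kinds standing in for the storage/flags;
// these are not hare::types' real definitions.
type flexkind = enum u8 {
	ICONST,
	FCONST,
	RCONST,
};

// Hypothetical counter, bumped every time a flexible type is created.
let next_flexible_id: u32 = 0;

// Mixes the four bytes of x (low byte first) into an FNV-1a state.
fn fnv1a_u32(h: u32, x: u32) u32 = {
	let acc = h;
	for (let shift = 0u32; shift < 32; shift += 8) {
		acc ^= (x >> shift) & 0xff;
		acc *= 16777619; // FNV 32-bit prime; u32 arithmetic wraps
	};
	return acc;
};

// Hashes the variant together with a fresh id, so two flexible types of
// the same variant get different hash inputs.
fn flexible_hash(kind: flexkind) u32 = {
	let h: u32 = 2166136261; // FNV-1a 32-bit offset basis
	h = fnv1a_u32(h, kind: u32);
	h = fnv1a_u32(h, next_flexible_id);
	next_flexible_id += 1;
	return h;
};

export fn main() void = {
	const a = flexible_hash(flexkind::ICONST);
	const b = flexible_hash(flexkind::ICONST);
	fmt::printfln("{} {}", a, b)!;
};
```

Mixing in the counter makes every hash input distinct; as with any
hash, collisions remain possible in principle. Because the result never
reaches the ABI, it doesn't matter that the values differ from harec's.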

the main thing to investigate when implementing flexible types is
whether we can design the type system such that lowering a flexible type
after it's been inserted into another type is feasible. this would allow
us to get rid of the restriction in §6.6.11.7, and it should make the
flexible literal system more likely to be able to handle the things type
hints are currently used for

Message ID
<CY2U1YK1ZJ5O.1YPIKB4F28V3P@rosiesworkshop.net>
In-Reply-To
<CY2I8WLQ8E3T.3PKKKH0DY7NZA@d2evs.net>

Thanks Ember!

> flexible type hashes are just based on the storage/flags (this is where
> the distinguishing between the variants happens) along with a unique id
> that's incremented each time a new flexible literal type is created.
Ah I see. It's just that last part that's missing from hare::types::hash

I see the relevant switch case(s) in harec:src/types.c:type_hash() that
hare::types lacks, as well as the harec:include/types.h:type_const struct
they depend on. (I last pulled before the s/const/literal/ spec changes,
so these types may have been renamed since, but ¯\_(ツ)_/¯ )

> this hash can never make it into the abi, so the only requirement is
> that each call to type_create_flexible creates a type with a unique
> hash - having it be different in harec vs hare::types shouldn't cause
> any issues
That makes sense, thanks! :)

These comments in hash.ha had me cautious:
> // Returns the hash of a type. These hashes are deterministic and universally
> // unique: different computers will generate the same hash for the same type.
> export fn hash(t: *_type) u32 = {
> 	// Note that this function should produce the same hashes as harec; see
> 	// bootstrap harec:src/types.c:type_hash
but your point that *this* particular hash result never leaks into the
ABI frees me from that caution. :) Now I'll have to think about how(/if)
I want to represent that unique id in the types::_type struct...

> the main thing to investigate when implementing flexible types is
> whether we can design the type system such that lowering a flexible type
> after it's been inserted into another type is feasible.
I'll keep that in mind when I try to finish the flexible type lowering,
but for now I'll leave the type theory to the experts :P

Cheers, and happy (western) New Year!
-RKL