~sircmpwn/hare-dev

The next hare compiler test suite

Message-ID: <94a1a04a1904d88cb277e62991a31dc73965909c@rosiesworkshop.net>

Hi all,

I can't seem to find it now, but I believe Drew has previously mentioned
intentions to re-work the harec test suite into something more general.
I am planning on starting a fresh write of the hosted typechecker and I
would like to start by creating a test suite of sorts.

For my initial hacking I had thrown together this little driver [1], but
I think something a _little_ bit more sophisticated (and a lot less
slapped-together) would be wise.

[1]: https://git.sr.ht/~roselandgoose/test_hare_unit

Any thoughts or suggestions?

Cheers,
RKL

Message-ID: <D0NQB9EI7MSV.6GS18TM2CKY0@turminal.net>
In-Reply-To: <94a1a04a1904d88cb277e62991a31dc73965909c@rosiesworkshop.net>

> I can't seem to find it now, but I believe Drew has previously mentioned
> intentions to re-work the harec test suite into something more general.
> I am planning on starting a fresh write of the hosted typechecker and I
> would like to start by creating a test suite of sorts.

I don't know what Drew had in mind, but in my opinion the most important thing
you can do to make the hosted tests better than the bootstrap ones is to write A LOT
of tests for individual components (I guess these are called "unit tests"?). We
have a spec that is, most of the time, very clear about the desired behavior, and
we should have tests for individual bits of that behavior. At least in
principle, a lot of that is easy to test: testing type compatibility
just means writing a test for each entry in the relevant table from the spec,
and so on. In practice, we're not doing this in bootstrap harec because it
would require calling the compiler's internal functions directly from the tests,
and that's not really possible with the current approach. In the hosted checker
this should be much easier, fortunately. Having a lot of unit tests is imo
more important than importing the old tests and slowly making them pass one by
one.
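For illustration, here is a minimal sketch of that table-driven style. It is written in Python rather than Hare. The `assignable` function, the `INT_TYPES` map and the rows in `CASES` are hypothetical placeholders. They are not the real checker API or the spec's actual compatibility table. The only point is that each spec table entry becomes one data row.

```python
# Sketch of a table-driven unit test. `assignable` is a HYPOTHETICAL checker
# entry point, and its rule (integer widening within the same signedness) is a
# simplified stand-in, not the full compatibility table from the Hare spec.

# Signedness and size in bytes for a few integer types.
INT_TYPES = {
    "i8": (True, 1), "i16": (True, 2), "i32": (True, 4), "i64": (True, 8),
    "u8": (False, 1), "u16": (False, 2), "u32": (False, 4), "u64": (False, 8),
}


def assignable(src: str, dst: str) -> bool:
    """Hypothetical: may a value of type `src` be assigned to type `dst`?"""
    src_signed, src_size = INT_TYPES[src]
    dst_signed, dst_size = INT_TYPES[dst]
    return src_signed == dst_signed and src_size <= dst_size


# One row per table entry: (source type, destination type, expected result).
CASES = [
    ("u8", "u16", True),
    ("u16", "u8", False),
    ("i32", "i64", True),
    ("i8", "u8", False),
    ("u64", "u64", True),
]


def failing_rows(check, cases):
    """Return every row whose actual result differs from the expected one."""
    return [(src, dst, want) for src, dst, want in cases if check(src, dst) != want]


if __name__ == "__main__":
    failures = failing_rows(assignable, CASES)
    for src, dst, want in failures:
        print(f"FAIL: assignable({src!r}, {dst!r}) should be {want}")
    print(f"{len(CASES) - len(failures)}/{len(CASES)} rows passed")
```

Keeping the rows as plain data means a new spec entry is a one-line addition, and a failure report names the exact row that diverged.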

One thing bootstrap harec does somewhat well is making sure invalid code is
actually rejected - we have an established way of doing that and we're also
strict about requiring those tests during code review. There is however no way
to track which error is associated with which test, and that means tests
frequently get outdated and irrelevant. Solving this will require some
thinking.

Third, and hardest to test, is the runtime behavior of generated code. We
actually have relatively few of those tests in harec, but again, testing
this kind of thing more thoroughly would require us to write some tooling first
(and unfortunately in this case rewriting things in hare won't give us this
tooling for free). As long as you're just working on the typechecker and not
hosted codegen, you don't need to concern yourself with this part.

Hope this helps a bit :)

Message-ID: <D0NZ1O7IKBE4.22HUBMDM4R0E8@cmpwn.com>
In-Reply-To: <D0NQB9EI7MSV.6GS18TM2CKY0@turminal.net>

I agree with Bor's assessment with respect to the value of unit tests
and their applicability to the bootstrap compiler vs the hosted
compiler. However, there is also room for a different kind of tests.

Essentially, unit tests are intrusive tests; Hare accommodates these
better than C does. However, we can also make a non-intrusive test suite:
because Hare is a standardized language, the behavior of the compiler
should fit within an envelope defined by the specification. Given a set
of test cases that exercise a Hare compiler's conformance to the
standard, plus a compiler-specific harness that can feed the test inputs
in and read the expected outputs out, one can build a conformance test
suite that is applicable to any Hare compiler (and that would moreover be an
indispensable tool in building a new implementation).

This is what I wanted to build with a new Hare compiler test suite.

Message-ID: <59112b7bbc9f618b073d842012984053cf10b7e3@rosiesworkshop.net>
In-Reply-To: <D0NZ1O7IKBE4.22HUBMDM4R0E8@cmpwn.com>

Thanks Bor & Drew!

> In the hosted checker [calling the compiler's internal functions directly from
> the tests] should be much easier, fortunately. Having a lot of unit tests is imo
> more important than importing the old tests and slowly making them pass one by
> one.
I think you're right. You've convinced me to build up the (intrusive)
unit tests alongside the typechecker.

> One thing bootstrap harec does somewhat well is making sure invalid code is
> actually rejected - we have an established way of doing that and we're also
> strict about requiring those tests during code review. There is however no way
> to track which error is associated with which test, and that means tests
> frequently get outdated and irrelevant. Solving this will require some
> thinking.
I have some thoughts on how to achieve this... Hopefully today I'll have
time to flesh them out here:
https://git.sr.ht/~roselandgoose/hare2c2/tree/main/item/hare/check/errors.ha

> because Hare is a standardized language, the behavior of the compiler
> should fit within an envelope defined by the specification. Given a set
> of test cases that exercise a Hare compiler's conformance to the
> standard, plus a compiler-specific harness that can feed the test inputs
> in and read the expected outputs out, one can build a conformance test
> suite that is applicable to any Hare compiler (and that would moreover be an
> indispensable tool in building a new implementation).
What kinds of expected outputs do you have in mind? Would you want to
capture standard output, for example, or might exit codes be sufficient?

I'm currently only picturing using (1) the exit code of each (compiler-specific)
harness, to indicate either successful compilation or rejection during
translation, and (2) the exit code of the compiled binary, to indicate
execution-phase aborts and (relying on the design of the test inputs)
computation results and side effects.
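A rough sketch of such an exit-code-only harness follows. It is in Python for brevity. The `Case` layout, `run_case` and `run_suite` are hypothetical names. The compile and run commands stored in each case are where the compiler-specific part would plug in.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """One conformance test input (hypothetical layout)."""
    name: str
    compile_cmd: List[str]        # compiler-specific command that builds the input
    run_cmd: Optional[List[str]]  # command that runs the built binary, if any
    expect_compiles: bool         # False for inputs the spec says must be rejected
    expect_exit: int = 0          # expected exit code of the compiled binary


def run_case(case: Case) -> Optional[str]:
    """Return None on success, or a short description of the mismatch."""
    # (1) Exit code of the compiler-specific build step: accepted vs rejected.
    build = subprocess.run(case.compile_cmd)
    compiled = build.returncode == 0
    if compiled != case.expect_compiles:
        want = "success" if case.expect_compiles else "rejection"
        return f"compile: expected {want}, got exit {build.returncode}"
    if not compiled or case.run_cmd is None:
        return None
    # (2) Exit code of the compiled binary. On POSIX, subprocess reports a
    # process killed by signal N as returncode -N.
    run = subprocess.run(case.run_cmd)
    if run.returncode != case.expect_exit:
        return f"run: expected exit {case.expect_exit}, got {run.returncode}"
    return None


def run_suite(cases: List[Case]) -> int:
    """Run every case, report failures on stderr, and return the failure count."""
    failures = 0
    for case in cases:
        problem = run_case(case)
        if problem is not None:
            failures += 1
            print(f"FAIL {case.name}: {problem}", file=sys.stderr)
    return failures
```

Relying only on exit codes keeps the harness compiler-agnostic. The trade-off is that a test input must encode anything it wants checked, such as a computed value, into its exit status.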

Let me know if I'm already off course.

Cheers,
RKL

Message-ID: <4afb7075c9260cb68cb900f9116f973122167c37@rosiesworkshop.net>
In-Reply-To: <59112b7bbc9f618b073d842012984053cf10b7e3@rosiesworkshop.net>

Hi Bor,

>> One thing bootstrap harec does somewhat well is making sure invalid code is
>> actually rejected - we have an established way of doing that and we're also
>> strict about requiring those tests during code review. There is however no way
>> to track which error is associated with which test, and that means tests
>> frequently get outdated and irrelevant. Solving this will require some
>> thinking.

> I have some thoughts on how to achieve this... Hopefully today I'll have
> time to flesh them out here:
> https://git.sr.ht/~roselandgoose/hare2c2/tree/main/item/hare/check/errors.ha
Well, I ran out of time last week, but I have now prototyped some
rigging. What sorts of cases would the following be insufficient for?

https://git.sr.ht/~roselandgoose/hare2c2/tree/main/item/hare/check/

I've got an error type with an identifying enum that is (hopefully) very
easy to extend (with a code-gen'd errcode_tostr function to get the doc
comment for each enum value). When compiled with +test, the error type
also records the program counter (pc) of the function that initialized it.
Finally, I have a function which wraps debug::symbol_by* to check whether an
error was initialized by a given function.

I figure a test which asserts that the resulting error is (or contains)
an error with a given enum value, that was initialized by a particular
function, should keep rejection tests fairly tightly tied to the errors
they expect. Perhaps overly so? I'd appreciate any harec maintainer's
thoughts.
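As a rough analogue of this scheme, here is a Python sketch rather than the Hare prototype. It records the name of the constructing function instead of a pc plus symbol lookup. `ErrCode`, `CheckError`, `check_binarithm`, `expect_error` and `TEST_MODE` are all hypothetical names chosen for illustration.

```python
import sys
from enum import Enum, auto

# Stand-in for building with +test: only then is the error's origin recorded.
TEST_MODE = True


class ErrCode(Enum):
    """Hypothetical identifying codes, one per kind of rejection."""
    MISMATCHED_TYPES = auto()
    UNKNOWN_IDENT = auto()


class CheckError(Exception):
    def __init__(self, code: ErrCode, msg: str = ""):
        super().__init__(msg or code.name)
        self.code = code
        # Python analogue of recording the pc: remember the name of the function
        # that constructed this error. sys._getframe is CPython-specific; frame 1
        # is whoever called CheckError(...).
        self.origin = sys._getframe(1).f_code.co_name if TEST_MODE else None


def check_binarithm(lhs: str, rhs: str) -> str:
    """Hypothetical checker step: both operands must have the same type."""
    if lhs != rhs:
        raise CheckError(ErrCode.MISMATCHED_TYPES, f"{lhs} vs {rhs}")
    return lhs


def expect_error(code: ErrCode, func, *args) -> bool:
    """True if func(*args) fails with `code` and the error was created in func itself."""
    try:
        func(*args)
    except CheckError as err:
        return err.code is code and err.origin == func.__name__
    return False
```

This ties each test to a function name. If the `raise` moved into a helper called from `check_binarithm`, `expect_error` would start returning False even though the input is still correctly rejected.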

Cheers,
RKL

Message-ID: <D1KCO2SGPCZ1.156F1O5HGBZV1@cmpwn.com>
In-Reply-To: <4afb7075c9260cb68cb900f9116f973122167c37@rosiesworkshop.net>

Hm, that is an interesting, albeit hacky, approach. I think it might be
a bit too intrusive/deep. If we rename a function, split its code up
across several functions, or do other kinds of refactoring, we shouldn't
necessarily see conformance tests start failing.

Still of the opinion that the compiler internals should be mostly opaque
to the test suite.