~vdupras/duskos-discuss

Dusk C Compiler

Details
Message ID
<cc78385b-b1bf-5f14-a016-8e7401ecdad0@insa-lyon.fr>
DKIM signature
missing
Hello Virgil,

I've been quite busy lately and I didn't have the chance to dive into
the accelerators concept for the CollapseOS VM "stage.fs" mechanism, but
I'll try to get back to it.

In the meantime, I've been diving deep into Dusk C Compiler, and I have
several questions to ask, if you don't mind:

1. Do you plan to add the possibility to compile code into a "binary
executable" ?

My thinking is that DuskCC can generate code for a specific arch as long
as that arch's assembler is available; it doesn't necessarily need to
run under DuskOS on that arch.
If that is indeed the case, I was thinking that the ability to output
the compiled code as a "binary executable" might be useful.

The scenario I had in mind would be:
My "workstation" is a DuskOS system on an i386 system, but I might want
to use some micro-controllers I have lying around.
Implementing DuskOS for those micro-controllers might be a bigger task
than needed if I only want to do a specific task, so I might want to
implement an assembler for the target arch on my workstation, to write
some C code, compile it and flash it on my micro-controller.

2. From my current understanding the DuskCC "pgen.fs" file is using a
stack and two VMOp variables to reproduce something like an AST.

It feels like an interesting way to do the job, but I was wondering if
it wouldn't be possible to do it in a simpler way. But you might have
a specific reason to do it in this way, so my question is, is there a
specific reason for this choice ?

3. I was speaking of accelerators earlier. Do you think it would be
possible, maybe by modifying the DuskCC architecture a bit, to add the
possibility to "add" more passes to the compiler, in a way similar as
the accelerators you thought about?
Maybe by playing on the VMOp struct, extending or augmenting it ?

And by the way, congrats for the "dive in" since the text editor
implementation.

Lucas.
Details
Message ID
<d4a970c7-67e1-453b-93c0-396ffc06a401@app.fastmail.com>
In-Reply-To
<cc78385b-b1bf-5f14-a016-8e7401ecdad0@insa-lyon.fr>
On Thu, Feb 9, 2023, at 9:15 AM, Lucas Chaloyard wrote:
> Hello Virgil,
>
> I've been quite busy lately and I didn't have the chance to dive into
> the accelerators concept for the CollapseOS VM "stage.fs" mechanism, but
> I'll try to get back to it.
>
> In the meantime, I've been diving deep into Dusk C Compiler, and I have
> several questions to ask, if you don't mind:
>
> 1. Do you plan to add the possibility to compile code into a "binary
> executable" ?
>
> My thought is based on the idea that DuskCC is capable to write code
> for a specific-arch if the arch assembler is available, it doesn't
> necessarily need to be on a DuskOS running for this specific-arch.
> If it is indeed the case, I was thinking that the ability of outputting
> the compiled code into a "binary executable" might be useful.
>
> The scenario I had in mind would be:
> My "workstation" is a DuskOS system on an i386 system, but I might want
> to use some micro-controllers I have laying around.
> Implementing DuskOS for those micro-controllers might be a bigger task
> than needed if I only want to do a specific task, so I might want to
> implement an assembler for the target arch on my workstation, to write
> some C code, compile it and flash it on my micro-controller.

DuskCC explicitly limits itself to compiling code for the currently running
environment, for the sake of simplicity. There are a lot of assumptions of that
nature in the code. Removing those assumptions would, I think, significantly
complicate the code.

The use case you mention makes sense, but I believe that most of the time,
assembler is more than adequate for the task of programming a microcontroller,
so it's not worth making DuskCC more complex for it. At least that's my
understanding of the situation.

> 2. From my current understanding the DuskCC "pgen.fs" file is using a
> stack and two VMOp variables to reproduce something like an AST.
>
> It feels like an interesting way to do the job, but I was wondering if
> it wouldn't be possible to do it in a simpler way. But you might have
> a specific reason to do it in this way, so my question is, is there a
> specific reason for this choice ?

This way of organizing the code is the simplest way I found. If you can think of
a simpler one, I'm all ears!

The VMOp API is to decouple pgen from the backend. Again, if you can think of a
simpler way, I'm very interested in your proposition.
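
To make the decoupling idea concrete, here is a minimal C sketch of a front end
that only talks to a small table of operations, so the code behind that table
can be swapped. This is not DuskCC's actual VMOp API: the Backend struct, its
two operations, and both backends (one printing stack-machine mnemonics, one
printing a three-address IR) are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend interface; NOT DuskCC's real VMOp API. */
typedef struct Backend Backend;
struct Backend {
    void (*load_const)(Backend *self, int value);
    void (*add)(Backend *self);
    char out[128];  /* textual trace of what was emitted */
    size_t len;
    int ntemps;     /* IR backend only: next temporary id */
    int stack[8];   /* IR backend only: ids of pending operands */
    int sp;
};

/* Append formatted text to the backend's output buffer. */
static void emitf(Backend *b, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(b->out + b->len, sizeof b->out - b->len, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof b->out - b->len);
    b->len += (size_t)n;
}

/* Backend 1: stack-machine mnemonics. */
static void stack_load_const(Backend *b, int v) { emitf(b, "PUSH %d;", v); }
static void stack_add(Backend *b) { emitf(b, "ADD;"); }

/* Backend 2: three-address IR with numbered temporaries. */
static void ir_load_const(Backend *b, int v) {
    int id = b->ntemps++;
    assert(b->sp < 8);
    emitf(b, "t%d=%d;", id, v);
    b->stack[b->sp++] = id;
}
static void ir_add(Backend *b) {
    assert(b->sp >= 2);
    int rhs = b->stack[--b->sp];
    int lhs = b->stack[--b->sp];
    int id = b->ntemps++;
    emitf(b, "t%d=t%d+t%d;", id, lhs, rhs);
    b->stack[b->sp++] = id;
}

Backend make_stack_backend(void) {
    Backend b;
    memset(&b, 0, sizeof b);
    b.load_const = stack_load_const;
    b.add = stack_add;
    return b;
}

Backend make_ir_backend(void) {
    Backend b;
    memset(&b, 0, sizeof b);
    b.load_const = ir_load_const;
    b.add = ir_add;
    return b;
}

/* The "front end": it knows nothing about which backend it drives. */
void compile_sum(Backend *b, int x, int y) {
    b->load_const(b, x);
    b->load_const(b, y);
    b->add(b);
}
```

Calling compile_sum with 2 and 3 on each backend leaves "PUSH 2;PUSH 3;ADD;" in
the first one's buffer and "t0=2;t1=3;t2=t0+t1;" in the second's, without any
change to the front end.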

> 3. I was speaking of accelerators earlier, do you think it would be
> possible, by maybe modifying a bit the DuskCC architecture, to add the
> possibility to "add" more passes to the compiler, in a way similar as
> the accelerators you thought about.
> Maybe by playing on the VMOp struct, extending or augmenting it ?

DuskCC doesn't maintain an AST. If you dive into the git history, you'll see
that it used to, but since then, I've transformed it into a single pass which
directly transforms input tokens into binary code. Again, for simplicity
purposes. The code as it is now is much simpler than what it was with the AST.

This design, however, precludes any kind of subsequent passes. We have no
structure in memory allowing us to go back in time and regenerate binary code
differently.
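
As an illustration of the single-pass idea, here is a minimal C sketch of a
compiler for expressions such as "1+2*3" that emits code while it parses and
never builds a tree. It is not taken from DuskCC: the opcodes, the Compiler
struct and the small run() interpreter are assumptions made up for this
example, and the target is a toy stack machine rather than native code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_END };  /* toy opcodes (assumed) */

typedef struct {
    const char *src;  /* remaining input */
    int code[64];     /* emitted code, written as we parse */
    size_t here;      /* next free slot in code[] */
} Compiler;

static void emit(Compiler *c, int word) {
    assert(c->here < sizeof c->code / sizeof c->code[0]);
    c->code[c->here++] = word;
}

/* number := digit+ ; emits PUSH <value> immediately. */
static void parse_number(Compiler *c) {
    int v = 0;
    assert(isdigit((unsigned char)*c->src));
    while (isdigit((unsigned char)*c->src))
        v = v * 10 + (*c->src++ - '0');
    emit(c, OP_PUSH);
    emit(c, v);
}

/* term := number ('*' number)* ; MUL is emitted right after its operands. */
static void parse_term(Compiler *c) {
    parse_number(c);
    while (*c->src == '*') {
        c->src++;
        parse_number(c);
        emit(c, OP_MUL);
    }
}

/* expr := term ('+' term)* */
static void parse_expr(Compiler *c) {
    parse_term(c);
    while (*c->src == '+') {
        c->src++;
        parse_term(c);
        emit(c, OP_ADD);
    }
}

/* Compile src in one pass; returns the number of code words emitted. */
size_t compile(Compiler *c, const char *src) {
    c->src = src;
    c->here = 0;
    parse_expr(c);
    assert(*c->src == '\0');
    emit(c, OP_END);
    return c->here;
}

/* Tiny interpreter, only here to check what was emitted. */
int run(const int *code) {
    int stack[32];
    int sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: assert(sp < 32); stack[sp++] = *code++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_END:  assert(sp >= 1); return stack[sp - 1];
        default:      assert(0); return 0;
        }
    }
}
```

Once parse_expr returns, nothing remains in memory except the emitted words, so
there is no structure a later pass could walk to regenerate the code
differently.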

Now, if by "in a way similar as the accelerators you thought about", you mean
hijacking the generated code with a jump to an optimized binary, then yeah, of
course it's possible, but I don't see why DuskCC wouldn't generate the fastest
possible binary it can think of on the first pass. Is there something specific
you have in mind?

Regards,
Virgil
Details
Message ID
<87f57bd8-eee4-d6ad-365b-8dc083ee9a1f@insa-lyon.fr>
In-Reply-To
<d4a970c7-67e1-453b-93c0-396ffc06a401@app.fastmail.com>
> This way of organizing the code is the simplest way I found. If you can think of
> a simpler one, I'm all ears!
> 
> The VMOp API is to decouple pgen from the backend. Again, if you can think of a
> simpler way, I'm very interested in your proposition.

I don't have one for now, but I was planning to try to come up with
one. I wanted to know your opinion on this first.

I'll let you know if I have any interesting ideas !

> DuskCC doesn't maintain an AST. If you dive into the git history, you'll see
> that it used to, but since then, I've transformed it into a single pass which
> directly transforms input tokens into binary code. Again, for simplicity
> purposes. The code as it is now is much simpler than what it was with the AST.
> 
> This design, however, precludes any kind of subsequent passes. We have no
> structure in memory allowing us to go back in time and regenerate binary code
> differently.

After re-reading and thinking about it for a few more minutes, I realized
that my idea isn't actually possible at this level.
But wouldn't it be possible to create a VM plugged into the VMOp
interface, which would not output arch-specific code, but would
create an IR instead? I guess that might also go against some of the
assumptions made.

> Now, if by "in a way similar as the accelerators you thought about", you mean
> hijacking the generated code with a jump to an optimized binary, then yeah, of
> course it's possible, but I don't see why DuskCC wouldn't generate the fastest
> possible binary it can think of on the first pass. Is there something specific
> you have in mind?

I wasn't necessarily thinking about speed, but more about
security/safety. For example, some compilers use a specific pass to
perform borrow checking.

I was thinking of it as an accelerator because I thought about it as
a "feature" that wouldn't be loaded by default, since it might be a
complex mechanism, and not as something that would necessarily speed up
the system.

But yes, it might not fit the simplicity that DuskCC is aiming for.

Lucas.
Details
Message ID
<9d7a1f57-5d0c-49c7-ab75-b79a2cd094d4@app.fastmail.com>
In-Reply-To
<87f57bd8-eee4-d6ad-365b-8dc083ee9a1f@insa-lyon.fr>
On Thu, Feb 9, 2023, at 11:11 AM, Lucas Chaloyard wrote:
>> This way of organizing the code is the simplest way I found. If you can think of
>> a simpler one, I'm all ears!
>> 
>> The VMOp API is to decouple pgen from the backend. Again, if you can think of a
>> simpler way, I'm very interested in your proposition.
>
> I don't have any for now, but I was thinking about trying to think of
> one, I wanted to know more about your opinion on this before.
>
> I'll let you know if I have any interesting ideas !
>
>> DuskCC doesn't maintain an AST. If you dive into the git history, you'll see
>> that it used to, but since then, I've transformed it into a single pass which
>> directly transforms input tokens into binary code. Again, for simplicity
>> purposes. The code as it is now is much simpler than what it was with the AST.
>> 
>> This design, however, precludes any kind of subsequent passes. We have no
>> structure in memory allowing us to go back in time and regenerate binary code
>> differently.
>
> After re-reading and thinking a few minutes more about it, I realized
> that my idea was not possible actually on this level.
> But wouldn't it be possible to create a VM plugged into the VMOp
> interface, which would not be outputting arch-specific code, but
> creating an IR ? I guess it might also be against some assumptions done.

The problem isn't at the VM level, it's about offsets being fed to it. Wherever
you see "here" in pgen.fs, that's one of those "assumptions", that is, an offset
being fed to the assembler which will result in some kind of jump or reference
to it. "here" is profoundly contextual and I have a hard time imagining making
those references relocatable without making the code much more complex.
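
The cost of making such references relocatable can be sketched in C. In the
sketch below, a jump operand is the absolute address that here() returned when
the loop start was emitted. Moving the code to another base only works if every
such operand was recorded in a fixup table. Everything here is hypothetical
(the Emitter struct, the opcodes, the word-sized addresses) and is not taken
from pgen.fs.

```c
#include <assert.h>
#include <string.h>

enum { OP_NOP = 0, OP_JMP = 1 };  /* toy opcodes (assumed) */
#define MAX_WORDS 16

typedef struct {
    unsigned base;               /* address where code[0] will live */
    unsigned code[MAX_WORDS];
    unsigned len;
    unsigned relocs[MAX_WORDS];  /* indices of words holding absolute addresses */
    unsigned nrelocs;
} Emitter;

/* Absolute address of the next word to be emitted. */
unsigned here(const Emitter *e) { return e->base + e->len; }

void emit(Emitter *e, unsigned word) {
    assert(e->len < MAX_WORDS);
    e->code[e->len++] = word;
}

/* Emit "JMP target". With track != 0, remember where the operand sits. */
void emit_jump(Emitter *e, unsigned target, int track) {
    emit(e, OP_JMP);
    if (track) {
        assert(e->nrelocs < MAX_WORDS);
        e->relocs[e->nrelocs++] = e->len;
    }
    emit(e, target);
}

/* Move the code to new_base, patching only the operands that were tracked. */
void relocate(Emitter *e, unsigned new_base) {
    unsigned delta = new_base - e->base;
    for (unsigned i = 0; i < e->nrelocs; i++)
        e->code[e->relocs[i]] += delta;
    e->base = new_base;
}

/* Emit NOP; loop: NOP; JMP loop. Returns the index of the jump operand. */
unsigned build_loop(Emitter *e, unsigned base, int track) {
    memset(e, 0, sizeof *e);
    e->base = base;
    emit(e, OP_NOP);
    unsigned loop_start = here(e);  /* absolute, tied to the current base */
    emit(e, OP_NOP);
    emit_jump(e, loop_start, track);
    return e->len - 1;
}
```

Built at base 0x1000, the jump operand is 0x1001. After relocate() to 0x2000,
the tracked version holds 0x2001 while the untracked one still points at
0x1001. The bookkeeping in emit_jump and relocate is the extra complexity in
question.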

There's also the matter of references to previously defined function
declarations. DuskCC works very differently from other compilers: all those
declarations aren't part of an object file or anything, they're direct
references to previously compiled functions (or forth words). If we were to
allow DuskCC to cross compile, we'd need to add a way to discriminate between
"live" functions and "cross-compiled" functions. Yet another layer of indirection
to add.
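
A rough C sketch of that extra layer, under made-up names: each symbol carries
a tag saying whether it is a "live" function that can be called right now in
the running system, or merely an address inside a cross-compiled image. The
Symbol struct, its fields and the helper functions are all hypothetical and do
not reflect how DuskCC stores its declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*LiveFn)(int);
typedef enum { SYM_LIVE, SYM_CROSS } SymKind;

/* Hypothetical symbol entry with the discriminating tag. */
typedef struct {
    const char *name;
    SymKind kind;
    LiveFn live;           /* valid when kind == SYM_LIVE */
    unsigned target_addr;  /* valid when kind == SYM_CROSS */
} Symbol;

/* A function that exists in the running system. */
int twice(int x) { return 2 * x; }

Symbol live_symbol(const char *name, LiveFn fn) {
    Symbol s = { name, SYM_LIVE, fn, 0 };
    return s;
}

Symbol cross_symbol(const char *name, unsigned addr) {
    Symbol s = { name, SYM_CROSS, NULL, addr };
    return s;
}

/* Linear search by name; returns NULL when the name is unknown. */
const Symbol *lookup(const Symbol *tab, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Only live symbols may be executed on the host. */
int call_symbol(const Symbol *s, int arg) {
    assert(s != NULL && s->kind == SYM_LIVE);
    return s->live(arg);
}
```

Without cross-compilation, the kind field and the check in call_symbol are
unnecessary: every symbol is simply a function that already exists in memory.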

>> Now, if by "in a way similar as the accelerators you thought about", you mean
>> hijacking the generated code with a jump to an optimized binary, then yeah, of
>> course it's possible, but I don't see why DuskCC wouldn't generate the fastest
>> possible binary it can think of on the first pass. Is there something specific
>> you have in mind?
> I wasn't necessarily thinking about speed, but more about
> security/safety. For example, some compilers use a specific pass to
> perform borrow checking.
>
> I was thinking of it as an accelerator because I thought about it as
> a "feature" that wouldn't be loaded by default, since it might be a
> complex mechanism, and not as something that would necessarily speedup
> the system.
>
> But yes, it might not be adequate for the simplicity design that DuskCC
> is aiming for.

Although it's hard to imagine a mechanism such as a borrow checker existing
without an AST, it's not impossible either. One could imagine every reference
being pushed somewhere and then checked for "borrow integrity".

But the thing is, borrow checking is a bad example in the case of Dusk: memory
management is very peculiar here. Dynamic allocations made in scratchpads and
arenas never need to be freed individually (which makes code in general so much
simpler!). What kind of borrows would we check for?
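
For readers unfamiliar with the pattern, here is a generic bump-allocator arena
sketched in C. It is not Dusk's implementation: the Arena struct, the fixed
256-byte buffer and the 8-byte alignment are assumptions chosen for
illustration. The point is only that individual allocations have no matching
free; the whole arena is reset at once.

```c
#include <assert.h>
#include <stddef.h>

/* Generic bump arena; sizes and names are illustrative assumptions. */
typedef struct {
    unsigned char buf[256];
    size_t used;  /* number of bytes handed out so far */
} Arena;

/* Return n bytes at an 8-byte-aligned offset, or NULL if the arena is full. */
void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (a->used + 7u) & ~(size_t)7u;
    if (aligned > sizeof a->buf || n > sizeof a->buf - aligned)
        return NULL;
    a->used = aligned + n;
    return a->buf + aligned;
}

/* Discard every allocation at once; there is no per-object free. */
void arena_reset(Arena *a) {
    a->used = 0;
}
```

A caller allocates as needed and calls arena_reset when the whole batch of
objects is no longer needed, so object lifetimes are tied to the arena rather
than tracked one by one.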
Details
Message ID
<f8e71b53-28d8-300c-ff83-4bf4fff3ad6c@insa-lyon.fr>
In-Reply-To
<9d7a1f57-5d0c-49c7-ab75-b79a2cd094d4@app.fastmail.com>
> The problem isn't at the VM level, it's about offsets being fed to it. Wherever
> you see "here" in pgen.fs, that's one of those "assumptions", that is, an offset
> being fed to the assembler which will result in some kind of jump or reference
> to it. "here" is profoundly contextual and I have a hard time imagining making
> those references relocatable without making the code much more complex.
> 
> There's also the matter of references to previously defined function
> declarations. DuskCC works very differently from other compilers: all those
> declarations aren't part of an object file or anything, they're direct
> references to previously compiled functions (or forth words). If we were to
> allow DuskCC to cross compile, we'd need to add a way to discriminate between
> "live" functions and "cross-compiled" function. Yet another layer of indirection
> to add.

Yes, it would add way more complexity than I initially had in mind.

> Although it's hard to imagine a mechanism such as a borrow checker existing
> without an AST, it's not impossible either. One could imagine every reference
> being pushed somewhere and then checked for "borrow integrity".
> 
> But the thing is, borrow checking is a bad example in the case of Dusk: memory
> management is very peculiar in here. Dynamic allocation made in scratchpads and
> arenas don't ever need to be freed (which makes code in general so much
> simpler!). What kind of borrows would we check for?

Borrow checking was more of an example of things that could be added
with an AST or any other type of IR.

Thanks for the answers, I have a better understanding of Dusk C
compiler now.

Lucas.