~vdupras/duskos-discuss


Dusk and WASM

Details
Message ID
<5f682829-4713-47d9-a0ad-8c46dc3df67e@www.fastmail.com>
Hello all,

This morning I saw a mention of a programming language with my name[1], so I
was compelled to check it out.

If I understand this language feature list properly, a self-hosting language
that compiles to WASM means that you can run a Virgil compiler from within a
browser to run code on the spot. Neato.

When you think about it, Dusk could rather easily achieve this feat for C and
probably in a more compact manner (currently less than 200KB of code for the
whole OS). All you need is a WASM kernel and adding to the C compiler either a
Forth backend or a WASM backend.

I don't care much about WASM and the web, and I don't know much about the
current possibilities in that space, but maybe Dusk opens interesting doors and
my own goals currently align with those of a bunch of others.

If that was the case, it could supercharge Dusk's development, so why not throw
this idea out there :)

Regards,
Virgil

[1]: https://news.ycombinator.com/item?id=31954053
Details
Message ID
<7832abc5-5f93-647d-7f53-fcd5de1fe46b@gmx.de>
In-Reply-To
<5f682829-4713-47d9-a0ad-8c46dc3df67e@www.fastmail.com> (view parent)
Hello Virgil,


Am 02.07.2022 um 14:36 schrieb Virgil Dupras:

> If I understand this language feature list properly, a self-hosting language
> that compiles to WASM means that you can run a Virgil compiler from within a
> browser to run code on the spot. Neato.
>
> When you think about it, Dusk could rather easily achieve this feat for C and
> probably in a more compact manner (currently less than 200KB of code for the
> whole OS). All you need is a WASM kernel and adding to the C compiler either a
> Forth backend or a WASM backend.

You know my enthusiasm for running "exotic" systems in the browser, so I
have already been considering porting Dusk to WASM to run it in a browser
(once its direction is clearer; I know from CollapseOS how often and
easily you perform 180° turns during development).

Anything that makes this easier for me is welcome :)

> I don't care much about WASM and the web and I don't know much about the current
> possibilities in that space, but maybe that Dusk opens interesting doors and
> that my own goals currently align with a bunch of others.

The main challenge for a Forth-based system in WASM (as I see it) is its
architecture, which more closely resembles a Harvard architecture than a
Von-Neumann one, meaning that the executing code can access some linear
memory, but it won't find its own code in there. The only way to get
dynamic or self-modifying code is via wasm modules.

At some point during execution the application decides to load and link
a wasm module. Loading means it has its byte representation somewhere in
its accessible linear memory and makes it a loaded module (referenced by
a handle). Linking means to specify for each imported function of the
module what it should invoke, which can be either an exported function
of an already loaded wasm module, or a function implemented in
JavaScript. After linking, the module's exported functions can be called
like any other module function. Reloading (= different bytecode) or
relinking (= different imports) of modules is also possible, but only if
no code of the module to be reloaded/relinked is on the current call
stack, and I believe (not sure) that reloading a module will also relink
all modules that depend on it, so they may not be on the call stack either.
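
To make the load and link steps concrete, here is a minimal TypeScript
sketch against the standard WebAssembly JavaScript API. It is only an
illustration under assumptions: the hand-assembled module, the import
name "env.base", the export name "run" and the helper `link` are all
made up for this example. In the JavaScript API, linking happens when a
module is instantiated, so "relinking" is modelled here as instantiating
the same loaded module a second time with different imports.

```typescript
// Hand-assembled module (hypothetical): imports env.base, exports run() = base() + 1.
const moduleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type section: type 0 = () -> i32
  0x02, 0x0c, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import section: module "env"
  0x04, 0x62, 0x61, 0x73, 0x65, 0x00, 0x00,             //   field "base", kind func, type 0
  0x03, 0x02, 0x01, 0x00,                               // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export section: "run" = func 1
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x10, 0x00, 0x41, 0x01, 0x6a, 0x0b,                   //   call 0; i32.const 1; i32.add; end
]);

// Accessed through globalThis so the sketch compiles with or without DOM typings.
const WA = (globalThis as any).WebAssembly;

// "Loading": turn the bytes into a module handle.
const loaded = new WA.Module(moduleBytes);

// "Linking": say what each import resolves to, here a JavaScript function.
function link(base: () => number): () => number {
  const instance = new WA.Instance(loaded, { env: { base } });
  return instance.exports.run as () => number;
}

const runA = link(() => 41); // first link
const runB = link(() => 99); // same bytes, linked against a different import

console.log(runA(), runB());
```

The import could just as well be an export of another instance, which is
how words living in different modules would call each other.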

As I fear that creating its own module for each Forth word does not
scale well enough, there probably needs to be some intelligent logic to
decide which words to group into a module and whether to reload an
existing one or define a new one.

But I'm open to your thoughts about this :)


Regards,


Michael
Details
Message ID
<d3a1f4da-ef93-4458-b155-86997580c650@www.fastmail.com>
In-Reply-To
<7832abc5-5f93-647d-7f53-fcd5de1fe46b@gmx.de> (view parent)
On Sat, Jul 2, 2022, at 10:49 AM, Michael Schierl wrote:
> Hello Virgil,
>
>
> Am 02.07.2022 um 14:36 schrieb Virgil Dupras:
>
>> If I understand this language feature list properly, a self-hosting language
>> that compiles to WASM means that you can run a Virgil compiler from within a
>> browser to run code on the spot. Neato.
>>
>> When you think about it, Dusk could rather easily achieve this feat for C and
>> probably in a more compact manner (currently less than 200KB of code for the
>> whole OS). All you need is a WASM kernel and adding to the C compiler either a
>> Forth backend or a WASM backend.
>
> You know my enthusiasm of running "exotic" systems in the browser, so I
> have already been considering to port Dusk (when its direction is more
> clear - I know from CollapseOS how often and easily you perform 180°
> turns during development) to WASM, to run it in a browser.

Guilty as charged. Dusk is still at a high risk of significant changes at this
stage, so yes, if one wants to minimize the effort, it might be better to wait
a little.

But I also think that it's good to throw such ideas in the air early. I tend to
take Von Neumann for granted, but now I'm reminded that Harvard needs to stay
in a corner of my mind.

> Anything that makes this easier for me is welcome :)
>
>> I don't care much about WASM and the web and I don't know much about the current
>> possibilities in that space, but maybe that Dusk opens interesting doors and
>> that my own goals currently align with a bunch of others.
>
> The main challenge for a Forth-based system in WASM (as I see it) is its
> architecture, which more closely resembles a Harvard architecture than a
> Von-Neumann one, meaning that the executing code can access some linear
> memory, but it won't find its own code in there. The only way of dynamic
> or self-modifying code is via wasm modules.
>
> At some point during execution the application decides to load and link
> a wasm module. Loading means it has its byte representation somewhere in
> its accessible linear memory and makes it a loaded module (referenced by
> a handle). Linking means to specify for each imported function of the
> module what it should invoke, which can be either an exported function
> of an already loaded wasm module, or a function implemented in
> JavaScript. After linking, the module's exported functions can be called
> like any other module function. Reloading (= different bytecode) or
> relinking (= different imports) of modules is also possible, but only if
> no code of the module to be reloaded/relinked is on the current call
> stack, and I believe (not sure) that reloading a module will also relink
> all modules that depend on it, so they may not be on the call stack either.
>
> As I fear that creating its own module for each Forth word does not
> scale well enough, there probably needs to be some intelligent logic to
> decide which words to group into a module and whether to reload an
> existing one or define a new one.
>
> But I'm open to your thoughts about this :)

You know WASM a lot better than I do, but from your description, it seems like
there's no way around modules, so "one module per word" seems like the only
possibility, if there is one. Had Dusk been ITC (indirect threaded code)
instead of STC (subroutine threaded code), it would probably have been possible
to do away with modules, but with STC there's no way around it: we *have* to be
able to make a native call/jump to a word's address. The C compiler is built
around this assumption.
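
To illustrate why ITC would sidestep the module question: in an indirect
threaded design, compiled words are plain data in linear memory and a fixed
inner interpreter walks them, so nothing new ever has to become executable.
Below is a minimal TypeScript sketch. The memory layout, the primitive set and
the name `runItc` are made up for illustration and are not Dusk's.

```typescript
// Indices into the primitive handler table (hypothetical set).
const DOCOL = 0, EXIT = 1, LIT = 2, ADD = 3, DUP = 4;

// Word addresses in `mem`. Each word starts with a code field naming its handler.
const W_EXIT = 0, W_LIT = 1, W_ADD = 2, W_DUP = 3, W_DOUBLE = 4, W_MAIN = 8;

const mem: number[] = [
  EXIT, LIT, ADD, DUP,             // 0..3: primitives, code field only
  DOCOL, W_DUP, W_ADD, W_EXIT,     // 4..7:  : double dup + ;
  DOCOL, W_LIT, 21, W_DOUBLE, W_EXIT, // 8..12: : main 21 double ;
];

// Runs a colon definition at `entry` and returns the data stack.
function runItc(entry: number): number[] {
  const ds: number[] = [];   // data stack
  const rs: number[] = [];   // return stack, lives in the interpreter
  let ip = -1;               // -1 marks "return to host"
  let w = entry;             // address of the word being executed
  let running = true;
  const handlers: Array<() => void> = [
    () => { rs.push(ip); ip = w + 1; },                      // DOCOL: enter a colon word
    () => { ip = rs.pop()!; if (ip < 0) running = false; },  // EXIT: leave it
    () => { ds.push(mem[ip++]); },                           // LIT: push the inline cell
    () => { ds.push(ds.pop()! + ds.pop()!); },               // ADD
    () => { ds.push(ds[ds.length - 1]); },                   // DUP
  ];
  handlers[mem[w]]();        // enter the entry word (assumed to be a colon word)
  while (running) {          // the inner interpreter ("NEXT")
    w = mem[ip++];
    handlers[mem[w]]();
  }
  return ds;
}

console.log(runItc(W_MAIN));
```

Defining a new word here only appends numbers to `mem`. The price is the
dispatch loop on every word, which is exactly what STC avoids.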

The code in Dusk is certainly dynamic, but I don't do any fancy self-modifying
code jujitsu, so modules might be feasible.

And I don't think I'd be willing to sacrifice STC for the sake of being able to
run on Harvard machines. When compactness isn't a priority, STC is much more
pleasant and straightforward (and fast).

I see a bunch of Forth implementations in WASM; I'll go check whether they're
ITC, or if they use modules.
Details
Message ID
<51ef61c8-bc66-0767-369c-e7c9686cbe2b@gmx.de>
In-Reply-To
<d3a1f4da-ef93-4458-b155-86997580c650@www.fastmail.com> (view parent)
Hello Virgil,


Am 02.07.2022 um 21:02 schrieb Virgil Dupras:

> You know WASM a lot more than me,

I wrote some emulators in AssemblyScript, which is a TypeScript-inspired
language that maps its concepts 1:1 to wasm. But all of them needed only
one module so far. I was considering adding JIT compilation into my
emulator for Wirth's RISC architecture (which would require me to create
modules on the fly for the JITed code), but so far I haven't done it (I
only did it for my JavaScript emulator, by emitting JavaScript that is
eval:ed).

> but from your description, it seems like
> there's no other way around than modules, so "one module per word" seems like
> the only possibility, if there's one.

I believe it might be possible to reload modules to add multiple words
to one module on demand. Keep some markers in the word structure in main
memory so that you know that a word returned by FIND is not yet exported
by your module, and on execution add all unexported words to the current
module and reload it, updating the flags. Calls from within Forth/C
words can be threaded directly by calling the other local or exported
word (when compiling a call to a word not yet in a module, assume it is
in the current module).
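
The bookkeeping described above could be sketched roughly as follows, in
TypeScript. Everything here is hypothetical (the `Dictionary` class, the
`inModule` flag, the `reloads` counter), and the actual module rebuild
is replaced by a counter so that only the flag logic is shown.

```typescript
// One dictionary entry; `inModule` is the marker kept in the word structure.
interface Word {
  name: string;
  inModule: boolean;
}

class Dictionary {
  words: Word[] = [];
  reloads = 0; // stands in for "rebuild and reload the current module"

  // Defining a word only records it; nothing is reloaded yet.
  define(name: string): void {
    this.words.push({ name, inModule: false });
  }

  // Newest definition wins, as in a Forth dictionary search.
  find(name: string): Word | undefined {
    for (let i = this.words.length - 1; i >= 0; i--) {
      if (this.words[i].name === name) return this.words[i];
    }
    return undefined;
  }

  // Before executing a word that is not yet exported, flush all pending words.
  execute(name: string): void {
    const word = this.find(name);
    if (!word) throw new Error(`unknown word: ${name}`);
    if (!word.inModule) this.flush();
    // A real system would now call the exported function for `word`.
  }

  private flush(): void {
    for (const w of this.words) w.inModule = true;
    this.reloads++;
  }
}

const dict = new Dictionary();
dict.define("a");
dict.define("b");
dict.define("c");
dict.execute("c"); // first execution: a, b and c are flushed together
dict.execute("a"); // already exported, no reload
console.log(dict.reloads);
```

With this scheme, a burst of definitions followed by one execution costs
a single reload rather than one per word.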

The main loop is initially in the core module, so all later words go into
a second module (say, module1). But it gets tricky if there is a new main
loop that
might be on the call stack when executing freshly compiled words.
Detecting this at runtime may impose too large overhead, so probably one
would need a SWITCHMODULE command or similar that the code writer uses
after defining his new main loop words. Or maybe switch to new modules
every 100 words or so. (The problem is that I have no idea whether
reloading a module a lot is more expensive than the overhead of calling
an exported function in another module vs. calling a function in the
same module. But I assume reloading grows linearly if there are more
functions, resulting in quadratic load times if you always add to the
same module and reload it after each word definition.)
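
A back-of-the-envelope version of that assumption, in TypeScript: if a
reload costs one unit per word currently in the module, and the module
is reloaded after every definition, the total cost can be tallied as
below. The cost model and the name `reloadCost` are assumptions made
for this illustration, not measurements.

```typescript
// Total reload cost for `totalWords` definitions, starting a fresh module
// every `wordsPerModule` words. Each reload costs the module's current size.
function reloadCost(totalWords: number, wordsPerModule: number): number {
  let cost = 0;
  let inCurrent = 0;
  for (let i = 0; i < totalWords; i++) {
    if (inCurrent === wordsPerModule) inCurrent = 0; // switch to a new module
    inCurrent++;       // the new word joins the current module
    cost += inCurrent; // reloading touches every word in that module
  }
  return cost;
}

console.log(reloadCost(1000, 1000)); // always the same module: 1000 * 1001 / 2
console.log(reloadCost(1000, 100));  // new module every 100 words
console.log(reloadCost(1000, 1));    // one module per word
```

Under this model, 1000 words in a single module cost 500500 units,
switching every 100 words costs 50500, and one module per word costs
1000, at the price of every call crossing a module boundary.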

But maybe I am missing something and it is not that easy (did not have a
close look at your implementation). Or you would not consider that kind
of logic easy.


Regards,


Michael
Details
Message ID
<709029b5-a28c-90aa-24b4-4fadeeebf2a4@gmx.de>
In-Reply-To
<d3a1f4da-ef93-4458-b155-86997580c650@www.fastmail.com> (view parent)
Hello Virgil,


Am 02.07.2022 um 21:02 schrieb Virgil Dupras:
> And I don't think I'd be willing to sacrifice STC for the sake of being able to
> run on Harvard machines. When compactness isn't a priority, STC is much more
> pleasant and straightforward (and fast).
>
> I see a bunch of Forth implementation in WASM, I'll go check whether they're
> ITC, or if they use modules.

Sorry for digging out this old post, but I have now had a look at how
BEGIN .. AGAIN is actually implemented in Dusk OS (and probably any other
STC Forth; I don't have experience with those):

xcode (br)
   ax pop,
   ax 0 d) jmp,

: begin HERE @ ; immediate
: again compile (br) , ; immediate


In other words, BEGIN pushes the current address (where the next
instruction will be assembled) onto the (compile-time) parameter stack,
and AGAIN first compiles a call to the "(br)" word, then writes the
address from PS next to it.

The "(br)" word itself pops the return address (which is a pointer
to the address written when AGAIN was compiled), dereferences it and
jumps to it. (It could probably also be seen as popping the return
address, dereferencing it, pushing it again, and returning to it.)
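
The mechanism can be modelled on a toy machine. The TypeScript sketch
below is not Dusk's code: the opcodes, the memory layout and the name
`runLoop` are invented for illustration. It only shows that "(br)" works
by reading the cell its own return address points at.

```typescript
// Opcodes of a made-up machine whose code lives in ordinary memory.
const INC = 1;        // counter++
const EXIT_IF_3 = 2;  // stop once counter >= 3
const CALL = 3;       // push address after the operand, jump to the operand
const BR = 4;         // body of "(br)": pop return address, jump to the cell it points at

const BR_ADDR = 10;   // where the "(br)" primitive lives
const LOOP_START = 0; // the address BEGIN left on the compile-time stack

const code: number[] = [
  INC,            // 0: loop body
  EXIT_IF_3,      // 1: way out, so the sketch terminates
  CALL, BR_ADDR,  // 2-3: AGAIN compiled a call to (br) ...
  LOOP_START,     // 4: ... and "," wrote the branch target right after it
  0, 0, 0, 0, 0,  // 5-9: unused
  BR,             // 10: the (br) primitive
];

function runLoop(): { count: number; returnStackDepth: number } {
  const rs: number[] = []; // return stack
  let pc = 0;
  let count = 0;
  for (;;) {
    const op = code[pc];
    if (op === INC) { count++; pc++; }
    else if (op === EXIT_IF_3) { if (count >= 3) break; pc++; }
    else if (op === CALL) { rs.push(pc + 2); pc = code[pc + 1]; }
    else if (op === BR) {
      const ra = rs.pop()!; // the "return address" is really a pointer to data
      pc = code[ra];        // dereference it and jump there
    } else throw new Error(`bad opcode ${op} at ${pc}`);
  }
  return { count, returnStackDepth: rs.length };
}

console.log(runLoop());
```

The `BR` case is the crux: it needs both the return address and the
ability to read code memory as data.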

Apart from the fact that the executed code is not directly
accessible/readable from within WebAssembly (which could be circumvented
by having the code first written in linear memory and then copied to the
WASM module; the code could then just look up the values in the copy in
linear memory), WebAssembly does not allow this kind of return stack
access (i.e. reading the return address) or manipulation, and it does
not allow unknown opcodes in the middle of a function's code either. So
I assume that to get Dusk OS running on WebAssembly, one would have to
either

1) Just port (or even compile) the POSIX VM to WebAssembly - resulting
in an ITC Forth. Definitely the most boring way.

or

2) Have some postprocessor read over the Forth "code", find things like
"(br)" or "(next)" and replace them by the proper WebAssembly control
flow instructions. [Not sure yet how many such instructions do exist,
though.] This will fail horribly if any branches point outside the
current FORTH word.

or

3) Have to rewrite some (significant?) portions of the Forth kernel to
write different instructions upon compilation. This will break
application code that relies on the internals of these parts.


Did I miss anything?


Michael
Details
Message ID
<1cc4a085-2f19-42f2-9804-4c37c381ae51@app.fastmail.com>
In-Reply-To
<709029b5-a28c-90aa-24b4-4fadeeebf2a4@gmx.de> (view parent)
On Fri, Nov 25, 2022, at 4:34 PM, Michael Schierl wrote:
> Hello Virgil,
>
>
> Am 02.07.2022 um 21:02 schrieb Virgil Dupras:
>> And I don't think I'd be willing to sacrifice STC for the sake of being able to
>> run on Harvard machines. When compactness isn't a priority, STC is much more
>> pleasant and straightforward (and fast).
>>
>> I see a bunch of Forth implementation in WASM, I'll go check whether they're
>> ITC, or if they use modules.
>
> Sorry for digging out this old post, but now I had a look how BEGIN ..
> AGAIN is actually implemented in Dusk OS (and probably any other STC
> Forth - I don't have experience in that):
>
> xcode (br)
>    ax pop,
>    ax 0 d) jmp,
>
> : begin HERE @ ; immediate
> : again compile (br) , ; immediate
>
>
> In words, BEGIN pushes the current address (where next instruction will
> be assembled) to (compile time) parameter stack, and AGAIN first
> compiles a call to "(br)" word, then writes the address from PS next to it.
>
> The "(br)" word itself is popping the return address (which is a pointer
> to the address written when AGAIN was compiled), dereferences it and
> jumps to it. (It could probably also be seen as popping the return
> address, dereferencing it, pushing it again, and returning to it).

You understood correctly, except for "pushing it again". The last line of (br)
jumps to [eax], not to an address taken from the top of the stack.

> Besides from the fact that the executed code is not directly
> accessible/readable from within Webassembly (which could be circumvented
> by having the code first written in linear memory and then copied to the
> WASM module; and the code could just look up the values in the copy in
> linear memory), WebAssembly does not allow this kind of return stack
> access (i.e. reading the return address) or manipulation (and it also
> does not allow unknown opcodes in the middle of a function's code
> either), so I assume to get Dusk OS running on Webassembly, one would
> have to either

Those are big bummers! I haven't looked closely at other WASM Forth
implementations, but this definitely rules out STC, and probably also DTC
(which needs to play with the return stack to know where to return to).

> 1) Just port (or even compile) the POSIX VM to WebAssembly - resulting
> in an ITC Forth. Definitely the most boring way.

I hate to quote Darth Vader here, but it's too late for me now... One of Dusk's
great strengths is the ability to seamlessly blend Forth with C (and other
languages) *both ways* and unless I've missed something, it's impossible with
ITC and even DTC (I've tried a few things with Collapse OS and didn't find a
path).

This feature is too good to pass up, and much of Dusk is already built around
it anyway (for example, DuskCC's stdlib depends on the ability of compiled C
code to call Forth words, such as "fread()" being a proxy for "IO :read").
Dusk has to stay STC.

> or
>
> 2) Have some postprocessor read over the Forth "code", find things like
> "(br)" or "(next)" and replace them by the proper WebAssembly control
> flow instructions. [Not sure yet how many such instructions do exist,
> though.] This will fail horribly if any branches point outside the
> current FORTH word.

This would probably be very complex and would need to be able to decode native
bytecode from all supported arches. It would not only need to fix (br) and
(next), but also the code generated by "create" and "doer" and probably others.
The generated WASM code would be likely to have a different byte size, which
would break all the code around it.

No, this doesn't look like a fun path :)

> or
>
> 3) Have to rewrite some (significant?) portions of the Forth kernel to
> write different instructions upon compilation. This will break
> application code that relies on the internals of these parts.

I don't mind rewriting internals in incompatible ways. At this point, we can
still afford it. However, in light of the information above, I don't see
what we could rewrite them to. The ability to inspect and modify the native
return stack is required.

> Did I miss anything?

Emulation? If one really wanted to run Dusk on WASM and didn't mind the
performance penalty, they could implement a VM that runs similarly to the POSIX
VM, like you did with your JS port of Collapse OS. This C code doesn't do
anything fancy like executing generated code natively on the host machine, it
works like Collapse OS' CVM, an array of little ops that play in their little
sandbox.

This makes the whole "Dusk on WASM" idea less exciting though...

Regards,
Virgil
Details
Message ID
<f7c48086-9012-cb41-fa56-76e9ff5b4dba@gmx.de>
In-Reply-To
<1cc4a085-2f19-42f2-9804-4c37c381ae51@app.fastmail.com> (view parent)
Hello,


Am 26.11.2022 um 15:24 schrieb Virgil Dupras:
> On Fri, Nov 25, 2022, at 4:34 PM, Michael Schierl wrote:
>> The "(br)" word itself is popping the return address (which is a pointer
>> to the address written when AGAIN was compiled), dereferences it and
>> jumps to it. (It could probably also be seen as popping the return
>> address, dereferencing it, pushing it again, and returning to it).
>
> You understood correctly, except for "pushing it again". The last line of (br)
> jumps to [eax], not from stack's top.

Probably instead of "It could also be seen" I should have written "It
could also be implemented". I am aware that the current implementation
does not do it, but even with that little leeway I see no way to do it
natively in WASM.

>> 1) Just port (or even compile) the POSIX VM to WebAssembly - resulting
>> in an ITC Forth. Definitely the most boring way.

Which seems to be the same idea that you call "emulation" further down.
Probably I should not have written "ITC Forth", in case that confused you.

>> 2) Have some postprocessor read over the Forth "code", find things like
>> "(br)" or "(next)" and replace them by the proper WebAssembly control
>> flow instructions. [Not sure yet how many such instructions do exist,
>> though.] This will fail horribly if any branches point outside the
>> current FORTH word.
>
> This would probably be very complex and would need to be able to decode native
> bytecode from all supported arches.

I would try to fit this in only for arches that need it (i.e. WASM for
now). Maybe even do it in the glue code outside of the compiled WASM.
Whenever a word is finished, it will have to call some function anyway
to "flush code caches" on some architectures and do something more on
WASM, and that is the point where I'd have to intervene.

But I agree that it is both very complex and very fragile (in the sense
that it needs to change if new code gets added that acts differently).

> The generated WASM code would be likely to have a different byte size, which
> would break all the code around it.

The WASM code will end up inside the WASM module, which is not exposed in
linear memory anyway. So from the POV of the running Forth code, it
would still see its old code.

> No, this doesn't look like a fun path :)

Agreed.

>> 3) Have to rewrite some (significant?) portions of the Forth kernel to
>> write different instructions upon compilation. This will break
>> application code that relies on the internals of these parts.
>
> I don't mind rewriting internals in incompatible ways. At this point, we can
> still afford it. However, at the light of the information above, I don't see
> what we could rewrite them to. The ability to inspect and modify the native
> return stack is required.

I also don't think that you can rewrite them in a platform-independent
way that will still have them work with other arches. And I don't know
how hard it would be to have some "conditionals" inside the boot code to
define words differently for different architectures.

> This makes the whole "Dusk on WASM" idea less exciting though...

Agreed. That's why I called my option #1 "boring".


Regards,


Michael
Details
Message ID
<60b13512-0eaa-4b82-bf29-33107c7f5f35@app.fastmail.com>
In-Reply-To
<f7c48086-9012-cb41-fa56-76e9ff5b4dba@gmx.de> (view parent)
On Sat, Nov 26, 2022, at 9:41 AM, Michael Schierl wrote:
> Probably instead of "It could also be seen" I should have written "It
> could also be implemented". I am aware that the current implementation
> does not do it, but even with that little leeway I see no way to do it
> natively in WASM.
> [...]
> Which seems the same idea that you call "emulation" further down.
> Probably I should not have written "ITC Forth", in case that confused you.

Sorry, I did misread you. You didn't miss anything and I agree with you.