Hello all,
People looking at Dusk's commits will have seen that I've been happily porting
Dusk to ARM with the Raspberry Pi model 1 as a first target. Things are going
well and the ARM architecture is interesting.
One obvious roadblock is the way "calls" are made in ARM. "bl" doesn't behave
like a traditional "call": it doesn't push anything to the stack, it simply
saves the return address to the "LR" register. To make it behave like a
traditional call, one has to wrap the "bl" in a push/pop pair of LR.
It does the job, but it's wasteful and doesn't leverage the interesting gains
that such a mechanism brings to the table, that is, to "thread" leaf calls
without having to push/pop to SP.
I've been cogitating on the best way to leverage this mechanism in Dusk in a
cross-platform way, but my first ideas involved too much complexity, so I was
lukewarm on this subject, ready to sacrifice ARM's performance on Dusk to keep
complexity at bay.
Then came this idea: the "leaf" flag for dictionary entries (that is, at the
same level as the "immediate" flag).
When creating a word (in this case, this can only work for "code" words), you
can mark it as "leaf" in the same way as you would for "immediate". By doing so,
you promise that this word calls no other word, that it's a "leaf" word.
In addition to this flag, the Low HAL gains two new words, "linkjump," and
"linkret,". On i386 and POSIX, these are simple aliases to "execute," and "ret,"
but on ARM, they are straight "bl" and "mov rPC, rLR" instructions.
In compword, we check for the "leaf" flag of the dictionary entry. If it's set,
we call "linkjump," and if it's not, "execute,". A word marked as "leaf" must
return with "linkret," instead of "exit,".
This would allow Dusk to fully take advantage of ARM (or similar architectures)
with regards to routine calling with minimal added complexity.
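A minimal sketch of this dispatch, written in Python rather than Forth so it can be run anywhere. The flag values, the Entry class and the instruction strings are assumptions made for illustration; they are not Dusk's actual code, only a model of the idea described above:

```python
# Hypothetical model of the "leaf" flag idea. None of these names or
# values are taken from Dusk's source; they only mirror the description.
from dataclasses import dataclass

IMMEDIATE = 0x1  # illustrative flag values (assumption)
LEAF = 0x2


@dataclass
class Entry:
    name: str
    addr: int
    flags: int = 0


def execute(addr, arch):
    """Emit a traditional call: on ARM, "bl" wrapped in push/pop of LR."""
    if arch == "arm":
        return ["push {lr}", f"bl {addr:#x}", "pop {lr}"]
    return [f"call {addr:#x}"]


def linkjump(addr, arch):
    """Emit a link-register call: a bare "bl" on ARM, an alias elsewhere."""
    if arch == "arm":
        return [f"bl {addr:#x}"]
    return execute(addr, arch)


def linkret(arch):
    """Emit the return used by leaf words."""
    return ["mov pc, lr"] if arch == "arm" else ["ret"]


def compword(entry, arch):
    """Pick the call sequence based on the entry's "leaf" flag."""
    if entry.flags & LEAF:
        return linkjump(entry.addr, arch)
    return execute(entry.addr, arch)
```

With this model, calling a leaf word on ARM costs one instruction instead of three, while on i386 both paths produce the same "call".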
I think it will work out...
Regards,
Virgil
Hello Virgil,
On 26.05.2023 at 21:01, Virgil Dupras wrote:
> Hello all,
>
> People looking at Dusk's commits will have seen that I've been happily porting
> Dusk to ARM with the Raspberry Pi model 1 as a first target. Things are going
> well and the ARM architecture is interesting.
Having more than one architecture fully supported will definitely be
great for proving Dusk OS's concepts.
> One obvious roadblock is the way "calls" are made in ARM. "bl" doesn't do like
> a traditional "call", it simply saves PC to the "LR" register. To make it
> behave like a traditional call, one has to wrap it around a push/pop pair of LR
> around the "bl" call.
The alternative being to wrap the called function body (i.e. pushing on
function entry and popping just before exiting - skipping both in case
of a leaf procedure).
Just saying (I did not look at your code so there might be a point why
you did not consider this option. From a code size point of view this
option is favored as most procedures are called more times than defined.
And sometimes you can incorporate it into building a stack frame and
pushing more registers, if architecture allows).
> In compword, we check for the "leaf" flag of the dictionary entry. If it's set,
> we call "linkjump," and if it's not, "execute,". A word marked as "leaf" must
> return with "linkret," instead of "exit,".
Wrapping the push/pop to the implementation would remove this
distinction during compilation of the call, it would only be required
during compilation/assembling of the leaf procedure itself.
> I think it will work out...
Good luck anyway :)
Regards,
Michael
On Fri, May 26, 2023, at 4:58 PM, Michael Schierl wrote:
> Hello Virgil,
>
> On 26.05.2023 at 21:01, Virgil Dupras wrote:
>> Hello all,
>>
>> People looking at Dusk's commits will have seen that I've been happily porting
>> Dusk to ARM with the Raspberry Pi model 1 as a first target. Things are going
>> well and the ARM architecture is interesting.
>
> Having more than one architecture fully supported will definitely be
> great for proving Dusk OS's concepts.
>
>> One obvious roadblock is the way "calls" are made in ARM. "bl" doesn't do like
>> a traditional "call", it simply saves PC to the "LR" register. To make it
>> behave like a traditional call, one has to wrap it around a push/pop pair of LR
>> around the "bl" call.
>
> The alternative being to wrap the called function body (i.e pushing on
> function entry and popping just before exiting - skipping both in case
> of a leaf procedure).
>
> Just saying (I did not look at your code so there might be a point why
> you did not consider this option. From a code size point of view this
> option is favored as most procedures are called more times than defined.
> And sometimes you can incorporate it into building a stack frame and
> pushing more registers, if architecture allows).
It was my first idea. The problem with this is that suddenly, all words need a
prelude and a postlude (at each exit point), a mechanism that needs to be
introduced in a cross-arch way so that the HAL continues to work.
This changes the implicit semantics of "code", which so far meant nothing more
than "create an entry". Now it would be "create an entry and add a prelude".
This means that "here" doesn't point to the beginning of the code after a "code"
call. This breaks a lot of usages for it.
... Or make "prelude," explicit, but it has to be added to a lot of words.
That's a lot of noise.
Also, what happens when we jump to that word instead of calling it? Currently,
it changes nothing as it simply defers the "ret". I use this pattern quite
often. With a prelude, this pattern would break because jumping to the word
would push a spurious LR to SP.
I tried to think of the simplest possible way to bring this in, but in the end
I think the "leaf" idea wins.
There's also the fact that the postlude is a bit complicated. It can't be the
exact mirror of the "push LR; bl; pop LR" part, because "mov LR to PC; pop LR"
doesn't work: PC changes so "pop LR" is never executed. This return needs to
involve another register: move LR into it, pop LR, then set PC to that
register. Or is there something more straightforward I haven't seen? (I have
yet to fully "think in ARM"...)
>> In compword, we check for the "leaf" flag of the dictionary entry. If it's set,
>> we call "linkjump," and if it's not, "execute,". A word marked as "leaf" must
>> return with "linkret," instead of "exit,".
>
> Wrapping the push/pop to the implementation would remove this
> distinction during compilation of the call, it would only be required
> during compilation/assembling of the leaf procedure itself.
Yes, but as I wrote above, the prelude/postlude system as I try to imagine it
seems more complicated than it should.
But yeah, in this case, there wouldn't be a "leaf" flag in the dictionary
entry. It would only be the writer of the "code" word (a ":" word can't be a
leaf unless it's empty) who would knowingly omit the prelude/postlude.
>> I think it will work out...
>
> Good luck anyway :)
>
> Regards,
> Michael
Hello Virgil,
On 26.05.2023 at 23:19, Virgil Dupras wrote:
> On Fri, May 26, 2023, at 4:58 PM, Michael Schierl wrote:
> This changes the implicit semantic of "code", which so far meant nothing more
> than "create an entry". Now it would be "create an entry and add a prelude".
> This means that "here" doesn't point to the beginning of the code after a "code"
> call. This breaks a lot of usages for it.
Fair point.
> Also, what happens when we jump to that word instead of calling it? Currently,
> it changes nothing as it simply defers the "ret". I use this pattern quite
> often. With a prelude, this pattern would break because jumping to the word
> would push a spurious LR to SP.
You would have to jump to the point after the prelude. Which would again
require you to know that the word is not a leaf word and thus has a
nonempty prelude.
> There's also the fact that the postlude is a bit complicated. It can't be the
> exact mirror of the "push LR; bl; pop LR" part, because "mov LR to PC; pop LR"
> doesn't work: PC changes so "pop LR" is never executed. This return needs to
> involve another register move LR into, pop LR, then set PC to that register. Or
> is there something more straightforward I haven't seen (I have yet to fully
> "think in ARM"...)
When following the "prelude" approach, at the point of exit, the value
of LR is unspecified (it may still be the original exit value, but it
might also be overwritten by a call inside the word). The correct return
address is on top of the (return) stack.
Therefore, the correct exit sequence would be "pop LR; mov LR to PC". No
extra registers required.
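This exit sequence can be checked on a toy model. The sketch below is a made-up machine in Python with only PC, LR and a stack; it is not real ARM and not Dusk code, and the instruction names ("push_lr", "pop_lr", "ret", "emit") as well as the program layout are assumptions for illustration. "ret" stands for "mov LR to PC".

```python
def run(program, entry):
    """Run a toy program. Instructions are tuples, addresses are list indices."""
    pc, lr, stack, trace = entry, None, [], []
    while pc is not None:
        op, *args = program[pc]
        pc += 1                      # pc now points at the next instruction
        if op == "bl":               # branch with link: return address goes to LR only
            lr, pc = pc, args[0]
        elif op == "push_lr":        # prelude
            stack.append(lr)
        elif op == "pop_lr":         # first half of the postlude
            lr = stack.pop()
        elif op == "ret":            # "mov LR to PC"
            pc = lr
        elif op == "emit":           # record a marker so the control flow is visible
            trace.append(args[0])
        elif op == "halt":
            pc = None
    return trace, stack


LEAF, OUTER, MAIN = 0, 2, 8
program = [
    ("emit", "leaf"),        # 0: leaf word, no prelude/postlude
    ("ret",),                # 1
    ("push_lr",),            # 2: non-leaf word starts with the prelude
    ("emit", "outer-in"),    # 3
    ("bl", LEAF),            # 4: this overwrites LR
    ("emit", "outer-out"),   # 5
    ("pop_lr",),             # 6: postlude restores the saved return address
    ("ret",),                # 7
    ("bl", OUTER),           # 8: main
    ("emit", "main-done"),   # 9
    ("halt",),               # 10
]

trace, stack = run(program, MAIN)
```

In this model, LR holds 5 (the return into the outer word) when the postlude starts, which is why the saved value has to come back from the stack before "ret". The run ends with an empty stack, so push and pop stay balanced.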
Speaking of the return stack: I learned before (the hard way) that Dusk OS
sometimes manipulates the return address on the return stack to
affect control flow (e.g. jump over parameters). With the "leaf marker"
approach, those places need to be updated to instead affect the value of LR.
With the "prelude" approach they can stay unchanged and modify the
return stack. Maybe I missed something again :D.
Anyway, I don't want to discourage you from your approach, yet since you
decided to write to the mailing list, I wanted to provide some input for
consideration about points you might have missed.
Regards,
Michael
On Fri, May 26, 2023, at 5:42 PM, Michael Schierl wrote:
> Hello Virgil,
>
> On 26.05.2023 at 23:19, Virgil Dupras wrote:
>> On Fri, May 26, 2023, at 4:58 PM, Michael Schierl wrote:
>
>> This changes the implicit semantic of "code", which so far meant nothing more
>> than "create an entry". Now it would be "create an entry and add a prelude".
>> This means that "here" doesn't point to the beginning of the code after a "code"
>> call. This breaks a lot of usages for it.
>
> Fair point.
>
>> Also, what happens when we jump to that word instead of calling it? Currently,
>> it changes nothing as it simply defers the "ret". I use this pattern quite
>> often. With a prelude, this pattern would break because jumping to the word
>> would push a spurious LR to SP.
>
> You would have to jump to the point after the prelude. Which would again
> require you to know that the word is not a leaf word and thus has a
> nonempty prelude.
>
>> There's also the fact that the postlude is a bit complicated. It can't be the
>> exact mirror of the "push LR; bl; pop LR" part, because "mov LR to PC; pop LR"
>> doesn't work: PC changes so "pop LR" is never executed. This return needs to
>> involve another register move LR into, pop LR, then set PC to that register. Or
>> is there something more straightforward I haven't seen (I have yet to fully
>> "think in ARM"...)
>
> When following the "prelude" approach, at the point of exit, the value
> of LR is unspecified (it may still be the original exit value, but it
> might also be overwritten by a call inside the word). The correct return
> address is on top of the (return) stack.
>
> Therefore, the correct exit sequence would be "pop LR; mov LR to PC". No
> extra registers required.
Ah yes, I wasn't thinking straight. This makes a lot more sense :)
> Speaking of return stack: I learned before (the hard way) that Dusk OS
> is sometimes manipulating the return address on the return stack to
> affect control flow (e.g. jump over parameters). With the "leaf marker"
> approach, those need to be updated to instead affect the value of LR.
> With the "prelude" approach they can stay unchanged and modify the
> return stack. Maybe again I missed some thing :D.
I think iterators are the only place where I do this kind of wizardry, but
you are absolutely correct, my "leaf markers" idea would introduce a really
uneasy murkiness with regards to what's on RS. I'm not sure it would break
something, but it's murky. That's a pretty strong point against this idea.
> Anyway, I don't want to discourage you from your approach, yet since you
> decided to write to the mailing list, I wanted to provide some input for
> consideration about points you might have missed.
Not at all! Your input is greatly appreciated and helps me a lot. I think I'll
revisit the "explicit prelude/postlude" option, with them being implicit in ":"
definitions. "code" words that call on other words are the exception rather than
the rule, so it might not be as noisy as I first imagined.
Thanks,
Virgil
I ended up going with the explicit push and pop operations:
https://git.sr.ht/~vdupras/duskos/commit/ab86276
I named those words "pushret," and "popret," instead of "prelude," and
"postlude," because sometimes (for example in the "runword" implementation in
xcomp/arm/rpi/kernel.fs) it makes more sense to call those words at places other
than the beginning and end of the routine, and in those cases, the wording felt
weird.