Hello all,
I've just released Dusk v11[1], its main focus being the rewrite of uxn and
varvara. The result is a much, much faster uxn CPU and a more compliant varvara
computer. On i386, it's significantly faster than the official implementation[2]
and unlike the previous implementation, it can run the Left editor just fine. On
ARM, it's slightly slower, but I'm confident I'll be able to beat the official
uxn at bunnymark.
This development required a complete rewrite of the graphical stack, which is now
much faster and, I think, slicker.
I've also added the Dusk Tour[3] in the docs.
Now, about my focus for the next release... I always want to do the same thing,
consolidate Dusk on the RPi, making it run faster, better, more reliably, but I
always nerd snipe myself into some kind of goal that turns my development
process into a whirlwind.
This release hasn't been developed on Dusk, the challenge being too great. The
next one will be. I've even lined up a couple of easier goals to help me
stay focused.
* Make emul/uxn's "k" variants of the ops faster. Right now, the implementation
is suboptimal.
* Activate icache and dcache in the RPi ports. varvara drawing on it is really,
really, really slow and I suspect that it's because of the cache, because on
dusk-sdl on a RPi, it's much faster.
* Steal Devine's mono font and use it in io/fbgrid.
* Improve USB error handling. A single USB polling error on the keyboard or
mouse causes the system to go into an infinite loop.
* Probably improve text/ed in the process. I would have gone for Left right
away, but it's really too slow on the RPi without a dcache and icache.
* I now have a replacement mini-ATA drive to replace my super old laptop's
broken one. Bring it to life again and reimplement VESA 1.2 support, which I
had to remove because the graphics subsystem changed too much and I couldn't
test the code. I remember that Left was very slow on that machine, but I
suspect that it's now pretty fast.
Onwards,
Virgil
[1]: https://git.sr.ht/~vdupras/duskos/tree/master/item/CHANGELOG.md#v11---20240924
[2]: https://lists.sr.ht/~vdupras/duskos-discuss/%3CZtW1cBE+M1bSGmc6@arendt%3E
[3]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/tour.txt
On Tue, Sep 24, 2024 at 04:52:53PM -0400, Virgil Dupras wrote:
> * Activate icache and dcache in the RPi ports. varvara drawing on it is really,
> really, really slow and I suspect that it's because of the cache, because on
> dusk-sdl on a RPi, it's much faster.
I've just pushed a new drv/arm/cache unit allowing cache control, which first
allowed me to see that the write buffer is already enabled by default on the
RPi1. I had the secret hope that it was disabled and would give me an
essentially free and huge speedup on pushing pixels on the framebuffer.
As we say around here, "Y'en aura pas d'facile".
Instruction and data caches aren't enabled by default, however, and enabling
them isn't a big deal since I've already ironed out the invalidation kinks for
the purpose of running Usermode Dusk under ARM. It's only a matter of hooking
"clearicache" before enabling the instruction cache.
But this doesn't make varvara drawing noticeably faster. Left is still janky
with mouse movement.
Right now, gr/varvara always redraws its whole screen on any change. On a fast
computer, it doesn't matter, and even on Usermode Dusk on a RPi, it's bearable.
My guess, now, is that SDL has an elaborate damage detection mechanism to
minimize screen updates. I don't see how else there could be such a big
discrepancy between varvara under Usermode and Bare Metal on the same RPi.
So I'll need to be more surgical in varvara screen updates.
The SDL version of varvara does screenbuffer changes in the background,
but for the X11 implementation (which is what I use daily; uxnemu takes
too much RAM for my liking) I do it by hand:
https://git.sr.ht/~rabbits/uxn11/tree/main/item/src/uxn11.c#L251
Maybe this can help you?
On Fri, Oct 04, 2024 at 09:14:23AM -0700, Hundred Rabbits wrote:
> The SDL version of varvara does screenbuffer changes in the background, but
> for the X11 implementation (which is what I use daily; uxnemu takes too much
> RAM for my liking) I do it by hand:
>
> https://git.sr.ht/~rabbits/uxn11/tree/main/item/src/uxn11.c#L251
>
> Maybe this can help you?
Thanks for the pointer. From a quick look, it seems a lot like gr/damage[1],
which is already what I was planning to use to make gr/varvara screen updates
more precise.
[1]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/gr/damage.txt
On Fri, Oct 04, 2024 at 11:14:56AM -0400, Virgil Dupras wrote:
> But this doesn't make varvara drawing noticeably faster. Left is still janky
> with mouse movement.
>
> Right now, gr/varvara always redraws its whole screen on any change. On a fast
> computer, it doesn't matter, and even on Usermode Dusk on a RPi, it's bearable.
> My guess, now, is that SDL has an elaborate damage detection mechanism to
> minimize screen updates. I don't see how else there could be such a big
> discrepancy between varvara under Usermode and Bare Metal on the same RPi.
>
> So I'll need to be more surgical in varvara screen updates.
I've made good progress on that front today. In my latest commits, gr/varvara
uses gr/damage for partial screen redraws. Combined with the double screen
buffering[1], this results in Left being much, much more usable on a RPi model
1. Typing is still done with a bit of delay, but at least mouse movement and
clicks aren't sluggish anymore.
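
To illustrate the idea of partial redraws, here is a minimal sketch in C. It is not the gr/damage implementation (that lives in Dusk and its API is not shown in this thread); it assumes the simplest possible scheme, a single bounding rectangle that grows to cover every touched area and is then used to limit the copy. The names Damage, damage_add and damage_flush are hypothetical, and the caller is assumed to pass rectangles that are already inside the screen.

```c
#include <stdint.h>

/* Hypothetical damage tracker: one bounding rectangle of touched pixels,
   stored half-open as [x1,x2) x [y1,y2). */
typedef struct {
    int x1, y1, x2, y2;
} Damage;

static void damage_reset(Damage *d) {
    d->x1 = d->y1 = d->x2 = d->y2 = 0;
}

static int damage_empty(const Damage *d) {
    return d->x1 >= d->x2 || d->y1 >= d->y2;
}

/* Grow the rectangle so that it also covers (x, y, w, h). */
static void damage_add(Damage *d, int x, int y, int w, int h) {
    if (w <= 0 || h <= 0) return;
    if (damage_empty(d)) {
        d->x1 = x; d->y1 = y; d->x2 = x + w; d->y2 = y + h;
        return;
    }
    if (x < d->x1) d->x1 = x;
    if (y < d->y1) d->y1 = y;
    if (x + w > d->x2) d->x2 = x + w;
    if (y + h > d->y2) d->y2 = y + h;
}

/* Copy only the damaged area from src to dst (both `stride` pixels wide),
   then clear the damage. Returns the number of pixels copied. */
static long damage_flush(Damage *d, const uint32_t *src, uint32_t *dst,
                         int stride) {
    long copied = 0;
    if (damage_empty(d)) return 0;
    for (int y = d->y1; y < d->y2; y++) {
        for (int x = d->x1; x < d->x2; x++) {
            dst[y * stride + x] = src[y * stride + x];
            copied++;
        }
    }
    damage_reset(d);
    return copied;
}
```

With a mouse cursor moving over a large screen, only the small rectangle around the old and new cursor positions gets copied, instead of the whole screen.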
These changes helped a bit with Bunnymark, but not enough to be considered good.
Even without bunnies, we get 5 FPS on a RPi1.
The double-buffering logic is in the RPi port's "wip" branch[2].
But I'm thinking I don't like this double buffering business all that much. I'll
explore another avenue, that is to reorganize gr/pix's mapping algorithm so that
we almost always make 32-bit writes. So depending on the destination pixsz, the
mapping could process 1, 2 or 4 source pixels before making its write. Sure,
this would make gr/pix more complex, but would save Dusk as a whole from the
need for double buffering.
[1]: that is, having an in-memory copy of the framebuffer and copying that buffer
to the framebuffer periodically. Why does it help? I'm not 100% sure, but I
think it's because 8-bit/16-bit accesses to the framebuffer are super super slow
because the ARM CPU has to read from the FB to OR in that 8-bit/16-bit value in
the target 32-bit cell. gr/varvara drawing does 16-bit writes to its target
screen. Double-buffer copying is done 100% in 32-bit writes.
[2]: https://git.sr.ht/~vdupras/duskos-deployments/tree/wip/item/rpi/init.fs
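
The double-buffering scheme described in footnote [1] above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from the "wip" branch: the screen size, the 16-bit pixel format and the names put_pixel16 and flush32 are all hypothetical. Drawing makes 16-bit writes to an ordinary RAM copy, and the periodic copy to the framebuffer uses nothing but 32-bit stores.

```c
#include <stdint.h>
#include <string.h>

enum { W = 8, H = 2 }; /* tiny hypothetical screen, 16 bits per pixel */

/* Drawing targets the in-memory copy, 16 bits at a time. Narrow writes are
   cheap here because this is regular RAM. */
static void put_pixel16(uint16_t *shadow, int x, int y, uint16_t color) {
    shadow[y * W + x] = color;
}

/* Periodic copy to the real framebuffer using 32-bit stores only.
   Each store carries two 16-bit pixels. */
static void flush32(volatile uint32_t *fb, const uint16_t *shadow) {
    for (int i = 0; i < W * H / 2; i++) {
        uint32_t cell;
        /* memcpy keeps the pixel pair in memory order and avoids aliasing
           problems; compilers turn it into a plain 32-bit load. */
        memcpy(&cell, &shadow[i * 2], sizeof cell);
        fb[i] = cell;
    }
}
```

The trade-off is the one mentioned in the message: an extra full-screen buffer and an extra copy pass, in exchange for never making a narrow write to the framebuffer itself.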
On Wed, Oct 09, 2024 at 11:18:51AM -0400, Virgil Dupras wrote:
> But I'm thinking I don't like this double buffering business all that much. I'll
> explore another avenue, that is to reorganize gr/pix's mapping algorithm so that
> we almost always make 32-bit writes. So depending on the destination pixsz, the
> mapping could process 1, 2 or 4 source pixels before making its write. Sure,
> this would make gr/pix more complex, but would save Dusk as a whole from the
> need for double buffering.
Good stuff came out of this exploration! Initially, it wasn't very good because
the complexity I had to add to gr/pix was too much. I almost gave up, then I
didn't, then I did. I was looking into expanding the HAL again with new
registers to make this kind of algorithm more feasible. But that meant making
kernels more complex! Murky work again.
And then I thought: no. HAL in the kernel is for the purpose of booting up only.
Anything fancier has to be done at a higher level, using assemblers. This wasn't
the first time this idea hit me, but the last time it did, lower/upper HAL
separation was itself more complex than the stuff I was trying to separate.
But now it was worth it. This new "&nf," operator I had just added was already
a mistake, but there was still time to backtrack. So I did. Moved kernel query
words to hal/opq[1], then split asm/halm into hal/muldiv[2] and hal/vmove[3].
Finally, the road was clear for my new idea: Buffered writes[4], that is, to
combine 4 byte writes (or 2 word writes) in registers before writing them all
out as a single 32-bit write. Sure, implementing this with new HAL registers
would have been OK, but it's much better to implement them directly in
assembler. This allows for even more efficient code (the ARM version is pretty
slick).
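
As a rough illustration of the buffered write idea, here is a sketch in C. It is not hal/bwr (which is written in assembler and whose interface is not shown in this thread); the ByteWriter type and the bw_* names are hypothetical, and the destination is assumed to be 4-byte aligned. Bytes are packed into an accumulator, standing in for a CPU register, and memory is only touched once a full 32-bit cell is ready. Bytes are packed lowest byte first, which matches memory order on a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical buffered byte writer. */
typedef struct {
    uint32_t *dst;   /* next 32-bit cell to write (assumed aligned) */
    uint32_t acc;    /* pending bytes, lowest byte first */
    unsigned fill;   /* number of pending bytes, 0..3 */
    unsigned stores; /* count of 32-bit stores, for illustration only */
} ByteWriter;

static void bw_init(ByteWriter *w, uint32_t *dst) {
    w->dst = dst;
    w->acc = 0;
    w->fill = 0;
    w->stores = 0;
}

/* Queue one byte; memory is written only when 4 bytes are pending. */
static void bw_put8(ByteWriter *w, uint8_t b) {
    w->acc |= (uint32_t)b << (8 * w->fill);
    if (++w->fill == 4) {
        *w->dst++ = w->acc;
        w->acc = 0;
        w->fill = 0;
        w->stores++;
    }
}

/* Write out a partial cell, preserving the bytes we did not touch.
   This is the only place that has to read the destination. */
static void bw_flush(ByteWriter *w) {
    if (w->fill) {
        uint32_t mask = (1u << (8 * w->fill)) - 1; /* fill is 1..3 here */
        *w->dst = (*w->dst & ~mask) | w->acc;
        w->dst++;
        w->acc = 0;
        w->fill = 0;
        w->stores++;
    }
}
```

Writing five bytes this way costs two 32-bit stores and a single read at the tail, rather than five narrow writes that each force a read-modify-write on the target cell.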
Plugged it into gr/pix, it works! Benchmarks? Slightly faster than the
double-buffering thing, but so much more elegant! Bunnymark runs at 6 FPS.
So, yay!
[1]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/hal/opq.txt
[2]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/hal/muldiv.txt
[3]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/hal/vmove.txt
[4]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/hal/bwr.txt