Hello all,
I've just pushed a new feature to Dusk: iterators (see doc/iter). I'm extremely
happy of how it turned out and as I'm briefly looking around, I'm beginning to
think that I've been really innovative with that one.
GForth only seems to have traditional looping. RetroForth, which I don't know
much, has "for-each" which could be seen as something close to my iterators, but
the documentation says "RETRO also provides `for-each` combinators for various
data structures.", which makes me think that it's not a generic mechanism. A
quick search of other forths on github gave me nothing.
So, the goal of this message is to poke people who would know more than me about
the Forth ecosystem and tell me if any other Forth has something remotely as
powerful. If not, then I'll claim my fame!
My iterators are super simple and super powerful! They rest on the concept of
coroutines (see doc/usage), which is implemented with one simple native word,
"yield", which on i386 is only 6 lines of assembler.
: foo ." foo" yield ." hey" ;
: bar ." bar" foo ." hello" yield ." goodbye" ;
bar \ prints "barfoohelloheygoodbye"
Upon this rests ":iter", an iterator builder. Again, the implementation is
rather simple, the whole wordset (:iter, next, unyield, break, i, j) being 13
lines of Forth.
And this mechanism allow to very easily create iterators that are super
convenient to use! For example, the regular "countdown" loop is implemented as:
:iter for ( n -- )
?dup if to i begin yield -1 to+ i i not until then unyield ;
It can be used like:
: foo 5 for i . spc> next ;
foo \ prints "5 4 3 2 1 "
And that's it! Simple, powerful effective. So, have you seen this anywhere else?
Regards,
Virgil
> So, the goal of this message is to poke people who would know more than me about> the Forth ecosystem and tell me if any other Forth has something remotely as> powerful. If not, then I'll claim my fame!
Reading into a bit, I'm not sure I follow :D
I bit of my understanding of iterators: most low-level languages (Rust
and C++ I think) basically compile them down to simple state machines.
They are a struct with:
1. a next() method that alters the struct, providing the "current
iteration value"
2. a loop that calls next() and checks the value for whether it is
complete (or calls an isDone() method)
Rust and C++ typically inline all this for many structs. I don't see
why you couldn't do a similar thing in Forth if you supported
namespaces (i.e. MyStruct . myMethod) -- your compiler could simply
look for the "next" method and compile it for your struct.
I think you are going to have some difficulty improving on the above
architecture. Looking at your coroutines, I have some concerns. From
doc/usage.txt[1]
> Coroutines are two routines that intertwine at the top of RS.
Is it limited to exactly two? This is not normally what a coroutine
is. Maybe call it something else like duoroutines or something?
Continuing to read on, it seems like perhaps you could have more than
two if you stored the "current continue value" in a local variable/etc
in the "base" function -- so maybe duoroutines isn't actually right.
> With coroutine, we don't push and pop, we *swap* return addresses on RS until iteration is completed
I assume there are some constraints here...
1. Can one coroutine have (and preserve) locals between calls? I don't
understand how unless it has its own RS (and probably its own WS too!)
2. Related to the above, I assume coroutines can only "yield" at the
base level -- i.e. while they can call functions, those functions are
not allowed to "yield"... or at least their yield won't yield all the
way up the stack?
Civboot also has the concept of coroutines, although they are not very
used yet. Basically each coroutine has its own separate WS, RS, a link
to the prev/next coroutine, etc (a few more minor pointers).
"Yielding" just causes the executor to execute the next coroutine in
the LL.
That was my original design anyway -- my current C implementation
isn't going to be this nice and will also require me to store the
entire C stack -- something I'm not sure I can get around -- but C
supposedly has similar mechanisms to construct co-routines from
"scratch".
I have other ideas, such as lambda functions having access to "locals"
by storing the offset of the current function's locals and any usage
of them causing a fetch/store-ing to them, enabling a "functional
style" of programming using map, etc. I _think_ this might be similar
to your goals?
Best,
Rett
[1]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/usage.txt
On Wed, Jun 14, 2023 at 1:15 PM Arcade Wise <l3gacy.b3ta@disroot.org> wrote:
>> This is fascinating! I love it.>> --> Arcade Wise (they/them)> <arcades.agency>>
On Wed, Jun 14, 2023, at 2:35 PM, Rett Berg wrote:
>> So, the goal of this message is to poke people who would know more than me about>> the Forth ecosystem and tell me if any other Forth has something remotely as>> powerful. If not, then I'll claim my fame!>> Reading into a bit, I'm not sure I follow :D>> I bit of my understanding of iterators: most low-level languages (Rust> and C++ I think) basically compile them down to simple state machines.> They are a struct with:>> 1. a next() method that alters the struct, providing the "current> iteration value"> 2. a loop that calls next() and checks the value for whether it is> complete (or calls an isDone() method)>> Rust and C++ typically inline all this for many structs. I don't see> why you couldn't do a similar thing in Forth if you supported> namespaces (i.e. MyStruct . myMethod) -- your compiler could simply> look for the "next" method and compile it for your struct.
With this approach, iterators would be limited to structs and there could only
be one iterator per struct. It would also make it complicated for a struct to
support nesting in their "next" implementation (where do they store iteration
information?). I think my current approach is more flexible.
> I think you are going to have some difficulty improving on the above> architecture. Looking at your coroutines, I have some concerns. From> doc/usage.txt[1]>>> Coroutines are two routines that intertwine at the top of RS.>> Is it limited to exactly two? This is not normally what a coroutine> is. Maybe call it something else like duoroutines or something?
When I look at the definition of coroutine on wikipedia[1], my implementation
seems to correspond to what they call "asynchronous stackless coroutine". Being
able to have more than two routines involved don't seem to be a requirement to
the name "coroutine".
> Continuing to read on, it seems like perhaps you could have more than> two if you stored the "current continue value" in a local variable/etc> in the "base" function -- so maybe duoroutines isn't actually right.>>> With coroutine, we don't push and pop, we *swap* return addresses on RS until iteration is completed>> I assume there are some constraints here...>> 1. Can one coroutine have (and preserve) locals between calls? I don't> understand how unless it has its own RS (and probably its own WS too!)
Yes, if stored *under* the yield address in RS. When an iterator begins doing
its thing, it allocates 16b. The bottom 12b are for i, j and k and the top RS
element is for the "yield". This makes iterators nestable.
WS has to stay neutral between yields, this is a limitation of this system.
Iteration state is limited to those 12 bytes for i/j/k.
> 2. Related to the above, I assume coroutines can only "yield" at the> base level -- i.e. while they can call functions, those functions are> not allowed to "yield"... or at least their yield won't yield all the> way up the stack?
Yields can be nested as long as the caller expect the callee's yield, everything
is fine.
> Civboot also has the concept of coroutines, although they are not very> used yet. Basically each coroutine has its own separate WS, RS, a link> to the prev/next coroutine, etc (a few more minor pointers).> "Yielding" just causes the executor to execute the next coroutine in> the LL.
It's a bit heavy for my taste, that's why I was so happy to have this
lightweight iterator idea working.
> That was my original design anyway -- my current C implementation> isn't going to be this nice and will also require me to store the> entire C stack -- something I'm not sure I can get around -- but C> supposedly has similar mechanisms to construct co-routines from> "scratch".>> I have other ideas, such as lambda functions having access to "locals"> by storing the offset of the current function's locals and any usage> of them causing a fetch/store-ing to them, enabling a "functional> style" of programming using map, etc. I _think_ this might be similar> to your goals?
No, my goals simply were to have a simple iterator implementation. It's been a
while since I introduced them, they had time to prove themselves. They certainly
have limitations, but they work and they're flexible.
>> Best,> Rett>> [1]: https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/usage.txt
Regards,
Virgil
[1]: https://en.wikipedia.org/wiki/Coroutine
"Virgil Dupras" <hsoft@hardcoded.net> writes:
> On Wed, Jun 14, 2023, at 2:35 PM, Rett Berg wrote:>>> So, the goal of this message is to poke people who would know more than me about>>> the Forth ecosystem and tell me if any other Forth has something remotely as>>> powerful. If not, then I'll claim my fame!>>>> Reading into a bit, I'm not sure I follow :D>>>> I bit of my understanding of iterators: most low-level languages (Rust>> and C++ I think) basically compile them down to simple state machines.>> They are a struct with:>>>> 1. a next() method that alters the struct, providing the "current>> iteration value">> 2. a loop that calls next() and checks the value for whether it is>> complete (or calls an isDone() method)>>>> Rust and C++ typically inline all this for many structs. I don't see>> why you couldn't do a similar thing in Forth if you supported>> namespaces (i.e. MyStruct . myMethod) -- your compiler could simply>> look for the "next" method and compile it for your struct.>> With this approach, iterators would be limited to structs and there could only> be one iterator per struct. It would also make it complicated for a struct to> support nesting in their "next" implementation (where do they store iteration> information?). I think my current approach is more flexible.
The struct is the iterator, what it iterates over can be anything.
All that matters is we have two function pointers in the struct (making
the struct an implementation of the iterator interface) and one of them
tells us if the stream is over and the other gives us the next item.
The latter usually modifies some internal state. In higher level
languages this is done by using a closure instead of a dumb C function,
or we can pass the struct to the functions, making them methods, and
then state can be stored in the struct.
For a simple range iterator for machine ints this would take 5 ints per
iterator state. Two function pointers, two to store the range, one to
store the current number. That's only 20 bytes on 32 bit.
On Wed, Jun 14, 2023, at 6:33 PM, Csepp wrote:
> "Virgil Dupras" <hsoft@hardcoded.net> writes:>>> On Wed, Jun 14, 2023, at 2:35 PM, Rett Berg wrote:>>>> So, the goal of this message is to poke people who would know more than me about>>>> the Forth ecosystem and tell me if any other Forth has something remotely as>>>> powerful. If not, then I'll claim my fame!>>>>>> Reading into a bit, I'm not sure I follow :D>>>>>> I bit of my understanding of iterators: most low-level languages (Rust>>> and C++ I think) basically compile them down to simple state machines.>>> They are a struct with:>>>>>> 1. a next() method that alters the struct, providing the "current>>> iteration value">>> 2. a loop that calls next() and checks the value for whether it is>>> complete (or calls an isDone() method)>>>>>> Rust and C++ typically inline all this for many structs. I don't see>>> why you couldn't do a similar thing in Forth if you supported>>> namespaces (i.e. MyStruct . myMethod) -- your compiler could simply>>> look for the "next" method and compile it for your struct.>>>> With this approach, iterators would be limited to structs and there could only>> be one iterator per struct. It would also make it complicated for a struct to>> support nesting in their "next" implementation (where do they store iteration>> information?). I think my current approach is more flexible.>> The struct is the iterator, what it iterates over can be anything.> All that matters is we have two function pointers in the struct (making> the struct an implementation of the iterator interface) and one of them> tells us if the stream is over and the other gives us the next item.> The latter usually modifies some internal state. In higher level> languages this is done by using a closure instead of a dumb C function,> or we can pass the struct to the functions, making them methods, and> then state can be stored in the struct.>> For a simple range iterator for machine ints this would take 5 ints per> iterator state. Two function pointers, two to store the range, one to> store the current number. That's only 20 bytes on 32 bit.
By "where to store the information?", I wasn't referring to its structure, but
to its place in memory. What you describe would work, but it wouldn't solve the
"where?" question.
Where do we put that structure? In a static buffer unique to each of these
structures? This means no nesting. In a scratchpad? A wide outer loop would get
overwritten by its inner loop information.
A global stack for iterator structures? That could work, but this, combined with
the structure you describe, would bust my complexity budget.
I believe that my current solution strikes a good balance even though it has
significant constraints.