~technomancy/fennel


Towards a better REPL - a communication protocol

Message ID
<87o7uuw4sj.fsf@gmail.com>
For the past few days, I've been working on better Fennel REPL support
in Emacs, mainly improving the comint integration parts that redirect
commands to invisible buffers to provide docstrings, arglists,
completions, etc.  There are several problems with the current approach,
which stem from the fact that the Fennel REPL is a synchronous
application that uses stdin as its protocol.

This is not ideal and makes it really hard to maintain a reliable
connection and to keep the different streams of communication apart.
For instance, when some long-running expression is blocking the REPL,
sending a ,complete command and querying for results will hang, and
will probably retrieve the wrong candidate list, which is actually the
output of the long-running loop.  E.g. running this in the REPL:

    >> (do (for [i 1 1000000000] nil) (print "foo1\tfoo2\tfoo3"))

And switching to another buffer, typing `(foo` and hitting the Tab key
yields the completions: foo1, foo2, and foo3.  Here's a screenshot:
https://i.imgur.com/DQ4ODaI.png

This happens because the comint interface is unaware of what output it
should wait for, and grabs the first output the REPL process prints,
thus giving the wrong list of completions.  While mostly harmless, this
can break the user's workflow and create the feeling of a broken
development environment.  Not to mention that there's a potential
deadlock when comint thinks that the redirect is still in progress, even
though it has already finished, and the REPL refuses to receive new
inputs.  (Though I think I've fixed the last problem with the reworked
redirection.)

Therefore I would like to propose a simple protocol.

---

I would like to note that this should be part of the default Fennel REPL
and not a separate library, as this is aimed at making Fennel better at
integrating with other software.

I guess this can just be a versioned flag, maybe similar to how git
handles the --porcelain flag (--porcelain=v1), in case we later change
the protocol in an incompatible way, by changing the message format for
instance.

When the REPL is started in the protocol mode, each incoming message
should contain a type, so the REPL knows how to respond to that
particular message.  This also makes the communication much easier for
external tools to parse.

I think that all communication between the REPL and the editor should
use an explicit notion of message types and response types.  Right now
it's already a thing, well, sort of - we're using REPL commands as a
protocol interface, but the response has no type, and the tooling can
easily confuse responses, especially if messages were sent
asynchronously, which is possible right now due to stdin buffering.

These are the types I've come up with so far; suggestions are welcome.
All messages are single lines whose first field is the literal message
type, separated by tab characters from the rest of its parameters.
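
To make the line format concrete, here is a minimal Python sketch of
building and splitting such messages.  The function names are
hypothetical, and the sketch assumes no field contains a literal tab or
newline (those would have to be escaped before sending).

```python
def format_message(msg_type, *params):
    """Build one protocol line: the message type followed by its parameters."""
    return "\t".join((msg_type, *params)) + "\n"


def parse_message(line):
    """Split one protocol line into (message type, list of parameters)."""
    msg_type, *params = line.rstrip("\n").split("\t")
    return msg_type, params


print(repr(format_message("->eval", "(+ 1 2 3)")))  # '->eval\t(+ 1 2 3)\n'
print(parse_message("<-data\t1\t2\t3"))              # ('<-data', ['1', '2', '3'])
```

Because the type is always the first field, a reader never has to guess
what a line means from its contents.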

Editor->REPL message types (mostly the same as the current command
feature set):

* ->eval - evaluate the expression
* ->complete - produce completions
* ->arglist - return arglist for a given fn or a macro
* ->apropos - apropos-related commands; contains a sub-type for the doc
  and show-docs variants.
* ->doc - docstring query
* ->compile - compile to Lua and return the code as a string
* ->find - return the filename for the identifier
* ->reload - module reloading
* ->reset - resetting repl locals
* ->compatibilities - set of the supported message types

REPL->Editor response types:

* <-data - the regular answer for the eval command.
  Written to the REPL buffer as-is.
* <-pp - same as data, but in a pretty-printed form (e.g. contains
  escaped newlines), returned as a one-line string.
* <-complete - line of tab-separated completions.
* <-arglist - line of tab-separated function arguments.
* <-apropos - line of tab-separated apropos candidates.
* <-doc - a documentation string, returned as a one-liner.
* <-compile - a compiled Lua string, returned as a one-liner.
* <-find - tab-separated filename and line info.
* <-reset, <-reload - operation status, as a one-liner string.
* <-compatibilities - set of supported message types
* <-error - some kind of error

This way, a typical session for the REPL will look like this:

->eval	(+ 1 2 3)
<-data	6
->eval	(for [i 1 1000000] nil)
->complete	f
<-data	nil
<-complete	fn	for	fcollect
->eval	(doto {} (tset :a 1) (tset :b 2))
<-pp	{:a 1\n:b 2}
->eval	(values 1 2 3)
<-data	1	2	3
->compile	(fn x [] x)
<-compile	local function x()\n  return x\nend\nreturn x
->eval	foo
<-error	compile	Compile error in unknown:1:0\n  unknown identifier in strict mode: foo

Note that there's no prompt, as this REPL is not meant as an
interactive process, but rather as a communication one.  The editor
will still send stuff to stdin; it will just prepend the message type to
each message, and the REPL will accept it and respond accordingly.  This
way, even if the editor is waiting for completion candidates while the
REPL is busy with something else, a proper message filter can be set up
to resolve the ambiguity.
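
A rough Python sketch of such a filter, assuming the response types
listed above: each line is routed by its type to a consumer, so the
<-data produced by a long-running loop can no longer be mistaken for
completions.  The consumer names and the ROUTES table are made up for
illustration; a real integration would hand the lines to comint or a
completion backend instead.

```python
# Hypothetical mapping from response type to the part of the editor
# that should consume it.
ROUTES = {
    "<-data": "repl-buffer",
    "<-pp": "repl-buffer",
    "<-error": "repl-buffer",
    "<-complete": "completion",
    "<-arglist": "eldoc",
    "<-doc": "eldoc",
}


def dispatch(lines):
    """Group response lines by the consumer that should handle them."""
    inbox = {}
    for line in lines:
        msg_type, *params = line.rstrip("\n").split("\t")
        consumer = ROUTES.get(msg_type, "unknown")
        inbox.setdefault(consumer, []).append(params)
    return inbox


# The interleaving from the session above: the loop's <-data arrives first,
# yet the completion consumer only ever sees the <-complete line.
print(dispatch(["<-data\tnil", "<-complete\tfn\tfor\tfcollect"]))
# {'repl-buffer': [['nil']], 'completion': [['fn', 'for', 'fcollect']]}
```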

All messages are one-liners, which means that the REPL should replace
literal newlines with escaped ones, and the same is true for the
editor.  (The rewritten REPL integration for Emacs already does this,
minifying the code before sending it.)  When receiving messages from the
REPL, the editor is responsible for pretty-printing the lines that
contain escaped newlines (the <-pp responses).
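
One possible escaping scheme, sketched in Python.  The proposal only
mentions newlines; also escaping tabs (the field separator) and
backslashes (so a Fennel string literal that already contains `\n` can
be told apart from an escaped newline) is an added assumption here.

```python
import re

# Assumed escape table.  Backslash is escaped too, otherwise a literal
# "\n" typed in source code would be indistinguishable from a newline.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t"}
_UNESCAPES = {escaped: raw for raw, escaped in _ESCAPES.items()}


def escape(text):
    """Make text safe to send as one tab-separated field on a single line."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape(field):
    """Reverse escape() in a single left-to-right pass."""
    return re.sub(r"\\[\\nt]", lambda m: _UNESCAPES[m.group(0)], field)


pretty = "{:a 1\n:b 2}"
wire = escape(pretty)
print(wire)  # {:a 1\n:b 2}
assert unescape(wire) == pretty
```

Doing the unescaping in one regex pass matters: replacing `\\n` and then
`\\\\` in two separate steps would corrupt strings that contain an
escaped backslash followed by the letter n.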

To start the REPL in this mode, a flag should be passed, and the REPL
will immediately respond with the capabilities:

$ fennel --repl --porcelain=v1
<-compatibilities	->eval	->complete	...	<-data	<-pp	...

Then it waits for input as usual, except that it doesn't show any
prompt.
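
The editor side of this handshake could look roughly like the Python
sketch below.  It assumes the first line from the REPL is the
tab-separated <-compatibilities message shown above and splits the
advertised types by their ->/<- prefix; how the editor reacts to a
missing type is left open, and the `print` is only a placeholder.

```python
def parse_capabilities(line):
    """Split a <-compatibilities line into supported request/response types."""
    msg_type, *types = line.rstrip("\n").split("\t")
    if msg_type != "<-compatibilities":
        raise ValueError(f"expected <-compatibilities, got {msg_type!r}")
    requests = {t for t in types if t.startswith("->")}
    responses = {t for t in types if t.startswith("<-")}
    return requests, responses


# Hypothetical handshake line; the real list would come from the REPL.
requests, responses = parse_capabilities(
    "<-compatibilities\t->eval\t->complete\t<-data\t<-pp\t<-complete"
)
if "->arglist" not in requests:
    print("->arglist unsupported, disabling arglist hints")
```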

Tell me what you think! I'm also interested in thoughts from the Vim
plugin developers (like Conjure).  Any other alternative protocol
proposals are also welcome, let's make this a general discussion thread
first!

--
Andrey Listopadov
Message ID
<878rlwnb68.fsf@gmail.com>
In-Reply-To
<87o7uuw4sj.fsf@gmail.com>
> For instance, when some
> long-running expression is blocking the REPL, sending a ,complete
> command and querying for results will hang, and probably retrieve the
> wrong candidate list, which will actually be the result of the
> long-running loop.  E.g. running this in the REPL:
>
>     >> (do (for [i 1 1000000000] nil) (print "foo1\tfoo2\tfoo3"))
>
> And switching to another buffer, typing `(foo` and hitting the Tab key
> yields the completions: foo1, foo2, and foo3.  Here's a screenshot:
> https://i.imgur.com/DQ4ODaI.png
>
> This happens due to the fact, that the comint interface is unaware of
> what output it should wait for, and grabs the first output appearing in
> stdin, thus giving the wrong list of completions.

Just to be a bit more clear on what's happening here, I've added a
logging feature to the Fennel REPL support in Emacs, and right now there
are two types of messages: --eval-> / <-data-- for regular REPL
evaluation, and --redr-> / <-redr-- for redirected evaluation, e.g. when
querying for completions or function argument lists.

So again, going to the REPL buffer and executing the loop writes this
to the log:

--eval-> (for [i 1 1000000000] nil)
<-data-- nil

And when you write `(f` in a regular buffer and press the Tab key, the
log looks like this:

--redr-> ,complete f
<-redr-- fn  for  fcollect >>

However, when we first start the for loop and then switch to the file,
and write `(f`, the messaging log looks like this:

--eval-> (for [i 1 1000000000] nil)
--redr-> ,complete f
<-redr-- nil
<-data-- fn  for  fcollect >>

Note that the messages were sent to the REPL in the proper order, but
when reading them back there was a race.  Comint was waiting in a
redirection pipeline for /any/ kind of output from the process.  The
REPL buffer was also waiting for /any/ kind of output.  The redirect
just won and grabbed the output of the for expression, and the REPL
buffer got the output of the ,complete command.

If, however, there were real message types, and, perhaps, message IDs,
we could do:

--eval->	1234	(for [i 1 1000000000] nil)
--complete->	1337	f
--complete->	2280	fo
<-eval--	1234	nil
<-complete--	1337	fn	for	fcollect
<-complete--	2280	for

Then the tooling that listens to the REPL could easily build a
relationship map between each sent message and the received one.
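
A minimal Python sketch of that relationship map, assuming the
tab-separated --op-> / <-op-- lines above with the ID in the second
field.  `Correlator` is a hypothetical name; it hands out IDs from a
counter, remembers a callback per ID, and pops the callback when the
matching response arrives, so the order in which responses are read no
longer decides who gets them.

```python
import itertools


class Correlator:
    """Match REPL responses to requests by message ID (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # message ID -> callback waiting for that response

    def send(self, op, payload, callback):
        """Register a callback and return the request line to write to the REPL."""
        msg_id = str(next(self._ids))
        self._pending[msg_id] = callback
        return f"--{op}->\t{msg_id}\t{payload}\n"

    def receive(self, line):
        """Deliver a response line to whoever sent the request with its ID."""
        _tag, msg_id, *fields = line.rstrip("\n").split("\t")
        callback = self._pending.pop(msg_id, None)
        if callback is not None:  # silently drop responses nobody waits for
            callback(fields)


results = []
repl = Correlator()
repl.send("eval", "(for [i 1 1000000000] nil)", lambda f: results.append(("eval", f)))
repl.send("complete", "f", lambda f: results.append(("complete", f)))
# Even if the responses are read in the "wrong" order, each reaches its sender.
repl.receive("<-complete--\t2\tfn\tfor\tfcollect")
repl.receive("<-eval--\t1\tnil")
print(results)  # [('complete', ['fn', 'for', 'fcollect']), ('eval', ['nil'])]
```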

So I'm also adding message IDs to the proposal.
Message ID
<87tu4k4sl4.fsf@gmail.com>
In-Reply-To
<878rlwnb68.fsf@gmail.com>
> --eval->	1234	(for [i 1 1000000000] nil)
> --complete->	1337	f
> --complete->	2280	fo
> <-eval--	1234	nil
> <-complete--	1337	fn	for	fcollect
> <-complete--	2280	for
>
> Then the tooling that listens to the REPL could easily build a
> relationship map between each sent message and the received one.
>
> So I'm also adding message IDs to the proposal.

On second thought, maybe we can actually get away with just message ID
and direction.

1234	->	(foo bar)
1235	->	,complete f
1234	<-	:baz
1235	<-	fn	for	fcollect

This way, all the REPL has to keep track of is the incoming message ID,
which it passes back to the sender.  Then no complex capability system
is needed.
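
With only an ID and a direction, the REPL side reduces to echoing the ID
back next to the result.  A Python sketch, where `respond` is a
hypothetical name and `evaluate` stands in for the real Fennel
evaluation (here a lookup table with canned answers taken from the
example above):

```python
def respond(line, evaluate):
    """Handle one 'ID -> body' request and return the 'ID <- result' line."""
    msg_id, direction, body = line.rstrip("\n").split("\t", 2)
    if direction != "->":
        raise ValueError(f"not a request: {line!r}")
    return "\t".join((msg_id, "<-", *evaluate(body))) + "\n"


# Stand-in evaluator: maps request bodies to lists of result fields.
canned = {
    "(foo bar)": [":baz"],
    ",complete f": ["fn", "for", "fcollect"],
}
print(repr(respond("1234\t->\t(foo bar)", canned.__getitem__)))  # '1234\t<-\t:baz\n'
```

The REPL never interprets the ID; it only copies it, which keeps the
server side of the scheme trivial.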

Anyway, I'd like to hear what other people think; maybe someone will
jump in and say that we're better off doing full-fledged nREPL
support, and perhaps resurrecting jeejah :)
Message ID
<87sfk3uhdi.fsf@hagelb.org>
In-Reply-To
<87tu4k4sl4.fsf@gmail.com>
Andrey Listopadov <andreyorst@gmail.com> writes:

> On second thought, maybe we can actually get away with just message ID
> and direction.
>
> 1234	->	(foo bar)
> 1235	->	,complete f
> 1234	<-	:baz
> 1235	<-	fn	for	fcollect
>
> This way all REPL has to keep a look at is incoming message ID, and
> provide it back to the sender. Then no complex capability system is
> needed.
>
> Anyway, I'd like to hear thoughts of other people, maybe someone will
> jump in and say that we're better off to do a full-fledged nREPL
> support, and perhaps resurrect jeejah :)

Honestly, once you mentioned message-id, and we already have this list
of ops, it seems really close to the nrepl protocol. It kind of feels
redundant to invent this new protocol that is already so close to that
one.

The nrepl protocol is meant to be transport agnostic. Similar to how LSP
can work over a socket or work over stdin/stdout, what if a compiler
plugin could switch the repl over to "nrepl mode" where it just
communicated over stdin/stdout instead of over a socket? I think it
would be easy to adapt monroe to use this, in which case we could reuse
all the existing tooling there for completion, docstrings, etc, plus
other editors already know how to speak nrepl.

Also it sounds like what you're saying is that it's difficult in the
current approach to tell when the repl is ready for a new command. Is
this because it is inherently difficult to tell whether the last thing
printed was the prompt, or maybe because Emacs makes it difficult to
find that out given the API it exposes, or for some other reason?

So far the "protocol" is pretty explicit about comma-commands coming in
and a single line of space-delimited output coming out. In the context
of nrepl where you have to support concurrency, it's very important to
use a message-id so you can correlate output/done messages which may
arrive out of order with the various inputs that caused them, but since
Fennel doesn't have concurrency, the problem it's trying to solve is
dramatically simpler. The response can only ever come from the most
recent input that was received. It seems to me like this should be
answerable if you can determine whether the last thing the repl printed
was the prompt (admittedly false positives are possible here with a
program that prints prompt-like strings), but it sounds like this might
be oversimplifying things? Can you explain a bit why you don't think
this approach is viable?

-Phil
Message ID
<87ilky74xz.fsf@aol.com>
In-Reply-To
<87sfk3uhdi.fsf@hagelb.org>
Phil Hagelberg <phil@hagelb.org> writes:

> Honestly, once you mentioned message-id, and we already have this list
> of ops, it seems really close to the nrepl protocol. It kind of feels
> redundant to invent this new protocol that is already so close to that
> one.

+1 for nREPL, it's great! Unless there's a really good reason to, we shouldn't reinvent the wheel when we already have a mature protocol & ecosystem.

> The nrepl protocol is meant to be transport agnostic. [...]

I didn't know about that but that's good to know!

~ Hendursaga
Message ID
<87czb65gtm.fsf@gmail.com>
In-Reply-To
<87sfk3uhdi.fsf@hagelb.org>
> Honestly, once you mentioned message-id, and we already have this list
> of ops, it seems really close to the nrepl protocol. It kind of feels
> redundant to invent this new protocol that is already so close to that
> one.

True, true. However, nREPL has a bit more to it: bencode, for instance.
So I decided to start with something simpler, like tab-separated
lines.
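
For comparison, bencode itself is small.  Below is a Python sketch of an
encoder following the BitTorrent bencode rules (integers as i<n>e, byte
strings as <length>:<bytes>, lists as l...e, dictionaries as d...e with
sorted keys).  The sample message uses the op, code and id keys of an
nREPL eval request; the sketch is not tied to any existing Fennel nREPL
server.

```python
def bencode(value):
    """Encode ints, strings, bytes, lists and dicts as bencode bytes."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # lengths are counted in bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda kv: kv[0])  # keys must be sorted as raw bytes
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"op": "eval", "code": "(+ 1 2)", "id": "1"}))
# b'd4:code7:(+ 1 2)2:id1:12:op4:evale'
```

Unlike the tab-separated lines, bencode needs no escaping, because every
string carries its own length; the cost is that messages are no longer
human-readable one-liners.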

> The nrepl protocol is meant to be transport agnostic. Similar to how LSP
> can work over a socket or work over stdin/stdout, what if a compiler
> plugin could switch the repl over to "nrepl mode" where it just
> communicated over stdin/stdout instead of over a socket? I think it
> would be easy to adapt monroe to use this, in which case we could reuse
> all the existing tooling there for completion, docstrings, etc, plus
> other editors already know how to speak nrepl.

Indeed, if it is able to work through stdin, this might be a good
choice.  I'm not sure about the compiler plugin, though, as the REPL is
a runtime thing: can a compiler plugin really change the way the REPL
works?

> Also it sounds like what you're saying is that it's difficult in the
> current approach to tell when the repl is ready for a new command.

Yes, from Emacs' standpoint there's no way (at least none that I know
of) to tell whether we can send more to the process.  We can, of course,
block Emacs until the REPL buffer prints the output, but I don't think
that is desirable from an interactivity standpoint.

> So far the "protocol" is pretty explicit about comma-commands coming in
> and a single line of space-delimited output coming out. In the context
> of nrepl where you have to support concurrency, it's very important to
> use a message-id so you can correlate output/done messages which may
> arrive out of order with the various inputs that caused them, but since
> Fennel doesn't have concurrency, the problem it's trying to solve is
> dramatically simpler.

Not quite.  Technically, Fennel has limited support for concurrency via
coroutines, and there is a library providing an asynchronous TCP-based
REPL.  The default REPL is synchronous, yes, but it doesn't block Emacs
from sending to it or reading the process output, so from comint's
standpoint the REPL is asynchronous, since the read and write operations
are non-blocking.

Comint (or a regular user, for that matter) can write several messages
to the REPL even while the REPL is busy looping, and the REPL will
consume the inputs from the buffer when it's ready to do so.  Because of
this, we need message IDs.

In other words, when using on-type completion, each keystroke in the
editor sends a ,complete command, so they arrive in rather rapid
succession:

1234 -> ,complete f
1235 -> ,complete fo
1234 <- fn for fcollect foobar
1236 -> ,complete foo
1237 -> ,complete foob
1235 <- for foobar
1236 <- foobar
1237 <- foobar

In this type of situation, we want to remember the last
completion-related message ID, so we only show the most recent result.
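
Remembering only the last completion ID is enough to drop stale
results.  A Python sketch replaying the exchange above;
`LatestCompletion` is a hypothetical helper, and IDs are plain integers
here.

```python
class LatestCompletion:
    """Track the newest ,complete request and ignore older answers."""

    def __init__(self):
        self.latest_id = None
        self.shown = None  # candidates currently displayed to the user

    def sent(self, msg_id):
        self.latest_id = msg_id

    def received(self, msg_id, candidates):
        """Show candidates only if they answer the newest request."""
        if msg_id != self.latest_id:
            return False  # stale: a newer ,complete was sent meanwhile
        self.shown = candidates
        return True


c = LatestCompletion()
c.sent(1234)
c.sent(1235)
c.received(1234, ["fn", "for", "fcollect", "foobar"])  # False, stale
c.sent(1236)
c.sent(1237)
c.received(1235, ["for", "foobar"])  # False
c.received(1236, ["foobar"])         # False
c.received(1237, ["foobar"])         # True
print(c.shown)  # ['foobar']
```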

Another situation is when we send ,complete and ,doc commands to get
the function argument list for eldoc:

1234 -> ,complete for
1235 -> ,doc for
1234 <- for fortune-cookie
1235 <- (for iter-table ...) for loop doc...

In this kind of situation, we want to remember an ID for completion,
but we also don't want to confuse it with doc.  Technically, this
doesn't need IDs to resolve the ambiguity if we have ops, but IDs are
more universal and lightweight here.
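
Because every ID is unique, the editor can keep a map from ID to the op
that produced it, and the responses themselves don't need to repeat the
op.  A Python sketch using the ID<TAB><-<TAB>payload layout proposed
earlier in the thread (the spaces in the log above stand in for tabs);
the names are hypothetical.

```python
pending = {}  # message ID -> op that is waiting for the answer


def track(msg_id, op):
    """Remember which op a freshly sent request belongs to."""
    pending[msg_id] = op


def classify(line):
    """Return (op, payload) for a response line, consuming its pending entry."""
    msg_id, _direction, payload = line.rstrip("\n").split("\t", 2)
    return pending.pop(msg_id, "unknown"), payload


track("1234", "complete")
track("1235", "doc")
print(classify("1235\t<-\t(for iter-table ...) for loop doc..."))
print(classify("1234\t<-\tfor\tfortune-cookie"))
```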

> The response can only ever come from the most recent input that was
> received. It seems to be like it should be able to be answered if you
> can determine whether the last thing the repl printed was the prompt
> (admittedly false positives are possible here with a program that
> prints prompt-like strings) but it sounds like this might be
> oversimplifying things? Can you explain a bit why you don't think this
> approach is viable?

Technically, you're right, we could just count prompts internally in
fennel-mode, but it is quite hard to do, given that each user input in
the REPL will increment the prompt counter, and each user input in a
source buffer will create a redirected request for docs and other stuff.

This must be done via a single counter to work right, so in this case it
would be handled by comint's hooks, like
`comint-preoutput-filter-functions` and `comint-input-sender`, and
possibly `comint-redirect-filter-functions`.  But this will not work
if multiple sessions are used, which is possible (the ob-fennel
package does it).

TL;DR: I don't see any other way to implement it without an ID-based
protocol, other than incrementing a counter for sent messages and
decrementing it for received ones, which looks like a very shaky method
and doesn't work for multi-REPL setups.

--
Andrey Listopadov