Hallo everyone,
In multiple plugins I manage long running processes using vis:communicate.
This is the case in vis-lspc and my vis-inotify plugin, which I use to
atomically reload the theme if the theme file on disk changed.
Both plugins are leaking their processes, because they are not terminated
when vis exits and are simply passed on to init.
While vis is running:
$ ps axo cmd,pid,ppid | grep ino
inotifywait -P /home/muhq/. 115972 115948
After vis was closed:
$ ps axo cmd,pid,ppid | grep ino
inotifywait -P /home/muhq/. 115972 1
Even a graceful shutdown by subscribing to the QUIT event does not work because
the subprocesses are only checked and killed during the main loop in vis_run.
Those patches fix this issue by checking the lifetime and potentially
killing processes one last time after the QUIT event was emitted.
This allows plugin developers to subscribe to this event and shutdown their long
running processes without leakage.
See: The vis-inotify [1] and vis-lspc [2] commits
---
Flo
[1]: https://git.muhq.space/vis-inotify/commit/?id=61f79bb3046e4b548f2a1798abccb7b53e284516
[2]: https://git.muhq.space/vis-lspc/commit/?id=62e6a847a85e25037a4a7fbcd4af697928d5ecfc
[PATCH 1/2] move waiting and potentially killing a subprocess into a helper function
The separation between reading from a subprocess and handling its
life time will be useful for future changes.
---
vis-subprocess.c | 54 ++++++++++++++++++++++++++++++------------------
1 file changed, 34 insertions(+), 20 deletions(-)
diff --git a/vis-subprocess.c b/vis-subprocess.c
index 39282e47..3fe1dde7 100644
--- a/vis-subprocess.c+++ b/vis-subprocess.c
@@ -176,6 +176,36 @@ static void read_and_fire(Vis* vis, int fd, const char *name, ResponseType rtype
}
}
+/**+ * Checks if a subprocess is dead or needs to be killed then raises an event+ * or kills it if necessary.+ * @param current the process to wait for or kill+ * @return true if the process is dead+ */+static int wait_or_kill_process(Vis *vis, Process *current) {+ int status;+ pid_t wpid = waitpid(current->pid, &status, WNOHANG);+ if (wpid == -1) {+ vis_message_show(vis, strerror(errno));+ } else if (wpid == current->pid) {+ goto just_destroy;+ } else if (!*(current->invalidator)) {+ goto kill_and_destroy;+ }+ return false;++kill_and_destroy:+ kill(current->pid, SIGTERM);+ waitpid(current->pid, &status, 0);+just_destroy:+ if (WIFSIGNALED(status)) {+ vis_lua_process_response(vis, current->name, NULL, WTERMSIG(status), SIGNAL);+ } else {+ vis_lua_process_response(vis, current->name, NULL, WEXITSTATUS(status), EXIT);+ }+ return true;+}+/**
* Checks if `readfds` contains file descriptors of subprocesses from
* the pool. If so, it reads their data and fires corresponding events.
@@ -192,27 +222,11 @@ void vis_process_tick(Vis *vis, fd_set *readfds) {
if (current->errfd != -1 && FD_ISSET(current->errfd, readfds)) {
read_and_fire(vis, current->errfd, current->name, STDERR);
}
- int status;- pid_t wpid = waitpid(current->pid, &status, WNOHANG);- if (wpid == -1) {- vis_message_show(vis, strerror(errno));- } else if (wpid == current->pid) {- goto just_destroy;- } else if (!*(current->invalidator)) {- goto kill_and_destroy;- }- pointer = ¤t->next;- continue;-kill_and_destroy:- kill(current->pid, SIGTERM);- waitpid(current->pid, &status, 0);-just_destroy:- if (WIFSIGNALED(status)) {- vis_lua_process_response(vis, current->name, NULL, WTERMSIG(status), SIGNAL);+ if (!wait_or_kill_process(vis, current)) {+ pointer = ¤t->next; } else {
- vis_lua_process_response(vis, current->name, NULL, WEXITSTATUS(status), EXIT);+ /* update our iteration pointer */+ *pointer = destroy_process(current); }
- /* update our iteration pointer */- *pointer = destroy_process(current); }
}
--
2.47.0
[PATCH 2/2] check the life time of subprocesses before freeing vis
Currently there is now way for long running subprocesses like language
servers to gracefully shutdown.
When reacting to the QUIT event and invalidating the process handle
the subprocess will never be killed and destroyed because the
subprocesses are only checked during vis_run.
Collecting and killing subprocesses with invalid handles after the
QUIT event allows graceful shutdown.
---
vis-subprocess.c | 16 ++++++++++++++++vis-subprocess.h | 1 +vis.c | 1 +
3 files changed, 18 insertions(+)
diff --git a/vis-subprocess.c b/vis-subprocess.c
index 3fe1dde7..beac9998 100644
--- a/vis-subprocess.c+++ b/vis-subprocess.c
@@ -230,3 +230,19 @@ void vis_process_tick(Vis *vis, fd_set *readfds) {
}
}
}
++/**+ * Checks if each subprocess from the pool is dead or needs to be+ * killed then raises an event or kills it if necessary.+ */+void vis_process_waitall(Vis *vis) {+ for (Process **pointer = &process_pool; *pointer; ) {+ Process *current = *pointer;+ if (!wait_or_kill_process(vis, current)) {+ pointer = ¤t->next;+ } else {+ /* update our iteration pointer */+ *pointer = destroy_process(current);+ }+ }+}
diff --git a/vis-subprocess.h b/vis-subprocess.h
index 2e4c2220..79a043e9 100644
--- a/vis-subprocess.h+++ b/vis-subprocess.h
@@ -27,4 +27,5 @@ Process *vis_process_communicate(Vis *, const char *command, const char *name,
Invalidator **invalidator);
int vis_process_before_tick(fd_set *);
void vis_process_tick(Vis *, fd_set *);
+void vis_process_waitall(Vis *);#endif
diff --git a/vis.c b/vis.c
index 10c3df57..d1c0dab0 100644
--- a/vis.c+++ b/vis.c
@@ -623,6 +623,7 @@ void vis_free(Vis *vis) {
while (vis->windows)
vis_window_close(vis->windows);
vis_event_emit(vis, VIS_EVENT_QUIT);
+ vis_process_waitall(vis); file_free(vis, vis->command_file);
file_free(vis, vis->search_file);
file_free(vis, vis->error_file);
--
2.47.0
Hi Florian, everyone,
> In multiple plugins I manage long running processes using vis:communicate.> This is the case in vis-lspc and my vis-inotify plugin, which I use to> atomically reload the theme if the theme file on disk changed.> > Both plugins are leaking their processes, because they are not terminated> when vis exits and are simply passed on to init.> > While vis is running:> $ ps axo cmd,pid,ppid | grep ino> inotifywait -P /home/muhq/. 115972 115948> > After vis was closed:> $ ps axo cmd,pid,ppid | grep ino> inotifywait -P /home/muhq/. 115972 1> > Even a graceful shutdown by subscribing to the QUIT event does not work because> the subprocesses are only checked and killed during the main loop in vis_run.> > Those patches fix this issue by checking the lifetime and potentially> killing processes one last time after the QUIT event was emitted.> This allows plugin developers to subscribe to this event and shutdown their long> running processes without leakage.> > See: The vis-inotify [1] and vis-lspc [2] commits
This seems like a great addition, and the code looks good to me.
I'll wait a bit to see if others have remarks before applying, though.
Cheers,
Felix
> [1]: https://git.muhq.space/vis-inotify/commit/?id=61f79bb3046e4b548f2a1798abccb7b53e284516> [2]: https://git.muhq.space/vis-lspc/commit/?id=62e6a847a85e25037a4a7fbcd4af697928d5ecfc
Florian Fischer <florian.fischer@muhq.space> wrote:
> Hallo everyone,> > In multiple plugins I manage long running processes using vis:communicate.> This is the case in vis-lspc and my vis-inotify plugin, which I use to> atomically reload the theme if the theme file on disk changed.> > Both plugins are leaking their processes, because they are not terminated> when vis exits and are simply passed on to init.> > While vis is running:> $ ps axo cmd,pid,ppid | grep ino> inotifywait -P /home/muhq/. 115972 115948> > After vis was closed:> $ ps axo cmd,pid,ppid | grep ino> inotifywait -P /home/muhq/. 115972 1> > Even a graceful shutdown by subscribing to the QUIT event does not work because> the subprocesses are only checked and killed during the main loop in vis_run.> > Those patches fix this issue by checking the lifetime and potentially> killing processes one last time after the QUIT event was emitted.> This allows plugin developers to subscribe to this event and shutdown their long> running processes without leakage.> > See: The vis-inotify [1] and vis-lspc [2] commits> > ---> Flo> > [1]: https://git.muhq.space/vis-inotify/commit/?id=61f79bb3046e4b548f2a1798abccb7b53e284516> [2]: https://git.muhq.space/vis-lspc/commit/?id=62e6a847a85e25037a4a7fbcd4af697928d5ecfc
Hi Florian,
What do you mean by leaked? Do you mean there are zombie processes
hanging around after vis closes? In that case I'm fine with this.
If its just that right now the operating system is cleaning up for
us then I don't think this patch should be applied. That is the
point of the operating system after all and they will do a much
better job than we can.
- Randy
--
https://rnpnr.xyz/
GPG Fingerprint: B8F0 CF4C B6E9 415C 1B27 A8C4 C8D2 F782 86DF 2DC5
Hi Randy,
>> Both plugins are leaking their processes, because they are not terminated>> when vis exits and are simply passed on to init.>> >> While vis is running:>> $ ps axo cmd,pid,ppid | grep ino>> inotifywait -P /home/muhq/. 115972 115948>> >> After vis was closed:>> $ ps axo cmd,pid,ppid | grep ino>> inotifywait -P /home/muhq/. 115972 1>> >> Even a graceful shutdown by subscribing to the QUIT event does not work because>> the subprocesses are only checked and killed during the main loop in vis_run.>> >> Those patches fix this issue by checking the lifetime and potentially>> killing processes one last time after the QUIT event was emitted.>> This allows plugin developers to subscribe to this event and shutdown their long>> running processes without leakage.>>Hi Florian,>>What do you mean by leaked? Do you mean there are zombie processes>hanging around after vis closes? In that case I'm fine with this.
No the processes we leave behind are no zombies, they are still running and now effectively daemonized.
See my inotifywait example above.
That's because the processes are not signaled when vis quits, except they have asked for receiving SIGHUP when the parent exits.
And invalidating the process handle and thus attempting to send them SIGTERM during the QUIT event does not work either, because the subprocesses are not handled after the QUIT event was emitted.
And this is what the patches change.
There is now one last round of checking each subprocess, detecting possible invalidated handles and sending SIGTERM.
This change allows plugin developers to shutdown their subprocesses on exit.
Hope this better explains what is going on.
Flo
Hi,
This is my first ever take on reviewing a patch. I'm also not the most
familiar with vis' source code; I have skimmed through parts relevant to
this review, but haven't taken the time to understand it all deeply. I'd
appreciate any feedback!
On 10/25/24 8:04 AM, Florian Fischer wrote:
> Those patches fix this issue by checking the lifetime and potentially
> killing processes one last time after the QUIT event was emitted.
> This allows plugin developers to subscribe to this event and shutdown
their long
> running processes without leakage.
I'm not sure if a blocking wait() on all the processes is a good idea.
If the process happened to be unresponsive (e.g., it's performing a
lengthy cleanup sequence, or maybe even ignoring the signal altogether),
the editor would hang.
> See: The vis-inotify [1] and vis-lspc [2] commits
I briefly looked into both cases, and it seems to me that the plugins
don't do anything useful after the process actually dies, besides
printing a message and modifying some internal state (and possibly
spawning new processes (!!!), in the case of 'recreate' mode in
vis-inotify [1][2]). In most cases, we could probably get away with
dying quickly while still cleaning up after ourselves as best we can.
I propose the following changes:
1. Remove the blocking wait() from wait_or_kill_process(). This prevents
the editor from unnecessarily hanging, not just on exit, but during the
main loop as well.
It's also probably a good idea to keep track of the kill() so we aren't
trying to kill a (possibly) dying process on every tick (also, some
programs see double-kills as meaning "die immediately and ungracefully").
2. For plugins which must wait(), provide some alternative means of
specifying this, such as a flag of some sorts. Maybe even accompany it
with a helpful message in the editor, so the user isn't thinking, "Is a
subprocess dying slowly, or is the editor putting invalid keys in the
input queue again? [3]" :-)
3. Kill ALL subprocesses on exit (or at least die trying). This is, in
my opinion, the least surprising behaviour for both the user (who
doesn't want garbage processes running on their system) and the plugin
developer (who benefits from convenient abstractions).
-karthin
PS: wait_or_kill_process() returns int, but I believe bool is more
correct. The part after the kill_and_destroy label is incorrectly indented.
[1] Which, I believe, is highly dependent on how new_process_in_pool()
_prepends_ to the pool (a linked list). Because if it were to append,
that would lead to an infinite kill-wait-spawn cycle.
[2]
https://git.muhq.space/vis-inotify/tree/init.lua?id=2a955d6bb5b3cee3f17c77f32a997e4710e97e27#n44
[3] https://github.com/martanne/vis/pull/1125
Hi Karthin,
> This is my first ever take on reviewing a patch. I'm also not the most > familiar with vis' source code; I have skimmed through parts relevant to > this review, but haven't taken the time to understand it all deeply. I'd > appreciate any feedback!
Thanks a lot for taking the time and reviewing this change!
> I briefly looked into both cases, and it seems to me that the plugins > don't do anything useful after the process actually dies, besides > printing a message and modifying some internal state (and possibly > spawning new processes (!!!), in the case of 'recreate' mode in > vis-inotify [1][2]). In most cases, we could probably get away with > dying quickly while still cleaning up after ourselves as best we can.
You are totally right the recreate mode is simply broken.
No new watches should be created if we explicitly killed the last during the QUIT
event. This is fixed in vis-inotify commit 77706bc [1].
> I propose the following changes:>> 1. Remove the blocking wait() from wait_or_kill_process(). This prevents > the editor from unnecessarily hanging, not just on exit, but during the > main loop as well.>> It's also probably a good idea to keep track of the kill() so we aren't > trying to kill a (possibly) dying process on every tick (also, some > programs see double-kills as meaning "die immediately and ungracefully").
I do not have the time and energy to implement that suggestion currently.
Maybe I am urged if my vis hangs because of an unresponsive child process.
> 2. For plugins which must wait(), provide some alternative means of > specifying this, such as a flag of some sorts. Maybe even accompany it > with a helpful message in the editor, so the user isn't thinking, "Is a > subprocess dying slowly, or is the editor putting invalid keys in the > input queue again? [3]" :-)
Maybe adding a synchronous blocking wait method to the lua subprocess handle is
a good idea.
But I will not work on this in the near future.
> 3. Kill ALL subprocesses on exit (or at least die trying). This is, in > my opinion, the least surprising behaviour for both the user (who > doesn't want garbage processes running on their system) and the plugin > developer (who benefits from convenient abstractions).
I have no strong opinion on that.
Most of the time I think that as few implicit behavior as possible is the best
approach.
But here terminating ALL child processes seems convenient and does not stop
plugin developers to spawn deamons surviving the editors shutdown.
Any opinions on this? How do other editors behave in this regard?
> PS: wait_or_kill_process() returns int, but I believe bool is more > correct. The part after the kill_and_destroy label is incorrectly indented.
Fixed in a potentially second version, thanks :)
- Flo