As is, soju leaks goroutines on client disconnects, because the closure
started as goroutine in `newConn()` never finishes. It gets stuck in the
for loop annotated as "draining the outgoing channel", because the
outgoing channel is in fact never closed.
This commit fixes the issue by calling `dc.Close()` when an
`eventDownstreamDisconnected` is being handled, which closes the
connection and the channel, and lets all goroutines finish.
---
v2: doesn't panic on server shutdown
It took me a while to untangle the complexity of multiple clients being
connected to multiple networks, etc. I think this fix is much closer to
the root cause, but there is still some uncertainty on my part. Feedback
would be much appreciated.
user.go | 1 +
1 file changed, 1 insertion(+)
diff --git a/user.go b/user.go
index c3a3931..dcaf968 100644
--- a/user.go+++ b/user.go
@@ -791,6 +791,7 @@ func (u *user) run() {
})
u.bumpDownstreamInteractionTime(ctx)
+ dc.Close() case eventDownstreamMessage:
msg, dc := e.msg, e.dc
if dc.isClosed() {
--
2.45.2
I actually think your first patch was a good fix: it ensures that the
connection is properly closed in all cases.
The panic comes from a footgun I'm responsible for. The *conn pointer
returned by newConn is dereferenced in newDownstreamConn and
connectToUpstream (a premature optimization). As a result, the whole
struct is copied and the internal newConn goroutine works with a
different chunk of memory than downstreamConn and upstreamConn.
I've fixed that in 94ff7d49b116 ("Stop dereferencing *conn").
On 7/8/24 8:35 PM, Simon Ser wrote:
> I actually think your first patch was a good fix: it ensures that the> connection is properly closed in all cases.> > The panic comes from a footgun I'm responsible for. The *conn pointer> returned by newConn is dereferenced in newDownstreamConn and> connectToUpstream (a premature optimization). As a result, the whole> struct is copied and the internal newConn goroutine works with a> different chunk of memory than downstreamConn and upstreamConn.> > I've fixed that in 94ff7d49b116 ("Stop dereferencing *conn").
Cool, thanks. That would have definitely taking me a long time to figure
out :)
Does the first patch still apply? Or do you want me to open a PR on
codeberg or something?
Cheers,
Conrad
On Monday, July 8th, 2024 at 22:36, Conrad Hoffmann <ch@bitfehler.net> wrote:
> Does the first patch still apply? Or do you want me to open a PR on> codeberg or something?
Yup, I just wanted to hear back from you to make sure it all makes
sense. Pushed now, thanks!