~sircmpwn/aerc

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
5 2

[PATCH] maildir: fix race condition in openDirectory

Details
Message ID
<20210409193732.9321-1-w@104d.net>
DKIM signature
pass
Download raw message
Patch: +9 -1
Signed-off-by: wagner riffel <w@104d.net>
---
This race condition happens often while aerc is openning a maildir
directory, deleteing this second PostMessage calls also works, but
right above there is a comment it has to be sent twice for some
unknown reason, so i left the second send but using a copy of info
instead.
 worker/maildir/worker.go | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/worker/maildir/worker.go b/worker/maildir/worker.go
index 4a7ae51..094856c 100644
--- a/worker/maildir/worker.go
+++ b/worker/maildir/worker.go
@@ -304,7 +304,15 @@ func (w *Worker) handleOpenDirectory(msg *types.OpenDirectory) error {
		Info: w.getDirectoryInfo(msg.Directory),
	}
	w.worker.PostMessage(info, nil)
	w.worker.PostMessage(info, nil)

	// NOTE: this is a copy of info so we don't post the same *Message twice
	// into the queue causing a race conditions within worker.ProcessMessage
	// (called from main goroutine) and worker.PostMessage (called from
	// worker.Run() goroutine)
	info2 := &types.DirectoryInfo{
		Info: info.Info,
	}
	w.worker.PostMessage(info2, nil)
	return nil
}

-- 
2.30.2
Details
Message ID
<20210412165438.rgyub5odplehdxzn@feather.localdomain>
In-Reply-To
<20210409193732.9321-1-w@104d.net> (view parent)
DKIM signature
pass
Download raw message
Hi,

On Fri, Apr 09, 2021 at 04:37:32PM -0300, wagner riffel wrote:
> This race condition happens often while aerc is openning a maildir
> directory, deleteing this second PostMessage calls also works, but
> right above there is a comment it has to be sent twice for some
> unknown reason, so i left the second send but using a copy of info
> instead.

Nack, this is a band aid glossing over a problem.
Not sure why it's not enough to only spawn a single event (notmuch has the same
hack in place)

But we certainly shouldn't make a copy of the thing for no reason.
We need to fix the underlying problem instead.

Do you want to have a look? I'd appreciate it.

Kind regards,
Reto
Details
Message ID
<CAMOM8E3OOV3.1IXDC6EBQVNXZ@pampas>
In-Reply-To
<20210412165438.rgyub5odplehdxzn@feather.localdomain> (view parent)
DKIM signature
pass
Download raw message
Patch: +10 -2
On Mon Apr 12, 2021 at 1:54 PM -03, Reto wrote:
> Do you want to have a look? I'd appreciate it.
>

Here's what I could dig when I was investigating this:

We send the first message here from the worker goroutine:
https://git.sr.ht/~sircmpwn/aerc/tree/a5553438/item/worker/maildir/worker.go#L306

Then Tick() is waked in the main goroutine and calls ProcessMessage:
https://git.sr.ht/~sircmpwn/aerc/tree/a5553438/item/widgets/account.go#L100

ProcessMessage in the main goroutine reads types.Message state with
msg.getId() and msg.InResponseTo():
https://git.sr.ht/~sircmpwn/aerc/tree/a5553438/item/worker/types/worker.go#L74-76

Meanwhile in the worker goroutine we call PostMessage for a second
time with a pointer that points to the *same* previous message that
ProcessMessage is reading:
https://git.sr.ht/~sircmpwn/aerc/tree/a5553438/item/worker/maildir/worker.go#L306

The second PostMessage call makes writes to message while
ProcessMessage in the main goroutine is possibly reading:
https://git.sr.ht/~sircmpwn/aerc/tree/a5553438/item/worker/types/worker.go#L59

I didn't know it's was happening with other workers, I made a few
tests with imap and never saw such race condition, so to preserve the
same performance/behaviour across workers I thought a little copying
in maildir would be an okay fix but apparently the correct fix is to
make types.Message thread-safe, which also fixes the problem but does
locking for all workers, even those that doesn't cause this race to
happen (such as imap), how do you feel about this diff? If you're ok
with this one, I can send a proper git formatted patch

diff --git a/worker/types/messages.go b/worker/types/messages.go
index ab0e545..6858b2e 100644
--- a/worker/types/messages.go
+++ b/worker/types/messages.go
@@ -2,6 +2,7 @@ package types

import (
	"io"
	"sync"
	"time"

	"git.sr.ht/~sircmpwn/aerc/config"
@@ -15,6 +16,7 @@ type WorkerMessage interface {
}

type Message struct {
	mu           sync.Mutex
	inResponseTo WorkerMessage
	id           int64
}
@@ -25,15 +27,21 @@ func RespondTo(msg WorkerMessage) Message {
	}
}

func (m Message) InResponseTo() WorkerMessage {
func (m *Message) InResponseTo() WorkerMessage {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.inResponseTo
}

func (m Message) getId() int64 {
func (m *Message) getId() int64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.id
}

func (m *Message) setId(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.id = id
}

ps: In attachments there's a stack trace from the Go race detector.

-wagner
Details
Message ID
<CAXCRMLDE1JN.1G7YF6RQ81QYL@pampas>
In-Reply-To
<CAMOM8E3OOV3.1IXDC6EBQVNXZ@pampas> (view parent)
DKIM signature
pass
Download raw message
Hey Reto, friendly pinging, in case you missed my reply, it's been
over 15 days since your last reply and I'm willing to work on changes
to get this merged.

Best Regards

--wagner
Details
Message ID
<20210426193459.pw2ozgya7phlzdxs@feather.localdomain>
In-Reply-To
<CAXCRMLDE1JN.1G7YF6RQ81QYL@pampas> (view parent)
DKIM signature
pass
Download raw message
Hallo Wagner,

On Mon, Apr 26, 2021 at 12:58:47AM -0300, wagner riffel wrote:
> Hey Reto, friendly pinging, in case you missed my reply, it's been
> over 15 days since your last reply and I'm willing to work on changes
> to get this merged.

Sorry, fell through the cracks.

So...
I can't reproduce the error anymore with the event loop stalling at least in
the notmuch worker.

Can you try what happens if you simply remove the double emit of the event?
It's clearly the wrong thing to do anyhow.

Kind regards,
Reto
Details
Message ID
<CAXXZEL2B4E0.25FVFUF9QRDLK@pampas>
In-Reply-To
<20210426193459.pw2ozgya7phlzdxs@feather.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Mon Apr 26, 2021 at 4:34 PM -03, Reto wrote:
> I can't reproduce the error anymore with the event loop stalling at
> least in
> the notmuch worker.
>
> Can you try what happens if you simply remove the double emit of the
> event?
> It's clearly the wrong thing to do anyhow.
>

Hi Reto, I think I mentioned this on the patch email, I did test and
actually I'm using for some weeks this exact approach, I just deleted
the second emit, I see no apparent problems so far.

Best Regards.

--wagner
Reply to thread Export thread (mbox)