~rjarry/aerc-devel

maildir: fix data race in maildir worker v2 SUPERSEDED

wagner riffel: 1
 maildir: fix data race in maildir worker

 3 files changed, 91 insertions(+), 32 deletions(-)
I'm not sure, due the current Dir type of go-maildir (just string) it
will be needed to use at least one open(3)/close(3) per call of
Dir.Flag, I dropped this patch there, that reduced the loop run time
from 4.5s to 3.4s, still not as good as leaving the file descriptor
opened as I did for aerc.
Next
Yes, I tested this patched by calling both functions (Message.Flags
and Message.Readdir) and panicking if reflect.DeepEqual returned
false, also master aerc and with this patched reports the
same unseen messages count.

On Sat Mar 5, 2022 at 11:28 PM CET, inwit wrote:
Next
Export patchset (mbox)
How do I use this?

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~rjarry/aerc-devel/patches/30004/mbox | git am -3
Learn more about email & git

[PATCH v2] maildir: fix data race in maildir worker Export this patch

Fix a data race due dirInfo pointer being read in the main goroutine
by NewMessageStore and written in the annonymous goroutine launched in
Worker.getDirectoryInfo.

To address the issue https://todo.sr.ht/~rjarry/aerc/16 replace
go-maildir library Dir.Flags with our own based on readdir(3)
Signed-off-by: wagner riffel <w@104d.net>
---
On Mon Feb 28, 2022 at 12:29 AM CET, Robin Jarry wrote:
> This is almost a revert of a commit I made recently to fix a performance
> problem when opening maildir folders containing large number of
> messages:
>
> https://git.sr.ht/~rjarry/aerc/commit/622802d3a5ea980b2d08ff

Sorry, I shoud have checked git-blame first.

> I agree that there is likely a data race but could we find a fix that
> preserves the original fix? Maybe allow incremental dirinfo using
> a channel to keep the app responsive.

It's rare to see issues raised due this race because sleeps are
"synchronizing" the goroutine for the most messages.

While concurrency might be applied here, going through the issue I
found weird that modern hardware would have problems listing 40, 100,
400 thousands of emails, so I started profiling, no surprises on 86%
of the time aerc was hanging on syscalls, that happened to be
go-maildir library (made from here:
https://github.com/emersion/go-maildir/blob/v0.2.0/maildir.go#L178),
I've quickly drafted this patch which in my machine (i7 3770, over 10
years old) took down the loop run time for 400,000 random-generated
emails (~8.5 GB total of data) from ~4.5sec to ~610ms, honestly I
can't feel any difference in UI display time between current master
and this patch (I tried to replicate your benchmark described in
https://todo.sr.ht/~rjarry/aerc/16, "time ./aerc and quitting when the
message view was completely loaded" but I failed find where exactly to
place _exit 🤦), do you think it's still worth adding incremental
dirinfo progress concurrently?
PS: I'm not familiarized with maildir specification, so double check
if the proposed flag parsing is correct.
 worker/maildir/container.go | 17 +++++++++--
 worker/maildir/message.go   | 49 +++++++++++++++++++++++++++++++
 worker/maildir/worker.go    | 57 ++++++++++++++++++-------------------
 3 files changed, 91 insertions(+), 32 deletions(-)

diff --git a/worker/maildir/container.go b/worker/maildir/container.go
index 6f7d74a..aa48c1b 100644
--- a/worker/maildir/container.go
+++ b/worker/maildir/container.go
@@ -19,6 +19,7 @@ type Container struct {
	log        *log.Logger
	uids       *uidstore.Store
	recentUIDS map[uint32]struct{} // used to set the recent flag
	dirFile    *os.File
}

// NewContainer creates a new container at the specified directory
@@ -27,16 +28,26 @@ func NewContainer(dir string, l *log.Logger) (*Container, error) {
	if err != nil {
		return nil, err
	}
	defer f.Close()
	s, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	if !s.IsDir() {
		f.Close()
		return nil, fmt.Errorf("Given maildir '%s' not a directory", dir)
	}
	return &Container{dir: dir, uids: uidstore.NewStore(), log: l,
		recentUIDS: make(map[uint32]struct{})}, nil
	return &Container{
		dir:        dir,
		uids:       uidstore.NewStore(),
		log:        l,
		recentUIDS: make(map[uint32]struct{}),
		dirFile:    f,
	}, nil
}

func (c *Container) Close() error {
	return c.dirFile.Close()
}

// ListFolders returns a list of maildir folders in the container
diff --git a/worker/maildir/message.go b/worker/maildir/message.go
index b73e083..93417c9 100644
--- a/worker/maildir/message.go
+++ b/worker/maildir/message.go
@@ -3,6 +3,9 @@ package maildir
import (
	"fmt"
	"io"
	"os"
	"sort"
	"strings"

	"github.com/emersion/go-maildir"
	"github.com/emersion/go-message"
@@ -23,6 +26,52 @@ func (m Message) NewReader() (io.ReadCloser, error) {
	return m.dir.Open(m.key)
}

func (m Message) info(dir *os.File) (info string, err error) {
	files, err := dir.Readdirnames(-1)
	if err != nil {
		return "", err
	}
	for _, fileName := range files {
		if fileName == m.key {
			return fileName, nil
		}
		i := strings.LastIndexByte(fileName, ':')
		if i < 0 {
			continue
		}
		if m.key == fileName[:i] {
			return fileName[i+1:], nil
		}
	}
	return "", &maildir.MailfileError{Name: m.key}
}

type byFlag []maildir.Flag

func (f byFlag) Len() int           { return len(f) }
func (f byFlag) Swap(i, j int)      { f[i], f[j] = f[j], f[i] }
func (f byFlag) Less(i, j int) bool { return f[i] < f[j] }

func (m Message) FlagsReaddir(dir *os.File) ([]maildir.Flag, error) {
	info, err := m.info(dir)
	if err != nil {
		return nil, err
	}
	if len(info) < 2 {
		return nil, &maildir.FlagError{Info: info, Experimental: false}
	}
	if info[1] != ',' || info[0] != '2' {
		return nil, &maildir.FlagError{Info: info, Experimental: false}
	}

	if info[0] == '1' {
		return nil, &maildir.FlagError{Info: info, Experimental: true}
	}
	s := byFlag(info[2:])
	sort.Sort(s)
	return []maildir.Flag(s), nil
}

// Flags fetches the set of flags currently applied to the message.
func (m Message) Flags() ([]maildir.Flag, error) {
	return m.dir.Flags(m.key)
diff --git a/worker/maildir/worker.go b/worker/maildir/worker.go
index 6ff66b2..fabd4a8 100644
--- a/worker/maildir/worker.go
+++ b/worker/maildir/worker.go
@@ -144,37 +144,33 @@ func (w *Worker) getDirectoryInfo(name string) *models.DirectoryInfo {
	}

	dirInfo.Exists = len(uids)

	go func() {
		info := dirInfo
		for _, uid := range uids {
			message, err := w.c.Message(dir, uid)
			if err != nil {
				w.worker.Logger.Printf("could not get message: %v", err)
				continue
			}
			flags, err := message.Flags()
			if err != nil {
				w.worker.Logger.Printf("could not get flags: %v", err)
				continue
			}
			seen := false
			for _, flag := range flags {
				if flag == maildir.FlagSeen {
					seen = true
				}
			}
			if !seen {
				info.Unseen++
			}
			if w.c.IsRecent(uid) {
				info.Recent++
	for _, uid := range uids {
		message, err := w.c.Message(dir, uid)
		if err != nil {
			w.worker.Logger.Printf("could not get message: %v", err)
			continue
		}
		flags, err := message.FlagsReaddir(w.c.dirFile)
		if err != nil {
			w.worker.Logger.Printf("could not get flags: %v", err)
			continue
		}
		seen := false
		for _, flag := range flags {
			if flag == maildir.FlagSeen {
				seen = true
			}
		}
		info.Unseen += info.Recent
		info.Exists += info.Recent
		info.AccurateCounts = true
	}()
		if !seen {
			dirInfo.Unseen++
		}
		if w.c.IsRecent(uid) {
			dirInfo.Recent++
		}
	}
	dirInfo.Unseen += dirInfo.Recent
	dirInfo.Exists += dirInfo.Recent
	dirInfo.AccurateCounts = true

	return dirInfo
}
@@ -241,6 +237,9 @@ func (w *Worker) handleConfigure(msg *types.Configure) error {
		w.worker.Logger.Printf("could not configure maildir: %s", dir)
		return err
	}
	if w.c != nil {
		w.c.Close()
	}
	w.c = c
	w.worker.Logger.Printf("configured base maildir: %s", dir)
	return nil
--
2.34.1
wagner riffel, Mar 05, 2022 at 05:26:
I'm having trouble with this patch. After applying it, I'm getting a ton of
error messages of the form: "could not get flags: maildir: invalid mailfile
format". I've counted and I'm getting one message per each file found in my
Maildir folder structure. The messages appear at startup, even before I try to
load other folders.

The overall speed of the Maildir interface is more or less the same. I'm seeing
a ~20K message folder load in around 20s (with and without the patch), which is
no good.

The same set of messages is processed by notmuch or mutt without complaints.

It could be unrelated, but the fact that before this patch I was getting other
errors like those I reported to maildir.go [0], makes me think that there could
be something related to how maildir.go deals with messages in the first place.

It could also have to do with the fact that (unfortunately) my account is a M$
Office365 account.

Thanks for the patch, in any case.


[0] https://github.com/emersion/go-maildir/issues/12