Hi,
Thanks for m2dir, I think it solves the issues I have with Maildir as well.
One thing that strikes me as very weird though is the unique-id,
not only it's way too short 2^12 = 4096, I think it's reasonable
to expect more emails than this.
And the collision solving method makes it unsuitable to non-centralised
usage, including folder manipulation done by the email client
itself or moving emails around locally (as the unique-id mustn't
conflict in .meta).
I think something like an UUID using time+random would make the most
sense and be virtually collision-free, as demonstrated by their usage
in distributed systems.
One example of such prior art: https://en.wikipedia.org/wiki/Snowflake_ID
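To make the time+random idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name, the 48-bit millisecond timestamp and the 10 random bytes are hypothetical choices, not anything m2dir specifies, and this is not the Snowflake layout (which combines a timestamp with a machine ID and a sequence number rather than random bits).

```python
import os
import time


def time_random_id(random_bytes: int = 10) -> str:
    """Return a hex ID: a 48-bit millisecond timestamp followed by random bytes.

    Hypothetical layout for illustration only; not part of the m2dir spec.
    """
    millis = time.time_ns() // 1_000_000       # milliseconds since the Unix epoch
    prefix = millis.to_bytes(6, "big")         # 6 bytes = 48 bits of timestamp
    suffix = os.urandom(random_bytes)          # random bits from the OS
    return (prefix + suffix).hex()


if __name__ == "__main__":
    print(time_random_id())
```

Because the timestamp comes first and is big-endian, IDs generated in different milliseconds sort chronologically as plain strings, and two writers only collide if they pick the same random suffix within the same millisecond.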
Best regards
> One thing that strikes me as very weird though is the unique-id,
> not only it's way too short 2^12 = 4096, I think it's reasonable
> to expect more emails than this.
The hash is 12 bytes, not bits. 2^(8*12)≈8e28 which should be big enough.
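As a quick sanity check of that arithmetic, here is a small Python sketch comparing a 12-bit and a 96-bit (12-byte) ID space. The function names are made up for illustration, and the collision estimate uses the usual birthday-bound approximation, assuming IDs are uniformly distributed.

```python
import math


def id_space(bits: int) -> int:
    """Number of distinct IDs representable with the given number of bits."""
    return 2 ** bits


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2^bits))."""
    exponent = -n_items * (n_items - 1) / (2 * id_space(bits))
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities


if __name__ == "__main__":
    print(id_space(12))                           # 4096
    print(f"{id_space(8 * 12):.1e}")              # 7.9e+28
    print(collision_probability(1_000_000, 96))   # vanishingly small
```

With only 12 bits, a hundred messages would already make a collision more likely than not, while with 96 bits even a million messages leave the probability negligible.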
Hey!
On 4/18/24 1:49 PM, Haelwenn (lanodan) Monnier wrote:
> One thing that strikes me as very weird though is the unique-id,
> not only it's way too short 2^12 = 4096, I think it's reasonable
> to expect more emails than this.
Reasonable indeed, but how did you arrive at that number? The unique ID
is the output of fnv64a, so this should provide roughly 64 bits of
entropy, and make collisions rather unlikely unless they are indeed
copies of the same message?
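For reference, a minimal Python sketch of the hash, assuming "fnv64a" means the standard 64-bit FNV-1a variant. What exactly m2dir feeds into the hash and how the result is encoded into the filename is not covered here; the input below is an arbitrary byte string.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Compute the 64-bit FNV-1a hash of a byte string."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # FNV-1a: XOR the byte in first ...
        h = (h * FNV64_PRIME) & MASK_64  # ... then multiply, modulo 2^64
    return h


if __name__ == "__main__":
    # Hypothetical input; the real input used by m2dir may differ.
    print(f"{fnv1a_64(b'<example@example.org>'):016x}")
```

Identical inputs always produce the same value, which is why exact copies of a message end up with the same ID, while distinct inputs are spread over the 2^64 possible outputs.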
> And the collision solving method makes it unsuitable to non-centralised
> usage, including folder manipulation done by the email client
> itself or moving emails around locally (as the unique-id mustn't
> conflict in .meta).
The collision resolution is indeed a very "last resort" sort of thing,
but I would expect it to a) be needed rather rarely and b) be needed
even more rarely under such constraints (see also c) in the last
paragraph).
I do understand that this does not meet requirements you'd want to
satisfy in distributed systems, but I think the use case of "moving
emails around locally" does not quite have that level of, I am lacking a
word here, distributedness either?
> I think something like an UUID using time+random would make the most
> sense and be virtually collision-free, as demonstrated by their usage
> in distributed systems.
The main focus of the approach is to produce filenames that can be
digested by a human being, so some trade-offs were made. One purpose of
the collision resolution is that it is very easy to spot when you have
multiple copies of the same message.
Even though it's (maybe?) sort of specified, I would not recommend using
m2dir as the storage implementation of a high-volume SMTP server (Maildir is
fine for that). It is, first and foremost, for individuals to store
their emails. And, I don't know about you, but even as a
not-email-hater, I certainly hope I will never have to deal with
anywhere close to 2^64 emails ;)
Cheers,
Conrad
[2024-04-18 14:18:33+0200] Knut Magnus Aasrud:
>> One thing that strikes me as very weird though is the unique-id,
>> not only it's way too short 2^12 = 4096, I think it's reasonable
>> to expect more emails than this.
>
> The hash is 12 bytes, not bits. 2^(8*12)≈8e28 which should be big enough.
Erf, reminds me of why I tend to stick to octet.
And yeah 12 bytes should be enough.