~bitfehler/m2dir


Handling of collisions in unique-id

Details
Message ID
<ZiEI3nZ0mioH4Mg3@cloudsdale.the-delta.net.eu.org>
Hi,

Thanks for m2dir, I think it solves the issues I have with Maildir as well.

One thing that strikes me as very weird though is the unique-id,
not only it's way too short 2^12 = 4096, I think it's reasonable
to expect more emails than this.

And the collision solving method makes it unsuitable to non-centralised
usage, including folder manipulation done by the email client
itself or moving emails around locally (as the unique-id mustn't
conflict in .meta).

I think something like a UUID using time+random would make the most
sense and be virtually collision-free, as demonstrated by their usage
in distributed systems.

One example of such prior art: https://en.wikipedia.org/wiki/Snowflake_ID
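
To make the time+random idea concrete, here is a minimal sketch. The bit
split and the names (RANDOM_BITS, time_random_id, timestamp_of) are
assumptions made up for illustration; they come neither from m2dir nor
from Snowflake, which combines a timestamp with a machine ID and a
sequence number rather than random bits.

```python
import secrets
import time

# Hypothetical layout: millisecond timestamp in the high bits,
# RANDOM_BITS random bits in the low bits. Not taken from any spec.
RANDOM_BITS = 32


def time_random_id(now_ms=None):
    """Return an integer ID built from a timestamp plus random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # current time in milliseconds
    return (now_ms << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def timestamp_of(id_value):
    """Recover the millisecond timestamp stored in the high bits."""
    return id_value >> RANDOM_BITS
```

Two IDs created in the same millisecond can only collide if their random
parts match as well, and IDs from different milliseconds sort by creation
time.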

Best regards
Details
Message ID
<33592e9c-9717-44f1-b5bc-1a40fbbc4827@aasrud.com>
In-Reply-To
<ZiEI3nZ0mioH4Mg3@cloudsdale.the-delta.net.eu.org>
> One thing that strikes me as very weird though is the unique-id,
> not only it's way too short 2^12 = 4096, I think it's reasonable
> to expect more emails than this.

The hash is 12 bytes, not bits. 2^(8*12)≈8e28 which should be big enough.
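
As a rough check on these sizes, the sketch below uses the standard
birthday-bound approximation to estimate the chance of at least one
collision among n uniformly random IDs. The function name and the figure
of one million messages are assumptions chosen for illustration only.

```python
import math


def collision_probability(n_messages, id_bits):
    """Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -n_messages * (n_messages - 1) / 2 ** (id_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Compare a 12-bit ID, a 64-bit hash and a 12-byte (96-bit) ID.
for bits in (12, 64, 96):
    p = collision_probability(1_000_000, bits)
    print(f"{bits:>2} bits, 1e6 messages: p = {p:.3g}")
```

Under this approximation a 12-bit ID is certain to collide at that
volume, while 64 or 96 bits leave the probability negligible for a
personal mailbox.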
Details
Message ID
<aeb6a8b7-f795-4e41-a9f6-422549b2ad0a@bitfehler.net>
In-Reply-To
<ZiEI3nZ0mioH4Mg3@cloudsdale.the-delta.net.eu.org>
Hey!

On 4/18/24 1:49 PM, Haelwenn (lanodan) Monnier wrote:
> One thing that strikes me as very weird though is the unique-id,
> not only it's way too short 2^12 = 4096, I think it's reasonable
> to expect more emails than this.

Reasonable indeed, but how did you arrive at that number? The unique ID 
is the output of fnv64a, so this should provide roughly 64 bits of 
entropy, and make collisions rather unlikely unless they are indeed 
copies of the same message?
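
For reference, a minimal sketch of 64-bit FNV-1a, using the published
offset basis and prime. What m2dir feeds into the hash and how it renders
the result are not stated in this thread, so the short_id helper and its
unpadded base64url rendering are assumptions made for illustration.

```python
import base64

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the state at 64 bits
    return h


def short_id(data: bytes) -> str:
    """Hypothetical rendering: unpadded base64url of the 8 hash bytes."""
    digest = fnv1a_64(data).to_bytes(8, "big")
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Identical input always yields the same ID, which is why exact copies of a
message collide by construction, while distinct messages collide only by
chance.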

> And the collision solving method makes it unsuitable to non-centralised
> usage, including folder manipulation done by the email client
> itself or moving emails around locally (as the unique-id mustn't
> conflict in .meta).

The collision resolution is indeed a very "last resort" sort of thing,
but I would expect it to a) be needed rather rarely and b) be needed
even more rarely under such constraints (see also c) in the last
paragraph).

I do understand that this does not meet requirements you'd want to 
satisfy in distributed systems, but I think the use case of "moving 
emails around locally" does not quite have that level of, I am lacking a 
word here, distributedness either?

> I think something like a UUID using time+random would make the most
> sense and be virtually collision-free, as demonstrated by their usage
> in distributed systems.

The main focus of the approach is to produce filenames that can be 
digested by a human being, so some trade-offs were made. One purpose of 
the collision resolution is that it is very easy to spot when you have 
multiple copies of the same message.

Even though it's (maybe?) sort of specified, I would not recommend using
m2dir as the storage implementation of a high-volume SMTP server (Maildir is
fine for that). It is, first and foremost, for individuals to store 
their emails. And, I don't know about you, but even as a 
not-email-hater, I certainly hope I will never have to deal with 
anywhere close to 2^64 emails ;)

Cheers,
Conrad
Details
Message ID
<ZiETaBby_8eWgJRr@cloudsdale.the-delta.net.eu.org>
In-Reply-To
<33592e9c-9717-44f1-b5bc-1a40fbbc4827@aasrud.com>
[2024-04-18 14:18:33+0200] Knut Magnus Aasrud:
>> One thing that strikes me as very weird though is the unique-id,
>> not only it's way too short 2^12 = 4096, I think it's reasonable
>> to expect more emails than this.
>
>The hash is 12 bytes, not bits. 2^(8*12)≈8e28 which should be big enough.

Erf, this reminds me why I tend to stick to octets.

And yeah 12 bytes should be enough.