~pukkamustard/eris

Proposal to change key generation of blocks

Message ID: <1643726620.9gqm4oi05w.astroid@fuji.none>

Hello list

I have a proposal to consider for the erisx3 encoding standard. Before
noticing the problem with decoding trees at the incorrect depth, our reviewer
raised the issue that URNs are unverifiable. Corruption of a URN during
copy-pasting, or by an attacker, cannot be detected except by concluding that
a block reference cannot be retrieved. I believe there is a fix that would
solve this problem.

In the current standard there are two essentially different hash functions:
- BLAKE2b with a 128-bit output size and a 128-bit salt. The output of this
  function is a block *key*. Anyone with the key can encrypt or decrypt the
  block, but the key can only be generated by having both the block content
  and the salt. Keys cannot reliably verify a block, because there is
  nothing to indicate whether a key was generated with a zero salt or a
  random salt (or whether a key is a random value rather than a hash digest).
 
- BLAKE2b with a 128-bit output size and no salt. This is used to generate
  a *reference* for each block. This reference is used to retrieve and
  verify an encrypted block.

The current proposed breaking change for erisx3 is to use the height of a
block as the nonce for the ChaCha20 keystream. This means that tree leaves,
the content blocks at height 0, have the same key and reference as in erisx2.
The intermediate or branch blocks within the tree are now encoded
differently, and decrypting a block at an incorrect level does not reveal the
original plaintext block. This prevents a leaf block from being misused as a
branch block and the leaf plaintext from being used as key references, but
such failures are still undetectable.

The proposal is to use an unsalted hash function (standard BLAKE2b-128) to
generate both the keys and references of branch blocks, and to continue to
use a salted function for leaf keys and an unsalted function for leaf
references. With this change an implementation may retrieve and decrypt the
root block of a URN, compare the key with the hash of the plaintext, and
infer from this that each bit of the URN is correct:
- If the URN reference is corrupted then the correct block cannot be
  retrieved.
- If the URN key is corrupted the block will not decrypt correctly and fail
  to verify against the key.
- If the URN tree-height byte is corrupted then again the block will not
  decrypt and key verification fails.
- If the block-size byte is corrupted then a block might still be retrieved,
  but it will not verify against the block reference for the incorrect block
  size.

At each tree level above the leaves, blocks could be verified against both
the key and the reference, but I would recommend skipping key verification
for anything below the root node.

This requires a small amount of additional code to implement, but is trivial
to add along with the tree-depth nonce changes.
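
As a rough illustration (not normative), the root check might look like this
in Python. It assumes 32-byte keys and references, takes the block lookup and
ChaCha20 primitive as caller-supplied functions, and the nonce layout shown
is only a placeholder for encoding the tree height:

import hashlib

# Sketch only: `fetch` and `chacha20` stand in for a block-store lookup
# and a ChaCha20 implementation; nothing here is normative.
def verify_root(reference, key, level, fetch, chacha20):
    encrypted = fetch(reference)
    # the stored block must hash to the reference it was fetched by
    if hashlib.blake2b(encrypted, digest_size=32).digest() != reference:
        raise ValueError("block does not match reference")
    nonce = bytes([level]) + bytes(11)  # tree height in the nonce (placeholder layout)
    plaintext = chacha20(encrypted, key, nonce)
    # with unsalted branch/root keys the key itself becomes verifiable
    if hashlib.blake2b(plaintext, digest_size=32).digest() != key:
        raise ValueError("URN key does not verify against the root block")
    return plaintext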

Cheers,
Emery

Message ID: <86o80plgh8.fsf@posteo.net>
In-Reply-To: <1643726620.9gqm4oi05w.astroid@fuji.none>

Hey Emery,

Sorry for the delayed response.

All in all, I think it is a reasonable proposal. I will first recap the
proposed change with context and then state some questions and thoughts I
have.

ERIS uses the hash of some content as the key to encrypt the
content. This is called convergent encryption. Convergent encryption is
deterministic and allows deterministic identifiers for content. However,
convergent encryption suffers from two known attacks [1]. To mitigate
these attacks, ERIS allows the use of convergence secrets.

The convergence secret can be used to add entropy to the encoding
process in a very controlled way. This is implemented by using the keyed
hashing functionality of Blake2b. I use the terminology "keyed" instead
of "salted", as the original Blake2 specification allows both as two
distinct things. The IETF Blake2 specification does not define the
"salted" variant, only the keyed one [2].

Instead of just using the hash of some content as the key to encrypt the
content (regular convergent encryption), we use the hash keyed with the
convergence secret as the key to encrypt the content:

ENCRYPTION_KEY := Blake2b(CONTENT, key = CONVERGENCE_SECRET)
ENCRYPTED := ChaCha20(CONTENT, ENCRYPTION_KEY, NONCE)

Currently this is done for all blocks (leaf content blocks and
intermediary node blocks) - the convergence secret is used at every
level of the tree.

What you propose is to only use the convergence secret at level 0 - at
leaf content blocks:

ENCRYPTION_KEY := Blake2b(CONTENT, key = CONVERGENCE_SECRET)
ENCRYPTED_CONTENT_BLOCK := ChaCha20(CONTENT, ENCRYPTION_KEY, NONCE)

And for intermediary node blocks the unkeyed Blake2b function is used
(convergence secret is not used):

ENCRYPTION_KEY := Blake2b(CONTENT)
ENCRYPTED_NODE_BLOCK := ChaCha20(CONTENT, ENCRYPTION_KEY, NONCE)

Did I get it right?
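
For concreteness, a minimal sketch of the two derivations using Python's
hashlib (32-byte digests assumed; the encryption step is omitted):

import hashlib

def leaf_key(content, convergence_secret):
    # level 0: keyed Blake2b, the convergence secret is the hash key
    return hashlib.blake2b(content, digest_size=32, key=convergence_secret).digest()

def node_key(content):
    # level > 0: unkeyed Blake2b, so the key can later be recomputed from
    # the decrypted node and checked against the read capability
    return hashlib.blake2b(content, digest_size=32).digest()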

Emery Hemingway <ehmry@posteo.net> writes:

[..]

> This requires a small amount of additional code to implement, but is trivial
> to add along with the tree-depth nonce changes.

Agree.

> - If the URN key is corrupted the block will not decrypt correctly and fail
>   to verify against the key.

Agree. And I believe this is the crux of the proposal. In pseudo-code:

ENCRYPTED-BLOCK := Block-Storage-Get(REFERENCE)
DECRYPTED := ChaCha20(ENCRYPTED-BLOCK, KEY, NONCE)

assert KEY = Blake2b(DECRYPTED)

> At each tree level above the leaf blocks could be verified for both the key
> and reference, but I would recommend skipping key verification for anything
> below the root node.

Agree. This should be seen as a mechanism for verifying the integrity of
the read-capability/URN. If that is ok, we can be sure that the tree is ok.

The only problem I see with the proposal is that it only works for
content that is encoded to more than a single level. If the root level
in the read-capability is 0 then the convergence secret is used to
compute the key of the single content block. During decoding the key can
not be verified as the convergence secret is not known.

I find this a bit unsatisfactory. The advantages offered by the proposal
only work in certain cases. But maybe that's ok?

> - BLAKE2b with an 128-bit output size and a 128-bit salt. The output of this

Nitpick: We use Blake2b with a 256-bit output size (32 bytes) and a 256-bit key.

All in all, I think the proposal is good and am in favor of adopting the
change. Maybe we can reflect if there is an alternative that would also
verify the key if content is encoded in a single block.

Would adding a CRC32 check value to the read-capability be an
alternative? Or using something like Base32Check? Both seem a lot more
involved than what you propose...

-pukkamustard


[1] http://purl.org/eris#name-convergence-secret
[2] https://datatracker.ietf.org/doc/html/rfc7693#section-2.5
[3] https://lists.sr.ht/~pukkamustard/eris/%3C20220125202715.82bd135fc8bdbec4b2703a39%40posteo.net%3E
[4] https://base32check.org/

Message ID: <1651099118.qdhxufi50i.astroid@zuni.none>
In-Reply-To: <86o80plgh8.fsf@posteo.net>

Excerpts from pukkamustard's message of April 25, 2022 2:14 am:

> What you propose is to only use the convergence secret at level 0 - at
> leaf content blocks:
> 
> ENCRYPTION_KEY := Blake2b(CONTENT, key = CONVERGENCE_SECRET)
> ENCRYPTED_CONTENT_BLOCK := ChaCha20(CONTENT, ENCRYPTION_KEY, NONCE)
> 
> And for intermediary node blocks the unkeyed Blake2b function is used
> (convergence secret is not used):
> 
> ENCRYPTION_KEY := Blake2b(CONTENT)
> ENCRYPTED_NODE_BLOCK := ChaCha20(CONTENT, ENCRYPTION_KEY, NONCE)
> 
> Did I get it right?

That is correct.


> ... And I believe this is the crux of the proposal. In pseudo-code:
> 
> ENCRYPTED-BLOCK := Block-Storage-Get(REFERENCE)
> DECRYPTED := ChaCha20(ENCRYPTED-BLOCK, KEY, NONCE)
> 
> assert KEY = Blake2b(DECRYPTED)

Also correct (for "node" blocks).

I recently looked at the internals of BLAKE2b, and keying the hash adds
an additional compression round: for 1KiB there are 8 rounds, 256 for
32KiB. It's minor, but any optimization in the spec is magnified in
practice.
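
As a back-of-the-envelope check of those numbers (assuming BLAKE2b's
128-byte input block, with the padded key costing one extra compression):

BLAKE2B_BLOCK = 128  # bytes of input consumed per compression

for size in (1024, 32 * 1024):
    unkeyed = size // BLAKE2B_BLOCK
    keyed = unkeyed + 1  # the padded key occupies one extra leading block
    print(size, "bytes:", unkeyed, "compressions unkeyed,", keyed, "keyed")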

> The only problem I see with the proposal is that it only works for
> content that is encoded to more than a single level. If the root level
> in the read-capability is 0 then the convergence secret is used to
> compute the key of the single content block. During decoding the key can
> not be verified as the convergence secret is not known.
> 
> I find this a bit unsatisfactory. The advantages offered by the proposal
> only work in certain cases. But maybe that's ok?
> 
> ...
> 
> Would adding a CRC32 check value to the read-capability be an
> alternative? Or using something like Base32Check? Both seem a lot more
> involved than what you propose...


I think a checksum is worthwhile; this was raised in the security audit
but we haven't addressed it. Corrupting a URN would be a denial-of-service
attack, because it could result in a search for content that will never
exist.

I looked into Base32Check and I think the overhead is acceptable. It would
only be present when capabilities are rendered to URNs and increases the
size of the URN from 117 characters to 118. I've tried the checksum with
the Nim implementation[1] and the computation cost looks negligible.

The R in ERIS stands for robust, so I am in favor of adding a Base32check
checksum. Thanks for finding that algorithm.

- Emery

[1] https://base32check.org/
[2] https://codeberg.org/eris/nim-eris/commit/1a51f591244

Message ID: <86r15hn26k.fsf@posteo.net>
In-Reply-To: <1651099118.qdhxufi50i.astroid@zuni.none>

Emery Hemingway <ehmry@posteo.net> writes:

[..]

> Also correct (for "node" blocks).
>
> I recently looked at the internals of BLAKE2b, and keying the hash adds
> an additional compression round: for 1KiB there are 8 rounds, 256 for
> 32KiB. It's minor, but any optimization in the spec is magnified in
> practice.

Another reason to adopt the change.

If there are no objections by anybody else, I will make the changes to
the spec as proposed by Emery.

-pukkamustard

Base32Check1 (was Re: Proposal to change key generation of blocks)

Message ID: <86mtg5n1cv.fsf@posteo.net>
In-Reply-To: <1651099118.qdhxufi50i.astroid@zuni.none>

Emery Hemingway <ehmry@posteo.net> writes:

[..]

>> Would adding a CRC32 check value to the read-capability be an
>> alternative? Or using something like Base32Check? Both seem a lot more
>> involved than what you propose...
>
>
> I think a checksum is worthwhile, this was raised in the security audit
> but we haven't addressed it. Corrupting a URN would be a denial-of-service
> attack because it could result in a search for content that will never
> exist.
>
> I looked into Base32Check and I think the overhead is acceptable. It would
> only be present when capabilities are rendered to URNs and increases the
> size of the URN from 117 characters to 118. I've tried the checksum with
> the Nim implementation[1] and the computation cost looks negligible.

Nice. I also think it is a good idea (and explicitly mentioned in the
security audit).

I have one objection to using Base32Check1: there is no formal and
concise specification (i.e. no IETF RFC), and there are not so many
implementations (compared to plain Base32). This forces implementors of
ERIS to also implement Base32Check1.

From looking at the implementations it seems quite doable. But I think
we should at least provide a pseudocode implementation so that ERIS
implementors don't need to go and read the paper [1], blog post [2] and
implementation optimizations [3], but can just re-implement the
pseudocode. We would reference the papers/posts as informative
references. Maybe we should also add the test cases for Base32Check1 in
the appendix? Wdyt?

[1] https://www.uni-due.de/imperia/md/content/dc/yanling_2015_check_digit.pdf
[2] https://espadrine.github.io/blog/posts/a-base32-checksum.html
[3] https://johnkerl.org/doc/ffcomp.pdf

> The R in ERIS stands for robust, so I am in favor of adding a Base32check
> checksum. Thanks for finding that algorithm.

Thanks goes to Tony (in CC) for pointing me to the algorithm.

-pukkamustard

Message ID: <86h76487g8.fsf@posteo.net>
In-Reply-To: <86r15hn26k.fsf@posteo.net>

pukkamustard <pukkamustard@posteo.net> writes:

> If there are no objections by anybody else, I will make the changes to
> the spec as proposed by Emery.

Pushed changes as bf894cf12d1e28a3c8ab6f76cc3e2e4ac8fae2f7.

Rendered document and updated test-vectors are available:
http://purl.org/eris

Happy hacking,
-pukkamustard

Message ID: <86zgju7s9q.fsf@posteo.net>
In-Reply-To: <1651099118.qdhxufi50i.astroid@zuni.none>

Hi,

I managed to implement Base32Check1 as well:

https://codeberg.org/eris/guile-eris/commit/5da08fe09c89b98a0f2486957a9757c0789013b5

I'm not sure I'm doing it right and don't quite grok the algebra.

Some thoughts:

- Implementation effort is manageable and I think we could provide
  pseudocode that would make it easy to implement (e.g. by providing
  precomputed primitive powers).

- Implementation requires close interaction with the Base32 encoding
  itself. If you're using an existing Base32 library you probably will
  have to reimplement it to be able to implement Base32Check1.

- Adding this does not add anything in terms of security. A malicious
  actor can manipulate a URN so that the checksum still
  works. Base32Check1 only protects against typos and human
  transmission/transcription errors.

- It's really underspecified. I ended up following your Nim
  implementation and the Python implementation [1] while trying to
  understand what's going on by reading up on computation in finite
  fields [2].

- I don't understand why the base32check.org people chose a different
  primitive polynomial [3] than this blog post [4]. Is there any
  advantage? If not, why break compatibility with the JavaScript
  implementation that comes along with the blog post? If this is ever
  more formally specified, would they use a different polynomial?

Base32Check1 definitely adds a bit of robustness, but imho at too high a
cost. The implementation effort is considerably higher and we rely on
something that is not formally specified. As there are no security
advantages, I feel that using Base32Check1 is not justified.

Thoughts?

-pukkamustard


[1] https://github.com/bitmarck-service/base32check-python/blob/master/base32check1.py
[2] https://johnkerl.org/doc/ffcomp.pdf
[3] https://base32check.org/introduction.html#base32check1
[4] https://espadrine.github.io/blog/posts/a-base32-checksum.html
[5] http://purl.org/eris#section-1.1-2.16


