~pukkamustard/eris

Padding block aligned data

Message ID
<32632d93-d2b0-411f-9a4d-aeee4ef7e70b@posteo.net>
As a systems programmer I am a little bothered by the use of padding
blocks for data that is aligned to an ERIS block size. In the worst
case, a 32KiB blob would be encoded using 96KiB, one block of data,
one block of padding, and one block of Merkle tree.
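
The worst-case figure can be checked with a short calculation. This is a
rough sketch that assumes the usual padding (a 0x80 byte, then zeros up to
the next block boundary) and a 64-byte reference-key pair per child in the
Merkle tree; the function names are made up for illustration.

```python
REF_KEY_PAIR_SIZE = 64  # assumed: 32-byte reference + 32-byte key per child


def padded_length(content_length: int, block_size: int) -> int:
    """Length after appending 0x80 and zero-filling to a block boundary."""
    return (content_length // block_size + 1) * block_size


def encoded_block_count(content_length: int, block_size: int) -> int:
    """Count leaf blocks plus Merkle tree nodes for a single blob."""
    arity = block_size // REF_KEY_PAIR_SIZE
    level = padded_length(content_length, block_size) // block_size
    total = level
    while level > 1:
        level = -(-level // arity)  # ceiling division
        total += level
    return total


KIB = 1024
print(encoded_block_count(32 * KIB, 32 * KIB) * 32)  # KiB used by a 32KiB blob
print(encoded_block_count(16 * KIB, 32 * KIB) * 32)  # KiB used by a 16KiB blob
```

Under these assumptions the aligned 32KiB blob takes three blocks, while a
16KiB blob fits in one.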

I propose that a blob of data with a length aligned to its encoding block
size should omit padding if the last byte of the blob is not '0x80'. If
the last byte of the last block is '0x80', then this final byte is
treated as padding. If the final byte of the blob is '0x80', then a block
with byte '0x80' followed by zeros is appended as padding (as usual).
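
Read literally, the proposed rule could look like the sketch below. The
names pad and pad2 are hypothetical, and pad is assumed to be the usual
padding (a 0x80 byte followed by zeros up to the next block boundary).

```python
def pad(data: bytes, block_size: int) -> bytes:
    """Usual padding: 0x80, then zeros up to the next block boundary."""
    padded = data + b"\x80"
    return padded + b"\x00" * (-len(padded) % block_size)


def pad2(data: bytes, block_size: int) -> bytes:
    """Proposed variant: skip padding for aligned data not ending in 0x80."""
    aligned = len(data) > 0 and len(data) % block_size == 0
    if aligned and data[-1] != 0x80:
        return data  # aligned and unambiguous under the proposal: no padding
    return pad(data, block_size)  # everything else is padded as usual
```

A blob one byte short of the block size still ends up as a single block
whose last byte is 0x80, which is the case where the decoder treats the
final byte as padding.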

Would this be sufficiently unambiguous?
Emery
Message ID
<861r4sx0bi.fsf@posteo.net>
In-Reply-To
<32632d93-d2b0-411f-9a4d-aeee4ef7e70b@posteo.net>
Hi,

Emery Hemingway <ehmry@posteo.net> writes:

> As a systems programmer I am a little bothered by the use of padding
> blocks for data that is aligned to an ERIS block size. In the worst
> case, a 32KiB blob would be encoded using 96KiB, one block of data,
> one block of padding, and one block of Merkle tree.

Yup, this is not so good.

Two considerations:

1. For any ERIS encoded 32KiB blob the padding block will be the
   same. When you have many such blobs the padding block becomes
   negligible and they would each be encoded using 64KiB and not 96KiB.
2. This affects data structures that are exactly 32KiB large. Structures
   smaller (e.g. 16KiB) will be padded and fit in a single 32KiB
   block. Structures larger (e.g. 64KiB) will anyway be split up into
   multiple blocks and the additional overhead of the padding block is
   negligible.
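
The first consideration can be put into numbers. A small sketch, assuming
a store that keeps identical blocks only once and blobs whose content
differs from one another (the function name is hypothetical):

```python
def stored_kib_per_blob(blob_count: int) -> float:
    """Average storage per 32KiB blob when the padding block is shared."""
    block_kib = 32
    unique_blocks = 2 * blob_count  # one data block and one tree node per blob
    shared_padding_blocks = 1       # identical for every such blob
    return (unique_blocks + shared_padding_blocks) * block_kib / blob_count


for count in (1, 10, 1000):
    print(count, stored_kib_per_blob(count))
```

For a single blob this gives the 96KiB worst case; as the number of blobs
grows the average approaches 64KiB.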

> I propose that a blob of data with a length aligned to its encoding block
> size should omit padding if the last byte of the blob is not '0x80'. If
> the last byte of the last block is '0x80', then this final byte is
> treated as padding. If the final byte of the blob is '0x80', then a block
> with byte '0x80' followed by zeros is appended as padding (as usual).
>
> Would this be sufficiently unambiguous?

The encoding/padding process sounds unambiguous to me. It does add a bit
of complexity. One (maybe unexpected) way this adds complexity:

A naive decoding process that decodes the padding added as described
above might work as follows:

UNPAD2(INPUT, BLOCK_SIZE):
    IF Length(INPUT) == BLOCK_SIZE THEN
        IF INPUT[BLOCK_SIZE - 1] == 0x80 THEN
            RETURN UNPAD(INPUT, BLOCK_SIZE)
        ELSE
            RETURN INPUT
    ELSE
        RETURN UNPAD(INPUT, BLOCK_SIZE)

Where UNPAD is the unpadding function as currently specified in ERIS
v0.2.0.

This naive UNPAD2 would decode content of size exactly 32KiB to the same
content whether it was encoded with ERIS v0.2.0 padding or with the
proposed padding (PAD2), even though the two encodings have different
identifiers.

I.e. the block:

XXXXXXXXXXXXXXXXX

and the blocks:

XXXXXXXXXXXXXXXXX 0x80|00000000000000

would decode to the same content even though they have different read
capabilities. This might be a security issue. A correct function
implementing the unpad function would need to make more checks and
reject the second example as invalid encoding.
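
The collision can be reproduced with a small sketch. It uses a toy block
size of 8 bytes for readability; unpad is only an approximation of the
unpadding in the spec, unpad2_naive is a literal transcription of the
pseudocode above, and unpad2_checked shows one possible extra check, not a
complete decoder. All names are hypothetical.

```python
BLOCK_SIZE = 8  # toy block size, chosen only to keep the example readable


def pad(data: bytes, block_size: int) -> bytes:
    """Usual padding: 0x80, then zeros up to the next block boundary."""
    padded = data + b"\x80"
    return padded + b"\x00" * (-len(padded) % block_size)


def unpad(data: bytes, block_size: int) -> bytes:
    """Remove trailing zeros and the 0x80 marker from the last block."""
    stripped = data.rstrip(b"\x00")
    zeros = len(data) - len(stripped)
    if not stripped.endswith(b"\x80") or zeros >= block_size:
        raise ValueError("invalid padding")
    return stripped[:-1]


def unpad2_naive(data: bytes, block_size: int) -> bytes:
    """Literal transcription of the UNPAD2 pseudocode."""
    if len(data) == block_size and data[-1] != 0x80:
        return data
    return unpad(data, block_size)


def unpad2_checked(data: bytes, block_size: int) -> bytes:
    """Naive decoding plus a check that rejects a superfluous padding block."""
    content = unpad2_naive(data, block_size)
    was_padded = len(content) != len(data)
    aligned = len(content) > 0 and len(content) % block_size == 0
    if was_padded and aligned and content[-1] != 0x80:
        raise ValueError("padding present where the proposal would omit it")
    return content


content = b"XXXXXXXX"                  # exactly one block, not ending in 0x80
one_block = content                    # encoding under the proposed padding
two_blocks = pad(content, BLOCK_SIZE)  # encoding under the usual padding

# Both encodings decode to the same content with the naive function.
print(unpad2_naive(one_block, BLOCK_SIZE) == unpad2_naive(two_blocks, BLOCK_SIZE))
```

With the extra check, unpad2_checked(two_blocks, BLOCK_SIZE) raises
ValueError, while the one-block encoding still decodes. Only the aligned
case from the example is exercised here; unpad2_naive inherits the
behaviour of the pseudocode in every other case as well.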

Complexity is not always avoidable. For certain use-cases (e.g. small
pieces of data and meta-data) we have added a considerable amount of
complexity to ERIS. I think adding complexity such as the suggested
padding needs to be motivated with use-cases.

What was the use-case you had in mind? And could the problem you
describe be solved in other ways?

-pukkamustard
Message ID
<03c4af4f-0cab-4c57-9afa-4238ccf97797@posteo.net>
In-Reply-To
<861r4sx0bi.fsf@posteo.net>
On Sunday 10 October 2021 18:18:43 CEST, pukkamustard wrote:
> What was the use-case you had in mind? And could the problem you
> describe be solved in other ways?

My use-case would be some application or system service that produces or
consumes data optimized for ERIS, such as file-system archives or
memory snapshots.

I think you are right that what I've suggested is ambiguous and
unnecessarily complex. What I want is to be able to create and access
blocks without any padding, which would be possible if an application
has access to a raw block interface. I think this would be reasonable in
a few cases, but these "raw" blocks would not be representable with URNs.

E.