~rjarry/dlrepo

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
4 3

[PATCH dlrepo 0/2] fix orphaned hardlinks

Details
Message ID
<20231219140703.3140968-1-julien.floret@6wind.com>
DKIM signature
missing
Download raw message
This small series is for fixing potential orphan hardlinks when the
same file is uploaded several times into different places.
Patch 1 is a preliminary code cleanup without functional changes.
Patch 2 actually fixes the bug by adding a check before moving the
uploaded file into the .blobs folder.

Julien Floret (2):
  fs: rm return value for finalize_upload
  fs: ensure upload never replaces an existing blob

 dlrepo/fs/__init__.py | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

-- 
2.39.2

[PATCH dlrepo 1/2] fs: rm return value for finalize_upload

Details
Message ID
<20231219140703.3140968-2-julien.floret@6wind.com>
In-Reply-To
<20231219140703.3140968-1-julien.floret@6wind.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +5 -8
It is not used. Thus the return value of the _check_and_move() can be
removed too.
No functional change.

Signed-off-by: Julien Floret <julien.floret@6wind.com>
Acked-by: Thomas Faivre <thomas.faivre@6wind.com>
---
 dlrepo/fs/__init__.py | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/dlrepo/fs/__init__.py b/dlrepo/fs/__init__.py
index 9d98b124349d..00c68f00f92f 100644
--- a/dlrepo/fs/__init__.py
+++ b/dlrepo/fs/__init__.py
@@ -73,7 +73,7 @@ class AbstractRepository:
    def cancel_upload(self, uuid: str):
        raise NotImplementedError()

    async def finalize_upload(self, uuid: str, digest: str) -> Path:
    async def finalize_upload(self, uuid: str, digest: str):
        raise NotImplementedError()

    def link_blob(self, digest: str, target: Path):
@@ -265,10 +265,8 @@ class ArtifactRepository(AbstractRepository):
    def cancel_upload(self, uuid: str):
        self.upload_path(uuid).unlink()

    async def finalize_upload(self, uuid: str, digest: str) -> Path:
        return await _check_and_move(
            self.upload_path(uuid), self.blob_path(digest), digest
        )
    async def finalize_upload(self, uuid: str, digest: str):
        await _check_and_move(self.upload_path(uuid), self.blob_path(digest), digest)

    def link_blob(self, digest: str, target: Path):
        _copy_or_link(self.blob_path(digest), target)
@@ -371,8 +369,8 @@ class UserRepository(AbstractRepository):
    ) -> int:
        return await self.base.update_upload(uuid, stream)

    async def finalize_upload(self, uuid: str, digest: str) -> Path:
        return await self.base.finalize_upload(uuid, digest)
    async def finalize_upload(self, uuid: str, digest: str):
        await self.base.finalize_upload(uuid, digest)

    def blob_path(self, digest: str, parent: Optional[Path] = None) -> Path:
        return self.base.blob_path(digest, parent=parent)
@@ -444,4 +442,3 @@ async def _check_and_move(src: Path, dst: Path, digest: str):
    dst.parent.mkdir(mode=0o755, parents=True, exist_ok=True)
    src.rename(dst)
    dst.chmod(0o644)
    return dst
-- 
2.39.2

[PATCH dlrepo 2/2] fs: ensure upload never replaces an existing blob

Details
Message ID
<20231219140703.3140968-3-julien.floret@6wind.com>
In-Reply-To
<20231219140703.3140968-1-julien.floret@6wind.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +7 -3
When a file is uploaded, it is first uploaded to .uploads/<uuid>, then
moved to .blobs/<digest>.
The .blobs/<digest> file is then hardlinked to the actual file path
into the job.
If however the same file is uploaded more than once, without checking
that its digest already exists on the server, then any existing
.blobs/<digest> file is replaced with the same newly uploaded
file. As a consequence, the existing hardlinks become orphaned.

What's more, if the most recent job is deleted afterwards, the
associated .blobs/<digest> file will have no other hardlink to it, so
it will be deleted at the next run of the cleanup task.
This is catastrophic for containers, because it breaks container
pull. Indeed "docker pull", for example, downloads its layers
from .blobs/<digest>. If the blob has been cleaned up, the pull
command will fail with the "unknown blob" error message, even if the
missing blob can be seen under the job's container format.

The problem can be easily reproduced by running multiple parallel
"dlrepo-cli upload" of the same big file into different jobs.

To avoid this, let's check if the blob already exists before moving
the uploaded file into it. If yes, just keep the existing blob and
remove the newly uploaded one.

Fixes: bd1c23893882 ("server: add filesystem api")
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Acked-by: Thomas Faivre <thomas.faivre@6wind.com>
---
 dlrepo/fs/__init__.py | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/dlrepo/fs/__init__.py b/dlrepo/fs/__init__.py
index 00c68f00f92f..789eff455055 100644
--- a/dlrepo/fs/__init__.py
+++ b/dlrepo/fs/__init__.py
@@ -439,6 +439,10 @@ async def _check_and_move(src: Path, dst: Path, digest: str):
        except OSError:
            pass
        raise ValueError(f"Received data does not match digest: {digest}")
    dst.parent.mkdir(mode=0o755, parents=True, exist_ok=True)
    src.rename(dst)
    dst.chmod(0o644)
    if dst.exists():
        # never replace an existing blob, we would lose any existing hard links
        src.unlink()
    else:
        dst.parent.mkdir(mode=0o755, parents=True, exist_ok=True)
        src.rename(dst)
        dst.chmod(0o644)
-- 
2.39.2

[dlrepo/patches/.build.yml] build success

builds.sr.ht <builds@sr.ht>
Details
Message ID
<CXSDBPLTASCY.2UXLJDKAE42FI@cirno2>
In-Reply-To
<20231219140703.3140968-3-julien.floret@6wind.com> (view parent)
DKIM signature
missing
Download raw message
dlrepo/patches/.build.yml: SUCCESS in 1m42s

[fix orphaned hardlinks][0] from [Julien Floret][1]

[0]: https://lists.sr.ht/~rjarry/dlrepo/patches/47870
[1]: julien.floret@6wind.com

✓ #1117694 SUCCESS dlrepo/patches/.build.yml https://builds.sr.ht/~rjarry/job/1117694

Applied: [PATCH dlrepo 0/2] fix orphaned hardlinks

Details
Message ID
<170310284318.97062.13103749129017504618@ringo>
In-Reply-To
<20231219140703.3140968-1-julien.floret@6wind.com> (view parent)
DKIM signature
missing
Download raw message
Julien Floret <julien.floret@6wind.com> wrote:
> This small series is for fixing potential orphan hardlinks when the
> same file is uploaded several times into different places.
> Patch 1 is a preliminary code cleanup without functional changes.
> Patch 2 actually fixes the bug by adding a check before moving the
> uploaded file into the .blobs folder.
>
> Julien Floret (2):
>   fs: rm return value for finalize_upload
>   fs: ensure upload never replaces an existing blob

Acked-by: Robin Jarry <robin@jarry.cc>

Applied, thanks.

To git@git.sr.ht:~rjarry/dlrepo
   7d14913bec77..998ec62a1d22  main -> main
Reply to thread Export thread (mbox)