~sircmpwn/sr.ht-discuss


Pruning unreachable commits from Git repositories

Zack Weinberg <zack@owlfolio.org>
Message ID: <8bd1f00b-9b83-4fa3-869c-78f9513ea186@app.fastmail.com>

Is there an existing mechanism, short of deleting and recreating the
entire repository, for removing all of the commits in a git.sr.ht
repository that are not reachable from any branch?  Equivalent to
running "git reflog expire --all --expire=all && git gc --prune=now"
for a local repository.

Thanks,
zw
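
For reference, a minimal sketch of what that command pair does in a local repository. The temporary directory, branch names, and demo identity are assumptions made up for illustration; only the two commands quoted in the message come from the thread.

```shell
#!/bin/sh
# Sketch: make a commit unreachable from every branch, then prune it at once.
# The identity and paths below are throwaway values for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git symbolic-ref HEAD refs/heads/main

echo one > file && git add file && git commit -q -m "first"

# Create a commit on a scratch branch, then delete that branch so that
# only the HEAD reflog still refers to the commit.
git checkout -q -b scratch
echo two > file && git commit -q -a -m "second"
orphan=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D scratch

# The commands from the message above.
git reflog expire --all --expire=all
git gc --quiet --prune=now

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$orphan" 2>/dev/null; then
    pruned=no
else
    pruned=yes
fi
echo "pruned: $pruned"
```

The reflog step matters in a working clone because reflog entries count as references; a bare server-side repository usually keeps no reflogs by default.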
Thorben Günther <thorben@xenrox.net>
Message ID: <D3W13KBPE3C0.3DRUDBY5CRZ6N@xenrox.net>
In-Reply-To: <8bd1f00b-9b83-4fa3-869c-78f9513ea186@app.fastmail.com>

Hello,

There is a periodically running script [1] that executes "git gc".

[1]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/item/gitsrht-periodic#L55
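
One detail worth keeping in mind: a plain "git gc" only prunes unreachable objects older than the gc.pruneExpire grace period, which is two weeks unless configured otherwise. The sketch below illustrates that default in a throwaway local repository. It makes no claim about which options the gitsrht-periodic script passes; the paths, branch names, and identity are assumptions for the demo.

```shell
#!/bin/sh
# Sketch: a plain "git gc" keeps a freshly orphaned commit.
# The identity and paths below are throwaway values for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git symbolic-ref HEAD refs/heads/main

echo one > file && git add file && git commit -q -m "first"
git checkout -q -b scratch
echo two > file && git commit -q -a -m "second"
orphan=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D scratch

# Drop the reflog entries so that only the grace period protects the commit.
git reflog expire --all --expire=all

# No --prune argument: the default grace period applies.
git gc --quiet

if git cat-file -e "$orphan" 2>/dev/null; then
    kept=yes
else
    kept=no
fi
echo "kept after plain gc: $kept"
```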
Zack Weinberg <zack@owlfolio.org>
Message ID: <258925ec-cbc1-412f-9f80-e7ecca1ac936@app.fastmail.com>
In-Reply-To: <D3W13KBPE3C0.3DRUDBY5CRZ6N@xenrox.net>

That's nice and all but suppose I need it to happen immediately and as aggressively as possible.

On Mon, Sep 2, 2024, at 3:00 PM, Thorben Günther wrote:
> Hello,
>
> there is a periodically running script [1], that executes "git gc".
>
> [1]: https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/item/gitsrht-periodic#L55
Thorben Günther <thorben@xenrox.net>
Message ID: <D3W1P8QCWPG3.1SFQ5D18KPJ58@xenrox.net>
In-Reply-To: <258925ec-cbc1-412f-9f80-e7ecca1ac936@app.fastmail.com>

On Mon Sep 2, 2024 at 9:26 PM CEST, Zack Weinberg wrote:
> That's nice and all but suppose I need it to happen immediately and as aggressively as possible.

Then you should probably delete the repo, fix it locally and re-create
it.
Zack Weinberg <zack@owlfolio.org>
Message ID: <b1e048eb-5257-4956-a201-da39bcab1bd8@app.fastmail.com>
In-Reply-To: <D3W1P8QCWPG3.1SFQ5D18KPJ58@xenrox.net>

On Mon, Sep 2, 2024, at 3:29 PM, Thorben Günther wrote:
> On Mon Sep 2, 2024 at 9:26 PM CEST, Zack Weinberg wrote:
>> That's nice and all but suppose I need it to happen immediately and as aggressively as possible.
>
> Then you should probably delete the repo, fix it locally and re-create
> it.

Do you know for certain that at present there is no other way to do this?

zw
Thorben Günther <thorben@xenrox.net>
Message ID: <D3W1WA6WB753.3ANSECI3Z261X@xenrox.net>
In-Reply-To: <b1e048eb-5257-4956-a201-da39bcab1bd8@app.fastmail.com>

On Mon Sep 2, 2024 at 9:35 PM CEST, Zack Weinberg wrote:
> Do you know for certain that at present there is no other way to do this?

Well, you can try contacting support and asking there:
~sircmpwn/sr.ht-support@lists.sr.ht
Benjamin Pollack
Message ID: <1eb674d0-a1be-4417-a3ae-d11af05f5404@app.fastmail.com>
In-Reply-To: <b1e048eb-5257-4956-a201-da39bcab1bd8@app.fastmail.com>

On Mon, Sep 2, 2024, at 15:35, Zack Weinberg wrote:
> Do you know for certain that at present there is no other way to do this?
>

Just in case: if this is because you accidentally pushed a secret, you should consider it already leaked. Even if the repo were fully deleted, you can't know whether it's in a Sourcehut backup or the like. I'm only mentioning this because, as a former source hosting admin, I know what usually prompts these requests.

If that's *not* why you're asking, I'm curious how the blob would leak at this point unless you already know the SHA. I don't *think* contemporary Git ships packfiles of more than you asked for, but I may be misremembering.
Zack Weinberg <zack@owlfolio.org>
Message ID: <7b0377b5-92c7-427a-a19c-f623dcff99e3@app.fastmail.com>
In-Reply-To: <1eb674d0-a1be-4417-a3ae-d11af05f5404@app.fastmail.com>

On Mon, Sep 2, 2024, at 4:24 PM, Benjamin Pollack wrote:
> If [a leaked secret is] *not* why you're asking, I'm curious how the blob would leak 
> at this point unless you already know the SHA.

I have just performed a filter-repo operation (not because of a leaked secret). I don't *think* anyone has cloned the repo besides me but I want to ensure that anyone who has cloned it gets an error the next time they pull, instead of silently getting put on a floating branch or some other such weirdness.

zw
Victor Goff
Message ID: <20240902204009.pxgtznn6m73fvdxj@HAL.starfruit-solutions.intranet>
In-Reply-To: <7b0377b5-92c7-427a-a19c-f623dcff99e3@app.fastmail.com>

On Mon, Sep 02, 2024 at 04:29:52PM -0400, Zack Weinberg wrote:
> I have just performed a filter-repo operation (not because of a leaked
> secret). I don't *think* anyone has cloned the repo besides me but I
> want to ensure that anyone who has cloned it gets an error the next
> time they pull, instead of silently getting put on a floating branch
> or some other such weirdness.

Even if others have cloned it, just let them know to fetch (not pull, as
that does a fetch and merge) the repository and reset it to your latest
commit.

That may be:

    git fetch --all # or not --all
    git reset --hard origin/main

No secrets, no worries.
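
The recovery steps above can be tried end to end with a small sketch. The repository names (server.git, publisher, reader) and the demo identity are assumptions made up for illustration; a local bare repository stands in for the hosted one.

```shell
#!/bin/sh
# Sketch: a reader's clone recovering after the publisher rewrites history.
# Repository names and the identity are made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work" || exit 1

# A bare repository stands in for the hosted one.
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main

# The publisher pushes an initial history.
git init -q publisher
git -C publisher symbolic-ref HEAD refs/heads/main
git -C publisher remote add origin "$work/server.git"
echo one > publisher/file
git -C publisher add file
git -C publisher commit -q -m "original"
git -C publisher push -q origin main

# Someone clones before the rewrite.
git clone -q server.git reader

# The publisher rewrites the commit and force-pushes.
git -C publisher commit -q --amend -m "rewritten"
git -C publisher push -q --force origin main

# The reader fetches and resets. Note that reset --hard discards local
# commits and uncommitted changes on the current branch.
git -C reader fetch -q origin
git -C reader reset -q --hard origin/main

reader_head=$(git -C reader rev-parse HEAD)
server_head=$(git -C server.git rev-parse refs/heads/main)
echo "reader: $reader_head"
echo "server: $server_head"
```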
Zack Weinberg <zack@owlfolio.org>
Message ID: <0bc078b8-1c8e-40eb-8ad8-c831798a5dfb@app.fastmail.com>
In-Reply-To: <20240902204009.pxgtznn6m73fvdxj@HAL.starfruit-solutions.intranet>

On Mon, Sep 2, 2024, at 4:40 PM, Victor Goff wrote:
> Even if others have cloned it, just let them know to fetch 

I have no way of knowing who may have cloned the repository.
Drew DeVault
Message ID: <D3WHR3M9LS36.YOPRBW7EACRW@cmpwn.com>
In-Reply-To: <0bc078b8-1c8e-40eb-8ad8-c831798a5dfb@app.fastmail.com>

We periodically run git gc. If you need to remove unreachable commits
immediately the only way is to delete and re-create the repository. Not
sure why this is so important to you, though.
Zack Weinberg <zack@owlfolio.org>
Message ID: <b9189aa4-ee26-4165-a5bc-91ca17072b1f@app.fastmail.com>
In-Reply-To: <D3WHR3M9LS36.YOPRBW7EACRW@cmpwn.com>

On Tue, Sep 3, 2024, at 4:03 AM, Drew DeVault wrote:
> We periodically run git gc. If you need to remove unreachable commits
> immediately the only way is to delete and re-create the repository.

Understood.

> Not sure why this is so important to you, though.

I explained my reasons already.  It is important to me that, if anyone
happens to have cloned this particular repo (I don't know that anyone
_has_, but it _is_ public) that they get an error message the next time
they try to pull.  The git manual gives me the impression that as long
as the old tip of 'main' still exists on the server, a pull will
silently succeed.  If this is incorrect, I'd like to know about it.

If there were other people with commit privileges, I would also be
looking for a way to ensure that nobody could _push_ something descended
from one of the pruned commits.  Since there aren't, that's not
necessary, but it would still be nice to have.

It's not my present circumstance, but if you would like a reason why
a self-service way to run an immediate "git gc --prune=now" would
be useful _to you as a hosting service_, consider the situation
where someone accidentally pushes something very large to the
server.  Maybe they notice almost immediately and force-push
a new commit without the blob, whatever it is, but it's still
hogging disk space and possibly slowing down all operations on
that repo.

zw
Drew DeVault
Message ID: <D3WW4JHROTB0.3CER9TVAK057E@cmpwn.com>
In-Reply-To: <b9189aa4-ee26-4165-a5bc-91ca17072b1f@app.fastmail.com>

On Tue Sep 3, 2024 at 9:17 PM CEST, Zack Weinberg wrote:
> It's not my present circumstance, but if you would like a reason why
> a self-service way to run an immediate "git gc --prune=now" would
> be useful _to you as a hosting service_, consider the situation
> where someone accidentally pushes something very large to the
> server.  Maybe they notice almost immediately and force-push
> a new commit without the blob, whatever it is, but it's still
> hogging disk space and possibly slowing down all operations on
> that repo.

All the performance cost is on our end, therefore our problem to deal
with; end-users should not really be affected. You can't fetch
unreachable commits (at least not by default, and we don't change that
default).
Zack Weinberg <zack@owlfolio.org>
Message ID: <9312987a-d95d-4a21-938d-7967ebcdce79@app.fastmail.com>
In-Reply-To: <D3WW4JHROTB0.3CER9TVAK057E@cmpwn.com>

On Tue, Sep 3, 2024, at 3:19 PM, Drew DeVault wrote:
> You can't fetch unreachable commits (at least not by default, and we
> don't change that default).

Do you happen to know where this is documented?  Because the "git pull"
manpage really makes it sound like you can and you _will_, silently, if
your local remote-tracking branch points at a commit that is unreachable
from the remote's refs.

zw
Drew DeVault
Message ID: <D3WW8WVH5GNE.AL4SXR3PHGA1@cmpwn.com>
In-Reply-To: <9312987a-d95d-4a21-938d-7967ebcdce79@app.fastmail.com>

https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadarchiveallowUnreachable

It's not super well documented.
Zack Weinberg <zack@owlfolio.org>
Message ID: <33958956-f6d8-4441-936f-0d79a1dc508a@app.fastmail.com>
In-Reply-To: <9312987a-d95d-4a21-938d-7967ebcdce79@app.fastmail.com>

On Tue, Sep 3, 2024, at 3:22 PM, Zack Weinberg wrote:
> On Tue, Sep 3, 2024, at 3:19 PM, Drew DeVault wrote:
>> You can't fetch unreachable commits (at least not by default, and we
>> don't change that default).
>
> Do you happen to know where this is documented?  Because the "git
> pull" manpage really makes it sound like you can and you _will_,
> silently, if your local remote-tracking branch points at a commit that
> is unreachable from the remote's refs.

clarification: if your local remote-tracking branch points at a commit
that the server does have but is (no longer) reachable from the server's
refs, and you didn't use --prune, my reading of the manpage is that git
will tell you you're up to date.

zw
Zack Weinberg <zack@owlfolio.org>
Message ID: <2897cc2b-eca1-45bd-9870-7c779bcc9d83@app.fastmail.com>
In-Reply-To: <D3WW8WVH5GNE.AL4SXR3PHGA1@cmpwn.com>

On Tue, Sep 3, 2024, at 3:25 PM, Drew DeVault wrote:
> https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadarchiveallowUnreachable
>
> It's not super well documented.

Yeah, I would _never_ have found that.  Thanks.

(Re what I said about "git pull", I have now tried it and I was wrong.
It does in fact update your tracking branch by name and then tell you
you've diverged.  I think we can leave things there.)

zw
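
The experiment described above might look roughly like the following sketch. The repository names and demo identity are assumptions; the script stops after the fetch and counts how far the reader's branch and its remote-tracking branch have diverged, since what "git pull" does next depends on the Git version and the pull configuration.

```shell
#!/bin/sh
# Sketch: after a forced update, fetch moves origin/main and the local
# branch shows up as diverged rather than "up to date".
# Repository names and the identity are made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work" || exit 1

git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main

git init -q publisher
git -C publisher symbolic-ref HEAD refs/heads/main
git -C publisher remote add origin "$work/server.git"
echo one > publisher/file
git -C publisher add file
git -C publisher commit -q -m "base"
echo two > publisher/file
git -C publisher commit -q -a -m "second"
git -C publisher push -q origin main

# The reader clones the original history.
git clone -q server.git reader

# The publisher rewrites the latest commit and force-pushes.
git -C publisher commit -q --amend -m "second, rewritten"
git -C publisher push -q --force origin main

# The fetch half of a pull: the remote-tracking branch is updated by name.
git -C reader fetch -q origin

# Count commits unique to each side of main...origin/main.
counts=$(git -C reader rev-list --left-right --count main...origin/main)
set -- $counts
ahead=$1
behind=$2
echo "local main is ahead by $ahead and behind by $behind"
```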
Stefan Monnier
Message ID: <jwvwmjsz7uj.fsf-monnier+INBOX@gnu.org>
In-Reply-To: <b9189aa4-ee26-4165-a5bc-91ca17072b1f@app.fastmail.com>

> they try to pull.  The git manual gives me the impression that as long
> as the old tip of 'main' still exists on the server, a pull will
> silently succeed.  If this is incorrect, I'd like to know about it.

I do not believe this is true.

The `fetch` part of a pull will get the new revision of `main`, and the
subsequent `merge` will then try to merge the old local revision with
this new `main`.
This is the case regardless of whether the old revision is still on the
remote server.

The only way I can see the above statement being true is if you just
remove the `main` branch (without replacing it with a new history),  in
which case the `fetch` part will leave the local copy of `main` alone
(unless the user explicitly used `--prune`) and then silently succeed.


        Stefan
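
The fetch behaviour for a removed branch can be checked with a sketch like this one. It covers only the fetch step, not the merge half of a pull. The repository names, the `master` to `main` rename, and the demo identity are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: a branch deleted on the server stays behind as a stale
# remote-tracking ref until the reader fetches with --prune.
# Repository names and the identity are made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work" || exit 1

git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master

git init -q publisher
git -C publisher symbolic-ref HEAD refs/heads/master
git -C publisher remote add origin "$work/server.git"
echo one > publisher/file
git -C publisher add file
git -C publisher commit -q -m "history"
git -C publisher push -q origin master

# The reader clones while only master exists.
git clone -q server.git reader

# The project renames master to main and removes the old name.
git -C publisher branch -m master main
git -C publisher push -q origin main
# Point the server's HEAD at main first; deleting the branch that HEAD
# refers to is refused by default.
git -C server.git symbolic-ref HEAD refs/heads/main
git -C publisher push -q origin --delete master

ref=refs/remotes/origin/master

# A plain fetch leaves the stale remote-tracking ref in place.
git -C reader fetch -q origin 2>/dev/null
if git -C reader rev-parse -q --verify "$ref" >/dev/null; then
    after_fetch=present
else
    after_fetch=absent
fi

# Fetching with --prune removes it.
git -C reader fetch -q --prune origin 2>/dev/null
if git -C reader rev-parse -q --verify "$ref" >/dev/null; then
    after_prune=present
else
    after_prune=absent
fi
echo "origin/master after fetch: $after_fetch, after fetch --prune: $after_prune"
```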
Zack Weinberg <zack@owlfolio.org>
Message ID: <3859dcb1-5e84-4e87-a80b-e3fe2dca9551@app.fastmail.com>
In-Reply-To: <jwvwmjsz7uj.fsf-monnier+INBOX@gnu.org>

On Tue, Sep 3, 2024, at 6:37 PM, Stefan Monnier wrote:
> The only way I can see the above statement being true is if you just
> remove the `main` branch (without replacing it with a new history),  in
> which case the `fetch` part will leave the local copy of `main` alone
> (unless the user explicitly used `--prune`) and then silently succeed.

Oh, this explains it.  I had this experience in the past ... when a project
renamed its "master" branch to "main" and didn't communicate clearly enough
what everyone with an old checkout needed to do about it.  I must have
over-generalized.  Thanks.

zw
Stefan Monnier
Message ID: <jwvo753wfep.fsf-monnier+INBOX@gnu.org>
In-Reply-To: <3859dcb1-5e84-4e87-a80b-e3fe2dca9551@app.fastmail.com>

>> The only way I can see the above statement being true is if you just
>> remove the `main` branch (without replacing it with a new history),  in
>> which case the `fetch` part will leave the local copy of `main` alone
>> (unless the user explicitly used `--prune`) and then silently succeed.
> Oh, this explains it.  I had this experience in the past ... when a project
> renamed its "master" branch to "main" and didn't communicate clearly enough
> what everyone with an old checkout needed to do about it.  I must have
> over-generalized.  Thanks.

Yeah, that's the common case.  The usual "solution" to that is to
replace the old `master` by something else (e.g. I used a dummy branch
with a single commit with a single README file saying this branch has
moved to `main`), instead of merely removing it.


        Stefan
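
One way such a placeholder branch could be set up is sketched below, under the assumption that the history has already been published as `main`. The repository names, README wording, and demo identity are made up for illustration and are not taken from any particular project.

```shell
#!/bin/sh
# Sketch: replace the old master with a single-commit placeholder branch
# whose only file says where the history went.
# Repository names, README text and the identity are made up for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work" || exit 1

git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master

git init -q publisher
cd publisher || exit 1
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/server.git"
echo code > file && git add file && git commit -q -m "history"
git push -q origin master

# Publish the real history under the new name.
git branch -m master main
git push -q origin main

# Build the placeholder: an orphan branch has no parent commits.
git checkout -q --orphan master
git rm -q -rf .
echo "This branch has moved to main." > README
git add README
git commit -q -m "master has moved to main"

# Overwrite the old master on the server with the placeholder.
git push -q --force origin master

files=$(git -C "$work/server.git" ls-tree --name-only master)
count=$(git -C "$work/server.git" rev-list --count master)
echo "master now holds $count commit(s) containing: $files"
```

Anyone who fetches the old branch name then receives a forced update pointing at the README rather than a silently stale branch.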