~radicle-link/discuss

gitd ref-rewriting RFC v1 PROPOSED

Alex Good: 1
 gitd ref-rewriting RFC
Fintan Halpenny: 1
 gitd ref-rewriting RFC

 2 files changed, 237 insertions(+), 4 deletions(-)

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~radicle-link/discuss/patches/33325/mbox | git am -3

[PATCH] gitd ref-rewriting RFC

Signed-off-by: Alex Good <alex@memoryandthought.me>
---
 docs/rfc/0704-gitd-ref-rewriting.adoc | 233 ++++++++++++++++++++++++++
 1 file changed, 233 insertions(+)
 create mode 100644 docs/rfc/0704-gitd-ref-rewriting.adoc

diff --git a/docs/rfc/0704-gitd-ref-rewriting.adoc b/docs/rfc/0704-gitd-ref-rewriting.adoc
new file mode 100644
index 00000000..458f7e6c
--- /dev/null
+++ b/docs/rfc/0704-gitd-ref-rewriting.adoc
@@ -0,0 +1,233 @@
= RFC: gitd ref rewriting
Alex Good <alex@memoryandthought.me>;
+
:revdate: 2022-06-27
:revremark: draft
:toc: preamble
:stem:

* Author: {author_1}
* Date: {revdate}
* Amended: {ammend_1}
* Status: {revremark}

== Motivation

`lnk-gitd` provides a local git interface to the monorepo which allows clients
to interact with a particular namespace using vanilla git. This is intended to
allow seamless interaction with radicle remotes without having to learn new
tools. Due to the way that git handles remote refspecs, `gitd` needs to rewrite
refs to make this work.

== Terminology and Conventions

The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119>>
and <<RFC8174>> when, and only when, they appear in all capitals, as shown here.

== The Problem

Given a local `gitd` running at `127.0.0.1:9999`, the following URL will provide
all refs under a given namespace:

[source]
----
ssh://127.0.0.1:9999/rad:git:<encoded namespace>
----

We can then create remotes like this:

[source]
----
[remote "collaborator"]
	url = ssh://127.0.0.1:9999/rad:git:<urn>
	fetch = +refs/remotes/<peer id>/heads/*:refs/remotes/collaborator/*
----

`git fetch` will do the right thing here and fetch all the remote branches into
`refs/remotes/collaborator/*`. Unfortunately, commands which reference a
particular branch or tag will not do the right thing. For example, `git fetch
collaborator mybranch` will attempt to fetch `refs/heads/mybranch`, which
doesn't exist. This is due to the following passage from the git fetch docs
<<git-fetch-docs>>:

[quote]
When `git fetch` is run with explicit branches and/or tags to fetch on the
command line, e.g. `git fetch origin master`, the ``<refspec>``s given on the
command line determine what are to be fetched (e.g. `master` in the example,
which is a short-hand for `master:`, which in turn means "fetch the master
branch but I do not explicitly say what remote-tracking branch to update with it
from the command line"), and the example command will fetch only the master
branch. The `remote.<repository>.fetch` values determine which remote-tracking
branch, if any, is updated. When used in this way, the
`remote.<repository>.fetch` values do not have any effect in deciding what gets
fetched (i.e. the values are not used as refspecs when the command-line lists
refspecs); they are only used to decide where the refs that are fetched are
stored by acting as a mapping.

This behaviour doesn't appear to be configurable: there is no way to tell git
that `git fetch <remote> <branch>` should fetch `refs/remotes/<peer
id>/heads/<branch>`, and likewise no way to say that `git fetch <remote> tag
<tag>` should fetch `refs/remotes/<peer id>/tags/<tag>`. However, we do
control the `gitd` process, so we can make `gitd` rewrite refs to achieve the
same thing.

== Peer URLs and ref rewriting

The desired outcome is that `git fetch <remote> <branch>` and `git fetch
<remote> tag <tag>` fetch the correct refs from the monorepo. To achieve this we
define a new URL form, served by `gitd`, which is subject to ref rewriting for
fetch operations (`git-upload-pack`). Remotes can then point at these URLs to
fetch refs from particular peers.
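
As a sketch, assuming the peer URL form described in this section and the
default fetch refspec that `git remote add` writes, a remote for a single peer
might look like the following. The remote name `collaborator` and the address
`127.0.0.1:9999` are carried over from the earlier example; the exact refspec is
an assumption, not something this RFC mandates.

```
[remote "collaborator"]
	url = ssh://127.0.0.1:9999/rad:git:<encoded urn>/<peer id>.git
	fetch = +refs/heads/*:refs/remotes/collaborator/*
```

With such a remote, `git fetch collaborator mybranch` would ask for
`refs/heads/mybranch`, which the rewriting described here would map to
`refs/remotes/<peer id>/heads/mybranch` in the monorepo.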

When the `gitd` SSH server receives an exec request of the form

[source]
----
git-upload-pack rad:git:<encoded urn>/<peer id>.git <1> <2>
----
<1> encoded URN is the base32-z encoding of the URN
<2> peer id is the base32-z encoding of the peer ID

`gitd` MUST parse the refs of the incoming request and rewrite them. In the
abstract, the rewriting rules are:

* The incoming rule :: When `gitd` sends data to the `git` subprocess, if the
  incoming (_from_ the `git` client) ref matches `refs/<remainder>` it MUST be
  rewritten to `refs/remotes/<peer id>/<remainder>` before passing it to the
  `git` subprocess.
* The outgoing rule :: When receiving data from the `git` subprocess, if the
  outgoing (_to_ the `git` client) ref matches `refs/remotes/<peer
  id>/<remainder>` it MUST be rewritten to `refs/<remainder>`.

In all other cases messages MUST be left unchanged.
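
The two rules can be sketched as a pair of pure functions. This is a minimal
illustration in Python, not the `gitd` implementation; the function names and
the sample peer ID are hypothetical.

```python
def rewrite_incoming(ref: str, peer_id: str) -> str:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer id>/<remainder>."""
    prefix = "refs/"
    if ref.startswith(prefix):
        return f"refs/remotes/{peer_id}/{ref[len(prefix):]}"
    # Anything else (e.g. HEAD) is left unchanged.
    return ref


def rewrite_outgoing(ref: str, peer_id: str) -> str:
    """Outgoing rule: refs/remotes/<peer id>/<remainder> -> refs/<remainder>."""
    prefix = f"refs/remotes/{peer_id}/"
    if ref.startswith(prefix):
        return "refs/" + ref[len(prefix):]
    # Refs belonging to other peers, or outside refs/remotes, pass through.
    return ref
```

For any ref under `refs/`, applying the incoming rule and then the outgoing rule
for the same peer yields the original ref.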

In the following sections we specify exactly which parts of the git protocol
messages must be rewritten.

Note that these sections depend on the PKT-LINE format defined in the git
protocol documentation <<git-protocol-common>>.
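
Because every rewrite changes the length of a line, a rewriting proxy has to
decode and re-encode the PKT-LINE framing rather than patch bytes in place. The
following Python sketch shows one way to do that, based on the framing described
in <<git-protocol-common>> (a four hex digit length that includes the length
field itself, with `0000`, `0001` and `0002` reserved as special packets). The
function names are hypothetical.

```python
MAX_PKT_LEN = 65520  # maximum total pkt-line length, including the 4 length bytes

SPECIAL = {0: "flush", 1: "delim", 2: "response-end"}


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("pkt-line payload too long")
    return b"%04x" % length + payload


def decode_pkt_lines(data: bytes):
    """Yield (kind, payload) tuples for each packet in a complete buffer."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length in SPECIAL:
            # Special packets carry no payload and occupy only the length field.
            yield (SPECIAL[length], b"")
            pos += 4
        elif length < 4:
            raise ValueError("invalid pkt-line length")
        else:
            yield ("data", data[pos + 4:pos + length])
            pos += length
```

A rewriting step would decode each data packet, transform its payload, and
re-encode it so that the length prefix matches the new payload.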


=== Protocol v1

This section references grammars defined in <<git-protocol-v1>>. In protocol
v1 there are distinct phases of operation; the only phase which requires
rewriting is the "Reference Discovery" phase.

In this phase the server returns a list of references. These references appear
in the grammar under "Reference Discovery" in <<git-protocol-v1>> like so:

[source]
----
  advertised-refs  =  *1("version 1")
		      (no-refs / list-of-refs)
		      *shallow
		      flush-pkt

  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
		      NUL capability-list)

  list-of-refs     =  first-ref *other-ref
  first-ref        =  PKT-LINE(obj-id SP refname
		      NUL capability-list)

  other-ref        =  PKT-LINE(other-tip / other-peeled)
  other-tip        =  obj-id SP refname
  other-peeled     =  obj-id SP refname "^{}"

  shallow          =  PKT-LINE("shallow" SP obj-id)

  capability-list  =  capability *(SP capability)
  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
  LC_ALPHA         =  %x61-7A
----

In `gitd` this response is forwarded from a `git` subprocess to the SSH client.
`gitd` MUST transform all appearances of `refname` according to the outgoing
rewrite rule.
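
As a sketch of what this means for a single decoded packet payload, the Python
function below rewrites the `refname` of a `first-ref`, `other-tip` or
`other-peeled` line and leaves `shallow` and `version` lines alone. The function
names are hypothetical, and the handling of a trailing LF is an assumption about
what the `git` subprocess emits.

```python
def rewrite_outgoing(ref: bytes, peer_id: bytes) -> bytes:
    """Outgoing rule: refs/remotes/<peer id>/<remainder> -> refs/<remainder>."""
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_advertised_ref(payload: bytes, peer_id: bytes) -> bytes:
    # first-ref carries "NUL capability-list" after the refname; keep it as-is.
    head, nul, caps = payload.partition(b"\0")
    newline = b"\n" if head.endswith(b"\n") else b""
    head = head[:len(head) - len(newline)]
    obj_id, sp, refname = head.partition(b" ")
    if not sp or obj_id in (b"shallow", b"version"):
        return payload  # not a ref line
    # other-peeled lines end in "^{}", which is not part of the refname.
    peeled = b"^{}" if refname.endswith(b"^{}") else b""
    refname = refname[:len(refname) - len(peeled)]
    refname = rewrite_outgoing(refname, peer_id)
    return obj_id + b" " + refname + peeled + newline + nul + caps
```

The `no-refs` line (`capabilities^{}`) passes through unchanged because its
pseudo refname does not match the outgoing rule.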


=== Protocol v2

Protocol v2, defined in <<git-protocol-v2>>, is specified in terms of commands.
The two commands we are concerned with are `ls-refs` and `fetch`. Each command
request is formatted like so:

[source]
----
request = empty-request | command-request
empty-request = flush-pkt
command-request = command
    capability-list
    delim-pkt
    command-args
    flush-pkt
command = PKT-LINE("command=" key LF)
command-args = *command-specific-arg
----

==== `ls-refs`

===== Request

`ls-refs` includes zero or more `ref-prefix` arguments. Each argument is a
`PKT-LINE` framed message of the form `ref-prefix <prefix>`. When passing this
data through to the `git` subprocess, `gitd` MUST rewrite the prefix according
to the incoming rewrite rule.
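
A minimal Python sketch of this step, operating on one decoded argument payload,
might look as follows. The function names are hypothetical; arguments other than
`ref-prefix` are passed through untouched.

```python
def rewrite_incoming(ref: bytes, peer_id: bytes) -> bytes:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer id>/<remainder>."""
    if ref.startswith(b"refs/"):
        return b"refs/remotes/" + peer_id + b"/" + ref[len(b"refs/"):]
    return ref


def rewrite_ls_refs_arg(payload: bytes, peer_id: bytes) -> bytes:
    keyword = b"ref-prefix "
    if not payload.startswith(keyword):
        return payload  # e.g. "peel" or "symrefs"
    # Preserve an optional trailing LF on the argument line.
    newline = b"\n" if payload.endswith(b"\n") else b""
    prefix = payload[len(keyword):len(payload) - len(newline)]
    return keyword + rewrite_incoming(prefix, peer_id) + newline
```

Note that a prefix which does not start with `refs/` (for example `HEAD`) falls
under the "all other cases" clause and is left unchanged.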


===== Response

The response of `ls-refs` is as follows: 

[source]
----
output = *ref
  flush-pkt
obj-id-or-unborn = (obj-id | "unborn")
ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
ref-attribute = (symref | peeled)
symref = "symref-target:" symref-target
peeled = "peeled:" obj-id
----

`gitd` intercepts this output before sending it back to the SSH client and
transforms it as follows:

For each `ref` line:

* `refname` MUST be transformed according to the outgoing rewrite rule
* If the ref has a `ref-attribute` which is a `symref` then the `symref-target`
  MUST be transformed according to the outgoing rewrite rule
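
The transformation of a single `ref` line can be sketched in Python as below.
The function names are hypothetical, and the sketch assumes the payload has
already been extracted from its PKT-LINE frame.

```python
def rewrite_outgoing(ref: bytes, peer_id: bytes) -> bytes:
    """Outgoing rule: refs/remotes/<peer id>/<remainder> -> refs/<remainder>."""
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_ls_refs_line(payload: bytes, peer_id: bytes) -> bytes:
    newline = b"\n" if payload.endswith(b"\n") else b""
    fields = payload[:len(payload) - len(newline)].split(b" ")
    if len(fields) < 2:
        return payload  # not a ref line
    # fields[0] is obj-id or "unborn", fields[1] is the refname.
    fields[1] = rewrite_outgoing(fields[1], peer_id)
    marker = b"symref-target:"
    for i in range(2, len(fields)):
        # Only symref attributes name a ref; "peeled:" carries an object id.
        if fields[i].startswith(marker):
            target = fields[i][len(marker):]
            fields[i] = marker + rewrite_outgoing(target, peer_id)
    return b" ".join(fields) + newline
```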

==== `fetch`

`fetch` takes the following argument, which must be rewritten:

* `want-ref <ref>` :: Each `ref` MUST be rewritten according to the incoming
  rewrite rule

The fetch response has several sections; the only section we concern ourselves
with is the `wanted-refs` section, which has the form:

[source]
----
wanted-refs = PKT-LINE("wanted-refs" LF)
*PKT-LINE(wanted-ref LF)
wanted-ref = obj-id SP refname
----

Here we rewrite `refname` using the outgoing rewrite rule.
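
Both directions of the `fetch` command can be sketched in Python as follows.
The function names are hypothetical. The response function takes a list of
decoded payloads in which `None` stands in for a delim or flush packet, and it
assumes, following <<git-protocol-v2>>, that such a packet ends the
`wanted-refs` section.

```python
def rewrite_incoming(ref: bytes, peer_id: bytes) -> bytes:
    if ref.startswith(b"refs/"):
        return b"refs/remotes/" + peer_id + b"/" + ref[len(b"refs/"):]
    return ref


def rewrite_outgoing(ref: bytes, peer_id: bytes) -> bytes:
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_fetch_arg(payload: bytes, peer_id: bytes) -> bytes:
    """Request side: rewrite `want-ref <ref>` with the incoming rule."""
    keyword = b"want-ref "
    if not payload.startswith(keyword):
        return payload
    newline = b"\n" if payload.endswith(b"\n") else b""
    ref = payload[len(keyword):len(payload) - len(newline)]
    return keyword + rewrite_incoming(ref, peer_id) + newline


def rewrite_fetch_response(packets, peer_id: bytes):
    """Response side: rewrite refnames inside the wanted-refs section only."""
    out, in_wanted = [], False
    for payload in packets:
        if payload is None:
            in_wanted = False  # delim/flush closes the current section
        elif payload == b"wanted-refs\n":
            in_wanted = True
        elif in_wanted:
            newline = b"\n" if payload.endswith(b"\n") else b""
            obj_id, _, ref = payload[:len(payload) - len(newline)].partition(b" ")
            payload = obj_id + b" " + rewrite_outgoing(ref, peer_id) + newline
        out.append(payload)
    return out
```

Tracking the current section keeps the rewrite away from the other sections of
the response, such as the packfile data.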


[bibliography]
== References

* [[[git-fetch-docs]]] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches
* [[[RFC2119]]] https://www.rfc-editor.org/rfc/rfc2119
* [[[RFC8174]]] https://www.rfc-editor.org/rfc/rfc8174
* [[[git-protocol-v1]]] https://git-scm.com/docs/pack-protocol
* [[[git-protocol-common]]] https://git-scm.com/docs/protocol-common
* [[[git-protocol-v2]]] https://www.git-scm.com/docs/protocol-v2
-- 
2.36.1
Published-At: https://github.com/alexjg/radicle-link/tree/patches/rfc/gitd-ref-rewriting/v1
Published-At:
    URN: rad:git:hnrkxafojjsz4m55qxbwigh1z8sdt7mai81gy
    peer: hydjhd8q9nkoxzkpddhcuue9xzpfr4bn6d44fo1f4q1japwm4brhh6
    seed: seed.lnk.network:8799
    tag: patches/rfc/gitd-ref-rewriting/v1

Re: [PATCH] gitd ref-rewriting RFC

I might need to reread through it again, but at first glance it looks
good.

I'm wondering if it should be written with less of an assumption that
gitd already exists, and instead specify how someone writing a new
gitd server should handle ref passing for receive-pack and
upload-pack. Does that make sense?

Noticed some typos and white space:

---
diff --git a/docs/rfc/0704-gitd-ref-rewriting.adoc b/docs/rfc/0704-gitd-ref-rewriting.adoc
index 458f7e6c..253c10b6 100644
--- a/docs/rfc/0704-gitd-ref-rewriting.adoc
+++ b/docs/rfc/0704-gitd-ref-rewriting.adoc
@@ -112,12 +112,12 @@ protocol documentation <<git-protocol-common>>.

=== Protocol v1

-This section references grammers defined in <<<git-protocol-v1>>>. In protocol
+This section references grammars defined in <<<git-protocol-v1>>>. In protocol
v1 there are distinct phases of operation, the only phase which requires
rewriting is the "Reference Discovery" phase.

In this phase the server returns a list of references. These references appear
-in the grammer under "Reference Discovery" in <<<git-protocol-v1>>> like so:
+in the grammar under "Reference Discovery" in <<<git-protocol-v1>>> like so:

[source]
----
@@ -180,7 +180,7 @@ to the incoming rewrite rule.

===== Response

-The response of `ls-refs` is as follows: 
+The response of `ls-refs` is as follows:

[source]
----
@@ -196,7 +196,7 @@ peeled = "peeled:" obj-id
`gitd` intercepts this output before sending it back to the SSH client and
transforms it as follows:

-For each `ref` line 
+For each `ref` line

* `refname` MUST be transformed according to the outgoing rewritine rules
* If the ref has a `ref-attribute` which is a `symref` then the `symref-target`
---