~radicle-link/dev


[DRAFT v1 0/1] rfc: gitd

Details
Message ID
<20220714210250.760555-1-alex@memoryandthought.me>
This still needs a lot of work but I'm putting it out here if anyone is
interested. I'm attempting to give a more comprehensive specification of
`gitd` including a more detailed and step by step description of ref
rewriting.

Alex Good (1):
  gitd RFC

 docs/rfc/0704-gitd.adoc | 444 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 444 insertions(+)
 create mode 100644 docs/rfc/0704-gitd.adoc

-- 
2.36.1

[PATCH v1 1/1] gitd RFC

Details
Message ID
<20220714210250.760555-2-alex@memoryandthought.me>
In-Reply-To
<20220714210250.760555-1-alex@memoryandthought.me> (view parent)
Patch: +444 -0
Signed-off-by: Alex Good <alex@memoryandthought.me>
---
 docs/rfc/0704-gitd.adoc | 444 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 444 insertions(+)
 create mode 100644 docs/rfc/0704-gitd.adoc

diff --git a/docs/rfc/0704-gitd.adoc b/docs/rfc/0704-gitd.adoc
new file mode 100644
index 00000000..29ce9821
--- /dev/null
+++ b/docs/rfc/0704-gitd.adoc
@@ -0,0 +1,444 @@
= RFC: Gitd
Alex Good <alex@memoryandthought.me>;
+
:revdate: 2022-06-27
:revremark: draft
:toc: preamble
:stem:

* Author: {author_1}
* Date: {revdate}
* Amended: {ammend_1}
* Status: {revremark}

== Motivation

Users are used to working with remote git repositories using the git CLI suite.
By implementing a git server which proxies the monorepo we can enable users to
interact with Link identities using standard git tooling.

== Overview

The local view of the network is available in the monorepo as specified in 
xref:./0001-identity_resolution.adoc#namespacing[Namespacing].

To achieve transparent interaction with radicle remotes we expose a network
endpoint which the git protocol understands and which performs two functions:

1. Updating the local peer's signed refs on push to a particular URN
2. Exposing remote peers' refs in a manner compatible with the ref layout git
   expects so that git commands such as `git fetch <remote> tag <tag>` work as
   expected.

We achieve this by implementing an SSH server which responds to git requests
such that remotes of the form

[source]
----
ssh://<host>/<urn>.git
----

Will work as expected. SSH URLs in git are fetched by connecting to the server
and making an `exec` request (<<ssh-protocol-exec-request>>) for either 

* `git-upload-pack <url>` in the case of fetching
* `git-receive-pack <url>` in the case of pushing

The `gitd` SSH server intercepts these and forwards them to a subprocess which
runs either `git-upload-pack` or `git-receive-pack` in the monorepo with an
additional `--namespace <urn>` so that only refs from the namespace in question
are exposed. `gitd` then intercepts protocol messages running over the proxied
standard input and output channels and rewrites refs so that the refs for each
individual peer under the URN are in the conventional layout git expects. See
<<appendix_bad_ref_layout>> for why the ref rewriting is necessary.

In order to update the signed refs on pushes, `gitd` obtains the signing key
from a running SSH agent.

== Terminology and Conventions

The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119>>
and <<RFC8174>> when, and only when, they appear in all capitals, as shown here.

== Gitd SSH Interface

The `gitd` server exposes an SSH server. Connections to the SSH server MUST be
authenticated using <<ssh-protocol-publickey>>. Once connected and authenticated,
the SSH server accepts `exec` requests (<<ssh-protocol-exec-request>>) with a
command of either:

* `git-upload-pack [<peer>@]rad:git:<urn>.git`
* `git-receive-pack rad:git:<urn>.git`

Where `<peer>` is the base32-z encoded bytes of a peer ID and `<urn>` is a
base32-z encoded link URN. If the command does not match either of these
patterns `gitd` MUST respond with an `SSH_MSG_CHANNEL_CLOSE` message. The
`gitd` server MAY first send a UTF-8 encoded string describing the error as an
extended data message with a `data_type_code` of `1`.
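
As a non-normative illustration, the command matching described above could be
sketched as follows. The names `GitCommand` and `parse_exec_command` are
hypothetical, the `[a-z0-9]+` pattern is only a loose stand-in for real base32-z
validation, and accepting a single-quoted argument and an optional `rad:git:`
prefix for `git-receive-pack` are assumptions rather than requirements of this
RFC.

```python
import re
from typing import NamedTuple, Optional


class GitCommand(NamedTuple):
    service: str          # "git-upload-pack" or "git-receive-pack"
    urn: str
    peer: Optional[str]   # only ever set for git-upload-pack


_ID = r"[a-z0-9]+"  # assumption: loose stand-in for base32-z validation
_UPLOAD = re.compile(rf"(?:(?P<peer>{_ID})@)?rad:git:(?P<urn>{_ID})\.git")
_RECEIVE = re.compile(rf"(?:rad:git:)?(?P<urn>{_ID})\.git")


def parse_exec_command(command: str) -> Optional[GitCommand]:
    """Return the parsed command, or None if the channel should be closed."""
    service, _, arg = command.partition(" ")
    arg = arg.strip("'")  # assumption: the client may single-quote the argument
    if service == "git-upload-pack":
        m = _UPLOAD.fullmatch(arg)
        return GitCommand(service, m["urn"], m["peer"]) if m else None
    if service == "git-receive-pack":
        m = _RECEIVE.fullmatch(arg)
        return GitCommand(service, m["urn"], None) if m else None
    return None
```

A `None` result corresponds to the case where `gitd` closes the channel; a
non-`None` `peer` is what later enables ref rewriting.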

The `gitd` server then invokes one of the following commands and proxies stdin,
stdout and stderr between the SSH channel and the subprocess. The proxied stdin
and stdout are subject to <<ref-rewriting>>.

=== `git-upload-pack [peer@]rad:git:<urn>.git`

The invoked command MUST be

[source,bash]
----
git upload-pack \
    --namespace <urn> \
    -c transfer.hiderefs=refs/remotes \
    -c transfer.hiderefs=refs/remotes/rad \
    -c transfer.hiderefs=refs/remotes/cobs \
    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
----
<1> This line is repeated for each visible ref in each remote in the namespace.
A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
everything except the `rad` and `cobs` category of refs under the remote.

=== `git-receive-pack rad:git:<urn>.git`

The invoked command MUST be

[source,bash]
----
git receive-pack \
    --namespace <urn> \
    -c transfer.hiderefs=refs/remotes \
    -c transfer.hiderefs=refs/remotes/rad \
    -c transfer.hiderefs=refs/remotes/cobs \
    -c receive.hiderefs=!^$UNHIDDEN <1>
----

When the server receives an `exec` request for `git-receive-pack` it
MUST ensure that the authenticated public key for the connection is the public key
corresponding to the signing key of the monorepo it proxies. If the public key
does not match, the server MUST respond with an `SSH_MSG_CHANNEL_CLOSE` and
MAY first send a UTF-8 encoded string describing the error as an extended data
message with a `data_type_code` of `1`.

Once the subprocess has completed `gitd` MUST attempt to update the signed refs
for the namespace in question. To do this `gitd` attempts to retrieve a key from
the SSH agent running at `$SSH_AUTH_SOCK`. If this is not possible then `gitd`
MUST report an error as an extended data message with a `data_type_code` of `1`.

=== Environment variables

If the client issues a channel request of type `"env"` before sending an `exec`
request then `gitd` MUST store the associated name and value and pass those
values into the environment of invoked subprocesses for that channel.


[#ref-rewriting]
== Peer URLs and ref rewriting

Once `gitd` has started a git subprocess and is proxying data between the SSH
client and the subprocess, its remaining responsibility is to
intercept the git protocol messages running over the proxied streams and rewrite
some refs. Concretely, if the URL that was passed to the `exec` command was of
the form `<peer>@rad:git:<urn>.git` (that is, it contains a peer ID) then `gitd`
MUST rewrite refs as follows; otherwise `gitd` MUST NOT rewrite refs.

=== Rewrite Rules

In the abstract, each rewrite `gitd` performs applies one of the following rules:

* The incoming rule :: When sending data to the `git` subprocess, if the incoming
  (_from_ the `SSH` client) ref matches `refs/<remainder>` it MUST be rewritten
  to `refs/remotes/<peer_id>/<remainder>` before passing to the `git` subprocess.
* The outgoing rule :: When receiving data from the `git` subprocess, if the
  outgoing (_to_ the `SSH` client) ref matches
  `refs/remotes/<peer_id>/<remainder>` it MUST be rewritten to `refs/<remainder>`.

The following sections specify exactly which parts of the git protocol
messages must be rewritten for each command.
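
As a non-normative sketch, the two rules can be written as a pair of pure
functions. The function names and the sample peer ID `hyabc` are illustrative
only and do not appear elsewhere in this RFC.

```python
def rewrite_incoming(ref: str, peer_id: str) -> str:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer_id>/<remainder>."""
    prefix = "refs/"
    if ref.startswith(prefix):
        return f"refs/remotes/{peer_id}/{ref[len(prefix):]}"
    return ref  # refs that do not match (e.g. HEAD) are left untouched


def rewrite_outgoing(ref: str, peer_id: str) -> str:
    """Outgoing rule: refs/remotes/<peer_id>/<remainder> -> refs/<remainder>."""
    prefix = f"refs/remotes/{peer_id}/"
    if ref.startswith(prefix):
        return "refs/" + ref[len(prefix):]
    return ref  # refs of other peers, or outside refs/remotes, pass through
```

For any ref under `refs/`, applying the incoming rule and then the outgoing rule
for the same peer returns the original ref.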

=== Upload pack

After starting the `git-upload-pack` subprocess `gitd` intercepts the first
PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
pass the line through verbatim to the `SSH` client and proceed according to
<<protocol-v2-rewriting>>.

If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
through verbatim to the `SSH` client and continue as per
<<v1-reference-discovery-rewriting>>.

If the first line is neither of the above then it is the first line of reference
discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.

Once the reference discovery step is complete all remaining input and output is
proxied without modification.
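
The first-line dispatch can be illustrated with a small pkt-line reader. This is
a sketch under assumptions: `pkt_line`, `read_pkt_line` and
`classify_first_line` are hypothetical helper names, and special packets other
than the flush packet are simply rejected because they are not expected as the
first line.

```python
import io
from typing import BinaryIO, Optional


def pkt_line(payload: bytes) -> bytes:
    """Encode a payload as a pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_line(stream: BinaryIO) -> Optional[bytes]:
    """Read one pkt-line and return its payload, or None for a flush packet."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream closed inside a pkt-line header")
    length = int(header, 16)
    if length == 0:
        return None  # "0000" is the flush packet
    if length < 4:
        raise ValueError("unexpected special packet")
    return stream.read(length - 4)


def classify_first_line(payload: bytes) -> str:
    """Decide which rewriting procedure applies to the subprocess output."""
    if payload == b"version 2\n":
        return "v2"            # pass through, then protocol v2 rewriting
    if payload == b"version 1\n":
        return "v1"            # pass through, then v1 reference discovery
    return "v1-first-ref"      # this line is already the first advertised ref
```

In the `v1-first-ref` case the line that was just read must itself be rewritten
before it is forwarded, since it already carries a ref.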

=== Receive Pack

After starting the `git-receive-pack` subprocess `gitd` intercepts the first
PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
pass the line through verbatim to the `SSH` client and proceed according to
<<protocol-v2-rewriting>>.

If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
through verbatim to the `SSH` client and continue as per
<<v1-reference-discovery-rewriting>>.

If the first line is neither of the above then it is the first line of reference
discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.

Once reference discovery is complete the `SSH` client process will send
reference update requests as per <<git-protocol-reference-update-request>>.
`gitd` MUST execute the following pseudocode:

[source]
----
loop 
    let next_line = read_pkt_line_from_client()
    if next_line is flush packet
        send_to_subprocess(flush_packet)
        break
    else
        if next_line is command <1>
            rewritten = <rewrite refname in command according to incoming rule>
        else
            rewritten = next_line
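        send_to_subprocess(rewritten)
----
<1> A command is a packet line which matches `<old-oid> SP <new-oid> SP <name>`.
    The first command may additionally carry a NUL-separated capability list,
    which is passed through unchanged.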

Once this loop is complete `gitd` MUST proxy all further input and output
without modification.
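
A non-normative sketch of the per-line rewrite inside this loop is given below.
It assumes object IDs are 40 or 64 hex digits and that anything after the ref
name (a NUL-separated capability list, a trailing LF) must be preserved
byte for byte; `rewrite_command` is a hypothetical name.

```python
import re

# <old-oid> SP <new-oid> SP <name>, followed by anything (capabilities, LF).
_COMMAND = re.compile(
    rb"^((?:[0-9a-f]{64}|[0-9a-f]{40})) ((?:[0-9a-f]{64}|[0-9a-f]{40})) "
    rb"([^\x00\n]+)(.*)$",
    re.DOTALL,
)


def rewrite_command(payload: bytes, peer_id: bytes) -> bytes:
    """Apply the incoming rule to the ref name of a reference update command."""
    m = _COMMAND.match(payload)
    if m is None:
        return payload  # not a command (e.g. a "shallow" line): pass through
    old, new, name, rest = m.groups()
    if name.startswith(b"refs/"):
        name = b"refs/remotes/" + peer_id + b"/" + name[len(b"refs/"):]
    return old + b" " + new + b" " + name + rest
```

Because the rewritten payload is longer than the original, the pkt-line length
prefix has to be recomputed before the line is sent to the subprocess.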

[#v1-reference-discovery-rewriting]
=== V1 Reference Discovery Rewriting

In both `git-upload-pack` and `git-receive-pack` the subprocess begins by
outputting all the references it knows about, as per the grammar under "Reference
Discovery" in <<git-protocol-v1>>, which is repeated verbatim here:

[source]
----
  advertised-refs  =  *1("version 1")
		      (no-refs / list-of-refs)
		      *shallow
		      flush-pkt

  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
		      NUL capability-list)

  list-of-refs     =  first-ref *other-ref
  first-ref        =  PKT-LINE(obj-id SP refname
		      NUL capability-list)

  other-ref        =  PKT-LINE(other-tip / other-peeled)
  other-tip        =  obj-id SP refname
  other-peeled     =  obj-id SP refname "^{}"

  shallow          =  PKT-LINE("shallow" SP obj-id)

  capability-list  =  capability *(SP capability)
  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
  LC_ALPHA         =  %x61-7A
----

`gitd` starts by parsing the first line. The ref in the first line MUST be
rewritten as per the outgoing rewrite rule. If there is a `symref` capability in
the `capabilities` list (<<git-protocol-symref-capability>>) then `gitd` MUST
rewrite the ref in the `symref` as per the outgoing rewrite rule. This rewritten
packet line must then be sent to the `SSH` client.

Once this first line has been sent `gitd` MUST execute the following algorithm:

[source]
----
loop 
    let next_line = read_pkt_line_from_subprocess()
    if next_line is flush packet
        send_to_ssh_client(flush_packet)
        break
    else
        if next_line is other-ref
            rewritten = <rewrite refname in next_line according to outgoing rule>
        else
            rewritten = next_line
        send_to_ssh_client(rewritten)
----

Once this loop terminates the reference discovery step is complete.
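
The handling of the first line and of the remaining lines might look like the
following non-normative sketch. The helper names are hypothetical, and rewriting
both sides of a `symref=<name>:<target>` capability is an assumption (a name
such as `HEAD` never matches the outgoing rule and is therefore left alone).

```python
def _outgoing(ref: bytes, peer_id: bytes) -> bytes:
    """Outgoing rule on raw bytes; any suffix such as "^{}" or LF is kept."""
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_first_ref(payload: bytes, peer_id: bytes) -> bytes:
    """Rewrite the ref name and any symref capability of the first-ref line."""
    line, nul, caps = payload.partition(b"\x00")
    oid, sep, refname = line.partition(b" ")
    trailer = b"\n" if caps.endswith(b"\n") else b""
    new_caps = []
    for cap in caps.rstrip(b"\n").split(b" "):
        if cap.startswith(b"symref="):
            name, _, target = cap[len(b"symref="):].partition(b":")
            cap = b"symref=" + _outgoing(name, peer_id) + b":" + _outgoing(target, peer_id)
        new_caps.append(cap)
    return oid + sep + _outgoing(refname, peer_id) + nul + b" ".join(new_caps) + trailer


def rewrite_other_ref(payload: bytes, peer_id: bytes) -> bytes:
    """Rewrite an other-ref line; "shallow" lines do not match and pass through."""
    oid, sep, refname = payload.partition(b" ")
    return oid + sep + _outgoing(refname, peer_id)
```

Both functions operate on pkt-line payloads, so the length prefix must be
recomputed when the rewritten line is re-encoded.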

[#protocol-v2-rewriting]
=== Protocol v2

Protocol v2 is defined in <<git-protocol-v2>>. It is specified in terms
of commands which are sent by the client (the `SSH` client here) to the server
(the subprocess). The grammar in <<git-protocol-v2>> is repeated verbatim here:

[source]
----
request = empty-request | command-request
empty-request = flush-pkt
command-request = command
    capability-list
    delim-pkt
    command-args
    flush-pkt
command = PKT-LINE("command=" key LF)
command-args = *command-specific-arg
----

While the client has an open connection to `gitd`, `gitd` MUST attempt to
read the next `command` `PKT-LINE` from the `SSH` client. For each command:

* If the `command` is `ls-refs` then proceed according to
  <<protocol-v2-ls-refs>>.
* If the `command` is `fetch` then proceed according to <<protocol-v2-fetch>>.
* Otherwise `gitd` MUST read the remainder of the command and pass the whole
  `command-request` through to the subprocess. `gitd` MUST then read from the
  subprocess until a flush packet is read, passing everything through to the
  `SSH` client.

[#protocol-v2-ls-refs]
==== `ls-refs`

`gitd` MUST parse the command arguments of the `ls-refs` command. For each
`ref-prefix` argument `gitd` MUST rewrite the ref according to the incoming
rewrite rule. Once this rewriting is complete the entire command MUST be passed
to the subprocess. 

The subprocess will now respond with the following:

[source]
----
output = *ref
  flush-pkt
obj-id-or-unborn = (obj-id | "unborn")
ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
ref-attribute = (symref | peeled)
symref = "symref-target:" symref-target
peeled = "peeled:" obj-id
----

`gitd` MUST read from the subprocess until a flush packet is received executing
the following pseudocode

[source]
----
loop
    let next_line = read_pkt_line_from_subprocess()
    if next_line is flush
        send_to_ssh_client(next_line)
        break
    if next_line is ref
        rewritten = PKT_LINE(obj-id-or-unborn SP rewrite(refname) SP rewrite(attributes) LF) <1> <2>
    else
        rewritten = next_line
    send_to_ssh_client(rewritten)
----
<1> `rewrite(refname)` means rewrite `refname` according to the outgoing rewrite
    rule
<2> `rewrite(attributes)` means for each attribute in the attributes, if the
    attribute is a `symref` then rewrite `symref-target` according to the outgoing
    rewrite rule
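
Both directions of the `ls-refs` exchange can be sketched as follows. This is
non-normative; the function names are hypothetical and the code works on
pkt-line payloads rather than on the encoded stream.

```python
def _incoming(ref: bytes, peer_id: bytes) -> bytes:
    if ref.startswith(b"refs/"):
        return b"refs/remotes/" + peer_id + b"/" + ref[len(b"refs/"):]
    return ref


def _outgoing(ref: bytes, peer_id: bytes) -> bytes:
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_ref_prefix_arg(payload: bytes, peer_id: bytes) -> bytes:
    """Request side: apply the incoming rule to a `ref-prefix` argument."""
    key = b"ref-prefix "
    if not payload.startswith(key):
        return payload
    return key + _incoming(payload[len(key):], peer_id)


def rewrite_ls_refs_line(payload: bytes, peer_id: bytes) -> bytes:
    """Response side: apply the outgoing rule to refname and symref-target."""
    body = payload.rstrip(b"\n")
    trailer = payload[len(body):]
    parts = body.split(b" ")
    if len(parts) < 2:
        return payload  # not a ref line
    out = [parts[0], _outgoing(parts[1], peer_id)]
    key = b"symref-target:"
    for attr in parts[2:]:
        if attr.startswith(key):
            attr = key + _outgoing(attr[len(key):], peer_id)
        out.append(attr)  # `peeled:` attributes carry no ref and are kept
    return b" ".join(out) + trailer
```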

[#protocol-v2-fetch]
==== `fetch`

`gitd` MUST parse the command arguments of the fetch command. For each argument,
if the argument name is `want-ref` then the argument value MUST be rewritten
according to the incoming rewrite rule, otherwise the argument must be left as
is. Once this rewriting is complete the command MUST be passed to the
subprocess.

Once the command has been sent to the subprocess `gitd` MUST execute the
following pseudocode to rewrite the `wanted-refs` section of the response:

[source]
----
loop
    let next_line = read_pkt_line_from_subprocess()
    if next_line is PKT-LINE("wanted-refs" LF)
        send_to_ssh_client(next_line)
        loop
            let next_ref = read_pkt_line_from_subprocess()
            if next_ref is delimiter_packet
                send_to_ssh_client(delimiter_packet)
                break
            let rewritten = rewrite(next_ref) <1>
            send_to_ssh_client(rewritten)
    else if next_line is flush_packet
        send_to_ssh_client(next_line)
        break
    else
        send_to_ssh_client(next_line)
----
<1> A `wanted-ref` line has the form `obj-id SP refname`. Rewriting this
    means rewriting the refname according to the outgoing rewrite rule, since
    the line travels from the subprocess to the `SSH` client.

Once this loop is complete the command handling is complete.
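
A non-normative sketch of the `fetch` handling follows. It assumes the
`wanted-refs` section travels from the subprocess to the `SSH` client and is
therefore rewritten with the outgoing rule, while `want-ref` arguments use the
incoming rule. Packets are modelled as already decoded payloads, with the
hypothetical sentinels `FLUSH` and `DELIM` standing in for the special packets.

```python
FLUSH, DELIM = object(), object()  # stand-ins for flush-pkt and delim-pkt


def rewrite_fetch_arg(payload: bytes, peer_id: bytes) -> bytes:
    """Request side: apply the incoming rule to `want-ref` arguments only."""
    key = b"want-ref refs/"
    if payload.startswith(key):
        return b"want-ref refs/remotes/" + peer_id + b"/" + payload[len(key):]
    return payload


def rewrite_wanted_ref(payload: bytes, peer_id: bytes) -> bytes:
    """Response side: `obj-id SP refname`, refname rewritten by the outgoing rule."""
    oid, sep, refname = payload.partition(b" ")
    prefix = b"refs/remotes/" + peer_id + b"/"
    if refname.startswith(prefix):
        refname = b"refs/" + refname[len(prefix):]
    return oid + sep + refname


def rewrite_fetch_response(packets: list, peer_id: bytes) -> list:
    """Rewrite only the lines inside the `wanted-refs` section of a response."""
    out, in_wanted = [], False
    for pkt in packets:
        if pkt is FLUSH or pkt is DELIM:
            in_wanted = False          # a special packet ends the section
        elif in_wanted:
            pkt = rewrite_wanted_ref(pkt, peer_id)
        elif pkt == b"wanted-refs\n":
            in_wanted = True
        out.append(pkt)
    return out
```

Modelling the response as a list keeps the sketch short; a real proxy would
apply the same state machine while streaming.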

[appendix]
[[appendix_bad_ref_layout,Appendix A: Ref Layout Mismatch]]
== Ref Layout Mismatch

Why do we need to do ref rewriting? Imagine a `gitd` running at `127.0.0.1:9999`
which does everything specified here (specifically wrapping git commands and
calling them in a monorepo with a `--namespace` argument) but which _does not_ 
rewrite refs. Given such a `gitd` the following URL will provide all refs under
a given namespace

[source]
----
ssh://127.0.0.1:9999/rad:git:<encoded namespace>
----

We can then create remotes like this:

[source]
----
[remote "collaborator"]
	url = ssh://127.0.0.1:9999/rad:git:<urn>
	fetch = +refs/remotes/<peer id>/heads/*:refs/remotes/collaborator/*
----

`git fetch` will do the right thing here and fetch all the remote branches into
`refs/remotes/collaborator/*`. Unfortunately commands which reference a
particular branch or tag will not do the right thing. For example, `git fetch
collaborator mybranch` will attempt to fetch `refs/heads/mybranch`, which
doesn't exist. This is due to the following lines from the git fetch docs
(<<git-fetch-docs>>):

[quote]
When `git fetch` is run with explicit branches and/or tags to fetch on the
command line, e.g. `git fetch origin master`, the ``<refspec>``s given on the
command line determine what are to be fetched (e.g. `master` in the example,
which is a short-hand for `master:`, which in turn means "fetch the master
branch but I do not explicitly say what remote-tracking branch to update with it
from the command line"), and the example command will fetch only the master
branch. The `remote.<repository>.fetch` values determine which remote-tracking
branch, if any, is updated. When used in this way, the
`remote.<repository>.fetch` values do not have any effect in deciding what gets
fetched (i.e. the values are not used as refspecs when the command-line lists
refspecs); they are only used to decide where the refs that are fetched are
stored by acting as a mapping.

This behaviour doesn't appear to be configurable: there is no way to tell git
that `git fetch <remote> <branch>` should fetch
`refs/remotes/<peer id>/heads/<branch>`, and likewise no way to say that
`git fetch <remote> tag <tag>` should fetch `refs/remotes/<peer id>/tags/<tag>`.
However, we do control the `gitd` process, so we can make `gitd` rewrite refs to
achieve the same thing.


[bibliography]
== References

* [[[git-fetch-docs]]] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches
* [[[RFC2119]]] https://www.rfc-editor.org/rfc/rfc2119
* [[[RFC8174]]] https://www.rfc-editor.org/rfc/rfc8174
* [[[git-protocol-v1]]] https://git-scm.com/docs/pack-protocol
* [[[git-protocol-common]]] https://git-scm.com/docs/protocol-common
* [[[git-protocol-v2]]] https://www.git-scm.com/docs/protocol-v2
* [[[git-protocol-capability-advertisment]]] https://git-scm.com/docs/protocol-v2#_capability_advertisement
* [[[git-protocol-symref-capability]]] https://git-scm.com/docs/protocol-capabilities#_symref
* [[[git-protocol-reference-update-request]]] https://git-scm.com/docs/pack-protocol#_reference_update_request_and_packfile_transfer
* [[[ssh-protocol-publickey]]] https://datatracker.ietf.org/doc/html/rfc4252#section-7
* [[[ssh-protocol-exec-request]]] https://datatracker.ietf.org/doc/html/rfc4254#section-6.5
-- 
2.36.1

Re: [PATCH v1 1/1] gitd RFC

Details
Message ID
<CLGCT0AWBH5V.3EX559LT6Q8YH@haptop>
In-Reply-To
<20220714210250.760555-2-alex@memoryandthought.me> (view parent)
On Thu Jul 14, 2022 at 10:02 PM IST, Alex Good wrote:
> Signed-off-by: Alex Good <alex@memoryandthought.me>
> ---
>  docs/rfc/0704-gitd.adoc | 444 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 444 insertions(+)
>  create mode 100644 docs/rfc/0704-gitd.adoc
>
> diff --git a/docs/rfc/0704-gitd.adoc b/docs/rfc/0704-gitd.adoc
> new file mode 100644
> index 00000000..29ce9821
> --- /dev/null
> +++ b/docs/rfc/0704-gitd.adoc
> @@ -0,0 +1,444 @@
> += RFC: Gitd
> +Alex Good <alex@memoryandthought.me>;
> ++
> +:revdate: 2022-06-27
> +:revremark: draft
> +:toc: preamble
> +:stem:
> +
> +* Author: {author_1}

This isn't rendering right for me. If you use `{author}` it
works. `{author_1}` must only be set when there is more than one author.

> +* Date: {revdate}
> +* Amended: {ammend_1}

I don't think this is an amendment :)

> +* Status: {revremark}
> +
> +== Motivation
> +
> +Users are used to working with remote git repositories using the git CLI suite.
> +By implementing a git server which proxies the monorepo we can enable users to
> +interact with Link identities using standard git tooling.
> +
> +== Overview
> +
> +The local view of the network is available in the monorepo as specified in 
> +xref:./0001-identity_resolution.adoc#namespacing[Namespacing].
> +
> +To achieve transparent interaction with radicle remotes we expose a network
> +endpoint which the git protocol understands and which performs two functions:

"Radicle remotes" is quite ambiguous by itself. It can refer to
`refs/remotes` or to remote peers. I think it would be good to be clear
about which one you mean.

I think "network endpoint" is unclear as well. Iirc the essence
of gitd isn't to talk to The Network:tm: -- that's optional. So it
exposes an SSH endpoint, right?

> +
> +1. Updating the local peers signed refs on push to a particular URN 
> +2. Exposing remote peers refs in a manner compatible with the ref layout git
> +   expects so that git commands such as `git fetch <remote> tag <tag>` work as
> +   expected.
> +
> +We achieve this by implementing an SSH server which responds to git requests
> +such that remotes of the form
> +
> +[source]
> +----
> +ssh://<host>/<urn>.git
> +----
> +
> +Will work as expected. SSH URLs in git are fetched by connecting to the server
> +and making an `exec` request (<<ssh-protocol-exec-request>>) for either

Fetched is a curious verb to use here :P

> +
> +* `git-upload-pack <url>` in the case of fetching
> +* `git-receive-pack <url>` in the case of pushing
> +
> +The `gitd` SSH server intercepts these and forwards them to a subprocess which
> +runs either `git-upload-pack` or `git-receive-pack` in the monorepo with an
> +additional `--namespace <urn>` so that only refs from the namespace in question
> +are exposed. `gitd` then intercepts protocol messages running over the proxied
> +standard input and output channels and rewrites refs so that the refs for each
> +individual peer under the URN are in the conventional layout git expects. See
> +<<appendix_bad_ref_layout>> for why the ref rewriting is necessary.

Is the `--namespace` parameter version dependent? I can't remember.

> +
> +In order to update the signed refs on pushes the `gitd` obtains the signing key
> +from a running SSH agent.
> +
> +== Terminology and Conventions
> +
> +The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
> +"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
> +"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119>>
> +and <<RFC8174>> when, and only when, they appear in all capitals, as shown here.
> +
> +== Gitd SSH Interface
> +
> +The gitd server exposes an SSH server. Connections to  the SSH server MUST be
> +authenticated using <<ssh-protocol-publickey>>. Once connected and authenticated
> +the SSH server accepts `exec` requests <<ssh-protocol-exec-request>> with a
> +command of either:
> +
> +* `git-upload-pack [<peer>@]rad:git:<urn>.git`
> +* `git-receive-pack <urn>.git`
> +
> +Where `<peer>` is the base32-z encoded bytes of a peer ID and `<urn>` is a
> +base32-z encoded link URN. If the command does not match either of these
> +patterns `gitd` MUST respond with a `SSH_MESSAGE_CHANNEL_CLOSE` message. The
> +`gitd` server MAY first send a UTF-8 encoded string describing the error as an
> +extended data message with a `data_type_code` of `1`.
> +
> +The `gitd` server then invokes one of the following commands and proxies stdout
> +and stderr through to the subprocess. The proxied stdin and stdout are subject
> +to <<ref-rewriting>>.
> +
> +=== `git-upload-pack [peer@]rad:git:<urn>.git`
> +
> +The invoked command MUST be
> +
> +[source,bash]
> +----
> +git upload-pack \
> +    --namespace <urn> \
> +    -c transfer.hiderefs=refs/remotes \
> +    -c transfer.hiderefs=refs/remotes/rad \
> +    -c transfer.hiderefs=refs/remotes/cobs \
> +    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
> +----
> +<1> This line is repeated for each visible ref in each remote in the namespace.
> +A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
> +everything except the `rad` and `cobs` category of refs under the remote.
> +
> +=== `git-receive-pack rad:git:<urn>.git`
> +
> +The invoked command MUST be
> +
> +[source,bash]
> +----
> +git upload-pack \
> +    --namespace <urn> \
> +    -c transfer.hiderefs=refs/remotes \
> +    -c transfer.hiderefs=refs/remotes/rad \
> +    -c transfer.hiderefs=refs/remotes/cobs \
> +    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
> +----
> +
> +When the server receives an `exec` request for a `git-receive-pack` request it
> +MUST ensure that the authenticated public key for the request is the public key
> +corresponding to the signing key of the monorepo it proxies. If the public key
> +does not match the server MUST respond with an `SSH_MESSAGE_CHANNEL_CLOSE` and
> +MAY first send a UTF-8 encoded string describing the error as an extended data
> +message with a `data_type_code` of `1`.
> +
> +Once the subprocess has completed `gitd` MUST attempt to update the signed refs
> +for the namespace in question. To do this `gitd` attempts to retrieve a key from
> +the SSH agent running at `$SSH_AUTH_SOCK`. If this is not possible then `gitd`
> +MUST report an error as an extended data messaage with a `data_type_code` of `1`.
> +
> +=== Environment variables
> +
> +If the client issues a channel request of type `"env"` before sending an `exec`
> +request then `gitd` MUST store the associated name and value and pass those
> +values into the environment of invoked subprocesses for that channel.
> +
> +
> +[#ref-rewriting]
> +== Peer URLs and ref rewriting
> +
> +Once the `gitd` has started a git subprocess and is proxying data from the SSH
> +client to the subprocess then the remaining responsibility of `gitd` is to
> +intercept the git protocol messages running over the proxied streams and rewrite
> +some refs. Concretely, if the URL that was passed to the `exec` command was of
> +the form `<peer>@rad:git:<urn>.git` (it contains a peer ID) then `gitd` MUST
> +rewrite refs as follows, otherwise `gitd` MUST NOT rewrite refs.
> +
> +=== Rewrite Rules
> +
> +In abstract the rewriting `gitd` must perform is one of the following rules:
> +
> +* The incoming rule :: When sending data to the `git` subprocess if the incoming
> +  (_from_ the `SSH` client) ref matches `refs/<remainder>` it MUST be rewritten
> +  to `refs/remotes/<peer_id>/<remainder>` before passing to the `git` subprocess
> +* The outgoing rule :: When receiving data from the `git` subprocess, if the
> +  outgoing (_to_ the `SSH` client) ref matches `refs/remotes/<peer
> +  id>/<remainder>` it MUST be rewritten to `refs/<remainder>`.
> +
> +The following sections specify specifically what parts of the git protocol
> +messages must be rewritten for each command. 
> +
> +=== Upload pack
> +
> +After starting the `git-upload-pack` subprocess `gitd` intercepts the first
> +PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
> +pass the line through verbatim to the `SSH` client and proceed as according to
> +<<protocol-v2-rewriting>>.
> +
> +If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
> +through verbatim to the `SSH` client and continue as per
> +<<v1-reference-discovery-rewriting>>.
> +
> +If the first line is neither of the above then it is the first line of reference
> +discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.
> +
> +Once the reference discovery step is complete all remaining input and output is
> +proxied without modification.
> +
> +=== Receive Pack
> +
> +After starting the `git-receive-pack` subprocess `gitd` intercepts the first
> +PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
> +pass the line through verbatim to the `SSH` client and proceed as according to
> +<<protocol-v2-rewriting>>.
> +
> +If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
> +through verbatim to the `SSH` client and continue as per
> +<<v1-reference-discovery-rewriting>>.
> +
> +If the first line is neither of the above then it is the first line of reference
> +discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.
> +
> +Once reference discovery is complete the `SSH` client process will send
> +reference update requests as per <<git-protocol-reference-update-request>>.
> +`gitd` MUST execute the following pseudocode:
> +
> +[source]
> +----
> +loop 
> +    let next_line = read_pkt_line_from_client()
> +    if next_line is flush packet
> +        send_to_subprocess(flush_packet)
> +        break
> +    else
> +        if next_line is command <1>
> +            rewritten = <rewrite refname in command according to incoming rule>
> +        else
> +            rewritten = next_line
> +        send_to_subprocess(rewritten)
> +----
> +<1> A command is a packet line which matches `<oid> SP <oid> SP name`
> +
> +Once this loop is complete `gitd` MUST proxy all further input and output
> +without modification.
> +
> +[#v1-reference-discovery-rewriting]
> +=== V1 Reference Discovery Rewriting
> +
> +In both `git-upload-pack` and `git-receive-pack` the subprocess begins by
> +outputting all the references it knows about as per  the grammer under "Reference
> +Discovery" in <<<git-protocol-v1>>> which is repeated verbatim here:
> +
> +[source]
> +----
> +  advertised-refs  =  *1("version 1")
> +		      (no-refs / list-of-refs)
> +		      *shallow
> +		      flush-pkt
> +
> +  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
> +		      NUL capability-list)
> +
> +  list-of-refs     =  first-ref *other-ref
> +  first-ref        =  PKT-LINE(obj-id SP refname
> +		      NUL capability-list)
> +
> +  other-ref        =  PKT-LINE(other-tip / other-peeled)
> +  other-tip        =  obj-id SP refname
> +  other-peeled     =  obj-id SP refname "^{}"
> +
> +  shallow          =  PKT-LINE("shallow" SP obj-id)
> +
> +  capability-list  =  capability *(SP capability)
> +  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
> +  LC_ALPHA         =  %x61-7A
> +----
> +
> +`gitd` starts by parsing the first line. The ref in the first line MUST be
> +rewritten as per the outgoing rewrite rule. If there is a `symref` capability in
> +the `capabilities` list (<<git-protocol-symref-capability>>) then `gitd` MUST
> +rewrite the ref in the `symref` as per the outgoing rewrite rule. This rewritten
> +packet line must then be sent to the `SSH` client.
> +
> +Once this first line is complete `gitd` MUST execute the following algorithm
> +
> +[source]
> +----
> +loop 
> +    let next_line = read_pkt_line_from_subprocess()
> +    if next_line is flush packet
> +        send_to_ssh_client(flush_packet)
> +        break
> +    else
> +        if next_line is other-ref
> +            rewritten = <rewrite refname in next_line according to outgoing rule>
> +        else
> +            rewritten = next_line
> +        send_to_ssh_client(rewritten)
> +----
> +
> +Once this loop terminates the reference discovery step is complete.
> +
> +[#protocol-v2-rewriting]
> +=== Protocol v2
> +
> +Protocol v2 is defined in <<<git-protocol-v2>>>. Protocol v2 is defined in terms
> +of commands which are sent by the client (the `SSH` client here) to the server
> +(the subprocess). The grammar in <<git-protocol-v2>> is repeated verbatim here:
> +
> +[source]
> +----
> +request = empty-request | command-request
> +empty-request = flush-pkt
> +command-request = command
> +    capability-list
> +    delim-pkt
> +    command-args
> +    flush-pkt
> +command = PKT-LINE("command=" key LF)
> +command-args = *command-specific-arg
> +----
> +
> +While the client has an open connection to `gitd` then `gitd` MUST attempt to
> +read the next `command` `PKT-LINE` from the `SSH` client. For each command:
> +
> +* If the `command` is `ls-refs` then proceed as according to
> +  <<protocol-v2-ls-refs>>
> +* If the `command` is `fetch` then proceed as accoding to <<protocol-v2-fetch>>
> +* Otherwise `gitd` MUST read the remainder of the command and pass the whole
> +  `command-request` through to the subprocess. `gitd` MUST then read from the
> +  subprocess until a flush packet is read passing everything through to the
> +  `SSH` client
> +
> +[#protocol-v2-ls-refs]
> +==== `ls-refs`
> +
> +`gitd` MUST parse the command arguments of the `ls-refs` command. For each
> +`ref-prefix` argument `gitd` MUST rewrite the ref according to the incoming
> +rewrite rule. Once this rewriting is complete the entire command MUST be passed
> +to the subprocess. 
> +
> +The subprocess will now respond with the following:
> +
> +[source]
> +----
> +output = *ref
> +  flush-pkt
> +obj-id-or-unborn = (obj-id | "unborn")
> +ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
> +ref-attribute = (symref | peeled)
> +symref = "symref-target:" symref-target
> +peeled = "peeled:" obj-id
> +----
> +
> +`gitd` MUST read from the subprocess until a flush packet is received executing
> +the following pseudocode
> +
> +[source]
> +----
> +loop
> +    let next_line = read_pkt_line_from_subprocess()
> +    if line is flush
> +        send_to_subprocess(line)
> +        break
> +    if line is ref
> +        rewritten = PKT_LINE(obj-id-or-unborn SP rewrite(refname) SP rewrite(attributes) LF) <1> <2>
> +    else
> +        rewritten = next_line
> +    send_to_subprocess(rewritten)
> +----
> +<1> `rewrite(refname)` means rewrite `refname` according to the outgoing rewrite
> +    rule
> +<2> `rewrite(attributes)` means for each attribute in the attributes, if the
> +    attribute is a `symref` then rewrite `symref-target` according to the outgoing
> +    rewrite rule
> +
> +==== `fetch`
> +
> +`gitd` MUST parse the command arguments of the fetch command. For each argument,
> +if the argument name is `want-ref` then the argument value MUST be rewritten
> +according to the incoming rewrite rule, otherwise the argument must be left as
> +is. Once this rewriting is complete the command MUST be passed to the
> +subprocess.
> +
> +Once the command has been sent to the subprocess `gitd` MUST execute the
> +following pseudocode to rewrite the `wanted-refs` section of the response:
> +
> +[source]
> +----
> +loop
> +    let next_line = read_pkt_line_from_client()
> +    if next_line is PKT-LINE("wanted-refs")
> +        loop
> +            let next_ref = read_pkt_line_from_client()
> +            if next_ref is delimiter_packet
> +                send_to_subprocess(delimiter_packet)
> +                break
> +            let rewritten = rewrite(next_ref) <1>
> +            send_to_subprocess(rewritten)
> +    else if next_line is flush_packet
> +        send_to_subprocess(next_line)
> +        break
> +    else
> +        send_to_subprocess(next_line)
> +----
> +<1> The `wanted-ref` argument has the form `obj-id SP refname`. Rewriting this
> +    means rewriting the refname according to the incoming rewrite rule.
> +
> +Once this loop is complete the command handling is complete.
> +
> +[appendix]
> +[[appendix_bad_ref_layout,Appendix A: Ref Layout Mismatch]]
> +== Ref Layout Mismatch
> +
> +Why do we need to do ref rewriting? Imagine a `gitd` running at `127.0.0.1:9999`
> +which does everything specified here (specifically wrapping git commands and
> +calling them in a monorepo with a `--namespace` argument) but which _does not_ 
> +rewrite refs. Given such a `gitd` the following URL will provide all refs under
> +a given namespace
> +
> +[source]
> +----
> +ssh://127.0.0.1:9999/rad:git:<encoded namespace>
> +----
> +
> +We can then create remotes like this:
> +
> +[source]
> +----
> +[remote "collaborator"]
> +	url = ssh://127.0.0.1:9999/rad:git:<urn>
> +	fetch = +refs/remotes/<peer id>/heads/*:refs/remotes/collaborator/*
> +----
> +
> +`git fetch` will do the right thing here and fetch all the remote branches into
> +`refs/remotes/collaborator/*`. Unfortunately commands which reference a
> +particular branch or tag will not do the right thing. For example, `git fetch
> +collaborator mybranch` will attempt to fetch `refs/heads/<mybranch>`, which
> +doesn't exist. This is due to the following lines from the git fetch docs
> +<<git-fetch-docs>>.
> +
> +[quote]
> +When `git fetch` is run with explicit branches and/or tags to fetch on the
> +command line, e.g. `git fetch origin master`, the ``<refspec>``s given on the
> +command line determine what are to be fetched (e.g. `master` in the example,
> +which is a short-hand for `master:`, which in turn means "fetch the master
> +branch but I do not explicitly say what remote-tracking branch to update with it
> +from the command line"), and the example command will fetch only the master
> +branch. The `remote.<repository>.fetch` values determine which remote-tracking
> +branch, if any, is updated. When used in this way, the
> +`remote.<repository>.fetch` values do not have any effect in deciding what gets
> +fetched (i.e. the values are not used as refspecs when the command-line lists
> +refspecs); they are only used to decide where the refs that are fetched are
> +stored by acting as a mapping.
> +
> +This behaviour doesn't appear to be configurable, there is no way to tell git
> +that `git fetch <remote> <branch>` should fetch `refs/remotes/<peer
> +id>/refs/heads/<branch>` and likewise no way to say that `git fetch <remote> tag
> +<tag>` should fetch `refs/remotes/<peer id>/refs/tags/<tag>`. However, we do
> +control the `gitd` process, so we can make `gitd` rewrite refs to achieve the
> +same thing.
> +
> +
> +[bibliography]
> +== References
> +
> +* [[[git-fetch-docs]]] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches
> +* [[[RFC2119]]] https://www.rfc-editor.org/rfc/rfc2119>
> +* [[[RFC8174]]] https://www.rfc-editor.org/rfc/rfc8174>>
> +* [[[git-protocol-v1]]] https://git-scm.com/docs/pack-protocol
> +* [[[git-protocol-common]]] https://git-scm.com/docs/protocol-common
> +* [[[git-protocol-v2]]] https://www.git-scm.com/docs/protocol-v2
> +* [[[git-protocol-capability-advertisment]]] https://git-scm.com/docs/protocol-v2#_capability_advertisement
> +* [[[git-protocol-symref-capability]]] https://git-scm.com/docs/protocol-capabilities#_symref
> +* [[[git-protocol-reference-update-request]]] https://git-scm.com/docs/pack-protocol#_reference_update_request_and_packfile_transfer
> +* [[[ssh-protocol-publickey]]] https://datatracker.ietf.org/doc/html/rfc4252#section-7
> +* [[[ssh-protocol-exec-request]]] https://datatracker.ietf.org/doc/html/rfc4254#section-6.5
> -- 
> 2.36.1

[PATCH v2 0/1] gitd RFC

Details
Message ID
<20220729094719.46453-1-alex@memoryandthought.me>
In-Reply-To
<20220714210250.760555-1-alex@memoryandthought.me> (view parent)
v1 was confusingly named both `draftv1` _and_ `v1`, so I've just named this
reroll `v2`. I've addressed the feedback points Fintan raised; the biggest
change is the specification of how to call `git-upload-pack` for older versions
of the git CLI suite.

Published-At: https://github.com/alexjg/radicle-link/tree/patches/rfc/gitd/v2

Alex Good (1):
  gitd RFC

 docs/rfc/0704-gitd.adoc | 470 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 470 insertions(+)
 create mode 100644 docs/rfc/0704-gitd.adoc

Range-diff against v1:
1:  43e54d1e ! 1:  18a91b61 gitd RFC
    @@ docs/rfc/0704-gitd.adoc (new)
     +:toc: preamble
     +:stem:
     +
    -+* Author: {author_1}
    ++* Author: {author}
     +* Date: {revdate}
    -+* Amended: {ammend_1}
     +* Status: {revremark}
     +
     +== Motivation
    @@ docs/rfc/0704-gitd.adoc (new)
     +The local view of the network is available in the monorepo as specified in 
     +xref:./0001-identity_resolution.adoc#namespacing[Namespacing].
     +
    -+To achieve transparent interaction with radicle remotes we expose a network
    -+endpoint which the git protocol understands and which performs two functions:
    ++To achieve transparent interaction with git remotes which point at radicle
    ++projects we expose an SSH server which the git protocol understands and which
    ++performs two functions:
     +
     +1. Updating the local peers signed refs on push to a particular URN 
     +2. Exposing remote peers refs in a manner compatible with the ref layout git
    @@ docs/rfc/0704-gitd.adoc (new)
     +ssh://<host>/<urn>.git
     +----
     +
    -+Will work as expected. SSH URLs in git are fetched by connecting to the server
    ++Will work as expected. SSH URLs in git are handled by connecting to the server
     +and making an `exec` request (<<ssh-protocol-exec-request>>) for either 
     +
     +* `git-upload-pack <url>` in the case of fetching
    @@ docs/rfc/0704-gitd.adoc (new)
     +
     +== Gitd SSH Interface
     +
    -+The gitd server exposes an SSH server. Connections to  the SSH server MUST be
    ++The gitd server exposes an SSH server. Connections to the SSH server MUST be
     +authenticated using <<ssh-protocol-publickey>>. Once connected and authenticated
     +the SSH server accepts `exec` requests <<ssh-protocol-exec-request>> with a
     +command of either:
    @@ docs/rfc/0704-gitd.adoc (new)
     +
     +=== `git-upload-pack [peer@]rad:git:<urn>.git`
     +
    ++There are two versions of this command due to older versions of
    ++`git-upload-pack` not handling namespaces correctly, see
    ++<<git-upload-pack-bad-namespace>>.
    ++
    ++==== `git --version >= 2.34.0`
    ++
     +The invoked command MUST be
     +
     +[source,bash]
    @@ docs/rfc/0704-gitd.adoc (new)
     +A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
     +everything except the `rad` and `cobs` category of refs under the remote.
     +
    ++==== `git --version < 2.34.0`
    ++
    ++The invoked command MUST be
    ++
    ++[source,bash]
    ++----
    ++git upload-pack \
    ++    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes \ <1>
    ++    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/rad \
    ++    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/cobs \
    ++    -c uploadpack.hiderefs=!^$UNHIDDEN <2>
    ++----
    ++<1> Note that in contrast to the invocation for `git >= 2.34.0` we must include
    ++the `refs/namespaces/<urn>` prefix. Here the `<urn>` component is the base32-z
    ++encoded URN.
    ++<2> This line is repeated for each visible ref in each remote in the namespace.
    ++A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
    ++everything except the `rad` and `cobs` category of refs under the remote.
    ++
     +=== `git-receive-pack rad:git:<urn>.git`
     +
     +The invoked command MUST be
    @@ docs/rfc/0704-gitd.adoc (new)
     +* [[[git-protocol-reference-update-request]]] https://git-scm.com/docs/pack-protocol#_reference_update_request_and_packfile_transfer
     +* [[[ssh-protocol-publickey]]] https://datatracker.ietf.org/doc/html/rfc4252#section-7
     +* [[[ssh-protocol-exec-request]]] https://datatracker.ietf.org/doc/html/rfc4254#section-6.5
    ++* [[[git-upload-pack-bad-namespace]]] https://lore.kernel.org/git/CD2XNXHACAXS.13J6JTWZPO1JA@schmidt/
-- 
2.37.0

[PATCH v2 1/1] gitd RFC

Details
Message ID
<20220729094719.46453-2-alex@memoryandthought.me>
In-Reply-To
<20220729094719.46453-1-alex@memoryandthought.me> (view parent)
Patch: +470 -0
Signed-off-by: Alex Good <alex@memoryandthought.me>
---
 docs/rfc/0704-gitd.adoc | 470 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 470 insertions(+)
 create mode 100644 docs/rfc/0704-gitd.adoc

diff --git a/docs/rfc/0704-gitd.adoc b/docs/rfc/0704-gitd.adoc
new file mode 100644
index 00000000..eac519d7
--- /dev/null
+++ b/docs/rfc/0704-gitd.adoc
@@ -0,0 +1,470 @@
= RFC: Gitd
Alex Good <alex@memoryandthought.me>;
+
:revdate: 2022-06-27
:revremark: draft
:toc: preamble
:stem:

* Author: {author}
* Date: {revdate}
* Status: {revremark}

== Motivation

Users are used to working with remote git repositories using the git CLI suite.
By implementing a git server which proxies the monorepo we can enable users to
interact with Link identities using standard git tooling.

== Overview

The local view of the network is available in the monorepo as specified in 
xref:./0001-identity_resolution.adoc#namespacing[Namespacing].

To achieve transparent interaction with git remotes which point at radicle
projects we expose an SSH server which the git protocol understands and which
performs two functions:

1. Updating the local peer's signed refs on push to a particular URN
2. Exposing remote peers' refs in a manner compatible with the ref layout git
   expects so that git commands such as `git fetch <remote> tag <tag>` work as
   expected.

We achieve this by implementing an SSH server which responds to git requests
such that remotes of the form

[source]
----
ssh://<host>/<urn>.git
----

will work as expected. SSH URLs in git are handled by connecting to the server
and making an `exec` request (<<ssh-protocol-exec-request>>) for either:

* `git-upload-pack <url>` in the case of fetching
* `git-receive-pack <url>` in the case of pushing

The `gitd` SSH server intercepts these and forwards them to a subprocess which
runs either `git-upload-pack` or `git-receive-pack` in the monorepo with an
additional `--namespace <urn>` so that only refs from the namespace in question
are exposed. `gitd` then intercepts protocol messages running over the proxied
standard input and output channels and rewrites refs so that the refs for each
individual peer under the URN are in the conventional layout git expects. See
<<appendix_bad_ref_layout>> for why the ref rewriting is necessary.

In order to update the signed refs on pushes, `gitd` obtains the signing key
from a running SSH agent.

== Terminology and Conventions

The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119>>
and <<RFC8174>> when, and only when, they appear in all capitals, as shown here.

== Gitd SSH Interface

The gitd server exposes an SSH server. Connections to the SSH server MUST be
authenticated using <<ssh-protocol-publickey>>. Once connected and authenticated
the SSH server accepts `exec` requests <<ssh-protocol-exec-request>> with a
command of either:

* `git-upload-pack [<peer>@]rad:git:<urn>.git`
* `git-receive-pack rad:git:<urn>.git`

Where `<peer>` is the base32-z encoded bytes of a peer ID and `<urn>` is a
base32-z encoded link URN. If the command does not match either of these
patterns `gitd` MUST respond with an `SSH_MSG_CHANNEL_CLOSE` message. The
`gitd` server MAY first send a UTF-8 encoded string describing the error as an
extended data message with a `data_type_code` of `1`.
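
The command matching described above could be sketched as follows. This is
only an illustration: `parse_exec` and `ExecRequest` are hypothetical names, the
sketch assumes encoded peer IDs and URNs are lowercase alphanumeric strings, and
it accepts the `git-receive-pack` path both with and without the `rad:git:`
prefix. A `None` result corresponds to the case where `gitd` closes the channel.

```python
import re
from typing import NamedTuple, Optional

# Assumption: encoded peer IDs and URNs are lowercase alphanumeric strings.
_ID = r"[a-z0-9]+"
_UPLOAD = re.compile(rf"git-upload-pack (?:({_ID})@)?rad:git:({_ID})\.git")
_RECEIVE = re.compile(rf"git-receive-pack (?:rad:git:)?({_ID})\.git")


class ExecRequest(NamedTuple):
    service: str          # "upload-pack" or "receive-pack"
    peer: Optional[str]   # only ever set for upload-pack
    urn: str


def parse_exec(command: str) -> Optional[ExecRequest]:
    """Return the parsed request, or None if the command is not recognised."""
    m = _UPLOAD.fullmatch(command)
    if m:
        return ExecRequest("upload-pack", m.group(1), m.group(2))
    m = _RECEIVE.fullmatch(command)
    if m:
        return ExecRequest("receive-pack", None, m.group(1))
    return None
```

Whether a peer ID was present (`peer is not None`) is what later decides whether
ref rewriting applies at all.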

The `gitd` server then invokes one of the following commands and proxies stdin,
stdout and stderr between the `SSH` client and the subprocess. The proxied stdin
and stdout are subject to <<ref-rewriting>>.

=== `git-upload-pack [<peer>@]rad:git:<urn>.git`

There are two versions of this command due to older versions of
`git-upload-pack` not handling namespaces correctly, see
<<git-upload-pack-bad-namespace>>.

==== `git --version >= 2.34.0`

The invoked command MUST be

[source,bash]
----
git upload-pack \
    --namespace <urn> \
    -c transfer.hiderefs=refs/remotes \
    -c transfer.hiderefs=refs/remotes/rad \
    -c transfer.hiderefs=refs/remotes/cobs \
    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
----
<1> This line is repeated for each visible ref in each remote in the namespace.
A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
everything except the `rad` and `cobs` category of refs under the remote.

==== `git --version < 2.34.0`

The invoked command MUST be

[source,bash]
----
git upload-pack \
    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes \ <1>
    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/rad \
    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/cobs \
    -c uploadpack.hiderefs=!^$UNHIDDEN <2>
----
<1> Note that in contrast to the invocation for `git >= 2.34.0` we must include
the `refs/namespaces/<urn>` prefix. Here the `<urn>` component is the base32-z
encoded URN.
<2> This line is repeated for each visible ref in each remote in the namespace.
A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
everything except the `rad` and `cobs` category of refs under the remote.

=== `git-receive-pack rad:git:<urn>.git`

The invoked command MUST be

[source,bash]
----
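git receive-pack \
    --namespace <urn> \
    -c transfer.hiderefs=refs/remotes \
    -c transfer.hiderefs=refs/remotes/rad \
    -c transfer.hiderefs=refs/remotes/cobs \
    -c receive.hiderefs=!^$UNHIDDEN <1>
----
<1> This line is repeated for each visible ref in each remote in the namespace,
as for `git-upload-pack` above.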

When the server receives an `exec` request for `git-receive-pack` it MUST ensure
that the authenticated public key for the request is the public key
corresponding to the signing key of the monorepo it proxies. If the public key
does not match the server MUST respond with an `SSH_MSG_CHANNEL_CLOSE` and
MAY first send a UTF-8 encoded string describing the error as an extended data
message with a `data_type_code` of `1`.

Once the subprocess has completed `gitd` MUST attempt to update the signed refs
for the namespace in question. To do this `gitd` attempts to retrieve a key from
the SSH agent running at `$SSH_AUTH_SOCK`. If this is not possible then `gitd`
MUST report an error as an extended data message with a `data_type_code` of `1`.

=== Environment variables

If the client issues a channel request of type `"env"` before sending an `exec`
request then `gitd` MUST store the associated name and value and pass those
values into the environment of invoked subprocesses for that channel.


[#ref-rewriting]
== Peer URLs and ref rewriting

Once `gitd` has started a git subprocess and is proxying data between the SSH
client and the subprocess, its remaining responsibility is to intercept the git
protocol messages running over the proxied streams and rewrite some refs.
Concretely, if the URL that was passed to the `exec` command was of the form
`<peer>@rad:git:<urn>.git` (that is, it contains a peer ID) then `gitd` MUST
rewrite refs as follows, otherwise `gitd` MUST NOT rewrite refs.

=== Rewrite Rules

In the abstract, the rewriting `gitd` performs follows one of two rules:

* The incoming rule :: When sending data to the `git` subprocess if the incoming
  (_from_ the `SSH` client) ref matches `refs/<remainder>` it MUST be rewritten
  to `refs/remotes/<peer_id>/<remainder>` before passing to the `git` subprocess
* The outgoing rule :: When receiving data from the `git` subprocess, if the
  outgoing (_to_ the `SSH` client) ref matches
  `refs/remotes/<peer_id>/<remainder>` it MUST be rewritten to `refs/<remainder>`.
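
As a minimal sketch, the two rules can be written as a pair of string functions.
The function names are hypothetical, and refs that do not match a rule are
assumed to pass through unchanged (this is what lets `HEAD` survive untouched).

```python
def rewrite_incoming(ref: str, peer_id: str) -> str:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer_id>/<remainder>."""
    if ref.startswith("refs/"):
        return f"refs/remotes/{peer_id}/" + ref[len("refs/"):]
    return ref


def rewrite_outgoing(ref: str, peer_id: str) -> str:
    """Outgoing rule: refs/remotes/<peer_id>/<remainder> -> refs/<remainder>."""
    prefix = f"refs/remotes/{peer_id}/"
    if ref.startswith(prefix):
        return "refs/" + ref[len(prefix):]
    return ref
```

For refs belonging to the peer in question the two functions are inverses of
each other, so a ref advertised to the client and later requested by it maps
back to the ref the subprocess originally produced.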

The following sections specify exactly which parts of the git protocol
messages must be rewritten for each command.

=== Upload pack

After starting the `git-upload-pack` subprocess `gitd` intercepts the first
PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
pass the line through verbatim to the `SSH` client and proceed according to
<<protocol-v2-rewriting>>.

If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
through verbatim to the `SSH` client and continue as per
<<v1-reference-discovery-rewriting>>.

If the first line is neither of the above then it is the first line of reference
discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.

Once the reference discovery step is complete all remaining input and output is
proxied without modification.
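
The dispatch on the first packet line can be sketched like this. The helper
names and the returned labels are hypothetical; the sketch assumes the whole
packet line is already buffered and treats every special packet (flush `0000`,
delimiter `0001`, response end `0002`) as "no payload".

```python
from typing import Optional, Tuple


def read_pkt_line(buf: bytes) -> Tuple[Optional[bytes], bytes]:
    """Split one pkt-line off the front of buf and return (payload, rest).

    The first four bytes are the hex length of the line including the length
    field itself. Lengths below 4 denote special packets without a payload.
    """
    length = int(buf[:4], 16)
    if length < 4:
        return None, buf[4:]
    return buf[4:length], buf[length:]


def classify_first_line(payload: Optional[bytes]) -> str:
    """Decide which rewriting procedure applies after the first line."""
    if payload == b"version 2\n":
        return "protocol-v2"
    if payload == b"version 1\n":
        return "v1-after-version-line"
    # Anything else is already the first line of v1 reference discovery.
    return "v1-first-ref"
```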

=== Receive Pack

After starting the `git-receive-pack` subprocess `gitd` intercepts the first
PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
pass the line through verbatim to the `SSH` client and proceed according to
<<protocol-v2-rewriting>>.

If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
through verbatim to the `SSH` client and continue as per
<<v1-reference-discovery-rewriting>>.

If the first line is neither of the above then it is the first line of reference
discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.

Once reference discovery is complete the `SSH` client process will send
reference update requests as per <<git-protocol-reference-update-request>>.
`gitd` MUST execute the following pseudocode:

[source]
----
loop 
    let next_line = read_pkt_line_from_client()
    if next_line is flush packet
        send_to_subprocess(flush_packet)
        break
    else
        if next_line is command <1>
            rewritten = <rewrite refname in command according to incoming rule>
        else
            rewritten = next_line
        send_to_subprocess(rewritten)
----
<1> A command is a packet line which matches `<oid> SP <oid> SP name`

Once this loop is complete `gitd` MUST proxy all further input and output
without modification.
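
A sketch of the per-line rewrite used in the loop above, operating on the
payload of one packet line. The function name is hypothetical. The sketch
assumes that the first command may carry a `NUL`-separated capability list after
the ref name (as in the pack protocol) and that lines which are not commands,
such as `shallow` lines, are returned unchanged.

```python
def rewrite_update_command(payload: bytes, peer_id: str) -> bytes:
    """Apply the incoming rule to the ref name in `<oid> SP <oid> SP name`."""
    # Split off an optional capability list so it is never touched.
    body, nul, caps = payload.partition(b"\0")
    trailer = b"\n" if body.endswith(b"\n") else b""
    parts = body.rstrip(b"\n").split(b" ", 2)
    if len(parts) != 3 or not parts[2].startswith(b"refs/"):
        return payload  # not a command, or nothing to rewrite
    old_id, new_id, name = parts
    name = b"refs/remotes/" + peer_id.encode() + b"/" + name[len(b"refs/"):]
    return b" ".join([old_id, new_id, name]) + trailer + nul + caps
```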

[#v1-reference-discovery-rewriting]
=== V1 Reference Discovery Rewriting

In both `git-upload-pack` and `git-receive-pack` the subprocess begins by
outputting all the references it knows about as per the grammar under "Reference
Discovery" in <<git-protocol-v1>>, which is repeated verbatim here:

[source]
----
  advertised-refs  =  *1("version 1")
		      (no-refs / list-of-refs)
		      *shallow
		      flush-pkt

  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
		      NUL capability-list)

  list-of-refs     =  first-ref *other-ref
  first-ref        =  PKT-LINE(obj-id SP refname
		      NUL capability-list)

  other-ref        =  PKT-LINE(other-tip / other-peeled)
  other-tip        =  obj-id SP refname
  other-peeled     =  obj-id SP refname "^{}"

  shallow          =  PKT-LINE("shallow" SP obj-id)

  capability-list  =  capability *(SP capability)
  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
  LC_ALPHA         =  %x61-7A
----

`gitd` starts by parsing the first line. The ref in the first line MUST be
rewritten as per the outgoing rewrite rule. If there is a `symref` capability in
the `capabilities` list (<<git-protocol-symref-capability>>) then `gitd` MUST
rewrite the ref in the `symref` as per the outgoing rewrite rule. This rewritten
packet line MUST then be sent to the `SSH` client.

Once this first line has been sent `gitd` MUST execute the following algorithm:

[source]
----
loop 
    let next_line = read_pkt_line_from_subprocess()
    if next_line is flush packet
        send_to_ssh_client(flush_packet)
        break
    else
        if next_line is other-ref
            rewritten = <rewrite refname in next_line according to outgoing rule>
        else
            rewritten = next_line
        send_to_ssh_client(rewritten)
----

Once this loop terminates the reference discovery step is complete.
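
The handling of the first line, including the `symref` capability, might look
like the sketch below. The function names are hypothetical, and the sketch
assumes that only the target of a `symref=<source>:<target>` capability needs
the outgoing rule applied, since the source is typically `HEAD`.

```python
def rewrite_outgoing(ref: bytes, peer_id: bytes) -> bytes:
    """Outgoing rule: refs/remotes/<peer_id>/<remainder> -> refs/<remainder>."""
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_first_ref(payload: bytes, peer_id: bytes) -> bytes:
    """Rewrite `obj-id SP refname NUL capability-list` for the SSH client."""
    head, nul, caps = payload.partition(b"\0")
    oid, _, refname = head.partition(b" ")
    newline = b"\n" if caps.endswith(b"\n") else b""
    rewritten_caps = []
    for cap in caps.rstrip(b"\n").split(b" "):
        if cap.startswith(b"symref="):
            source, _, target = cap[len(b"symref="):].partition(b":")
            cap = b"symref=" + source + b":" + rewrite_outgoing(target, peer_id)
        rewritten_caps.append(cap)
    return (oid + b" " + rewrite_outgoing(refname, peer_id)
            + nul + b" ".join(rewritten_caps) + newline)
```

The `other-ref` lines handled by the loop only need `rewrite_outgoing` applied
to their ref name, because they carry no capability list.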

[#protocol-v2-rewriting]
=== Protocol v2

Protocol v2 is defined in <<git-protocol-v2>> in terms of commands which are
sent by the client (the `SSH` client here) to the server (the subprocess). The
grammar in <<git-protocol-v2>> is repeated verbatim here:

[source]
----
request = empty-request | command-request
empty-request = flush-pkt
command-request = command
    capability-list
    delim-pkt
    command-args
    flush-pkt
command = PKT-LINE("command=" key LF)
command-args = *command-specific-arg
----

While the client has an open connection to `gitd`, `gitd` MUST attempt to
read the next `command` `PKT-LINE` from the `SSH` client. For each command:

* If the `command` is `ls-refs` then proceed according to
  <<protocol-v2-ls-refs>>
* If the `command` is `fetch` then proceed according to <<protocol-v2-fetch>>
* Otherwise `gitd` MUST read the remainder of the command and pass the whole
  `command-request` through to the subprocess. `gitd` MUST then read from the
  subprocess until a flush packet is read, passing everything through to the
  `SSH` client

[#protocol-v2-ls-refs]
==== `ls-refs`

`gitd` MUST parse the command arguments of the `ls-refs` command. For each
`ref-prefix` argument `gitd` MUST rewrite the ref according to the incoming
rewrite rule. Once this rewriting is complete the entire command MUST be passed
to the subprocess. 

The subprocess will now respond with the following:

[source]
----
output = *ref
  flush-pkt
obj-id-or-unborn = (obj-id | "unborn")
ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
ref-attribute = (symref | peeled)
symref = "symref-target:" symref-target
peeled = "peeled:" obj-id
----

`gitd` MUST read from the subprocess until a flush packet is received executing
the following pseudocode

[source]
----
loop
    let next_line = read_pkt_line_from_subprocess()
    if next_line is flush
        send_to_ssh_client(next_line)
        break
    if next_line is ref
        rewritten = PKT_LINE(obj-id-or-unborn SP rewrite(refname) SP rewrite(attributes) LF) <1> <2>
    else
        rewritten = next_line
    send_to_ssh_client(rewritten)
----
<1> `rewrite(refname)` means rewrite `refname` according to the outgoing rewrite
    rule
<2> `rewrite(attributes)` means for each attribute in the attributes, if the
    attribute is a `symref` then rewrite `symref-target` according to the outgoing
    rewrite rule
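
The rewrite of a single `ref` line from the `ls-refs` response can be sketched
as below. The function names are hypothetical; the sketch assumes attributes are
separated by single spaces and leaves `peeled` attributes untouched because they
carry object IDs rather than ref names.

```python
def rewrite_outgoing(ref: bytes, peer_id: bytes) -> bytes:
    """Outgoing rule: refs/remotes/<peer_id>/<remainder> -> refs/<remainder>."""
    prefix = b"refs/remotes/" + peer_id + b"/"
    return b"refs/" + ref[len(prefix):] if ref.startswith(prefix) else ref


def rewrite_ls_refs_line(payload: bytes, peer_id: bytes) -> bytes:
    """Rewrite `obj-id-or-unborn SP refname *(SP ref-attribute) LF`."""
    trailer = b"\n" if payload.endswith(b"\n") else b""
    fields = payload.rstrip(b"\n").split(b" ")
    if len(fields) < 2:
        return payload  # not a ref line
    fields[1] = rewrite_outgoing(fields[1], peer_id)
    marker = b"symref-target:"
    for i in range(2, len(fields)):
        if fields[i].startswith(marker):
            fields[i] = marker + rewrite_outgoing(fields[i][len(marker):], peer_id)
    return b" ".join(fields) + trailer
```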

[#protocol-v2-fetch]
==== `fetch`

`gitd` MUST parse the command arguments of the fetch command. For each argument,
if the argument name is `want-ref` then the argument value MUST be rewritten
according to the incoming rewrite rule, otherwise the argument must be left as
is. Once this rewriting is complete the command MUST be passed to the
subprocess.
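
A sketch of the argument rewrite described above, applied to the payload of one
argument packet line. The function names are hypothetical; arguments other than
`want-ref` (for example `want <oid>`, `have <oid>` or `done`) are returned
unchanged.

```python
def rewrite_incoming(ref: bytes, peer_id: bytes) -> bytes:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer_id>/<remainder>."""
    if ref.startswith(b"refs/"):
        return b"refs/remotes/" + peer_id + b"/" + ref[len(b"refs/"):]
    return ref


def rewrite_fetch_arg(payload: bytes, peer_id: bytes) -> bytes:
    """Rewrite the value of a `want-ref` argument, pass anything else through."""
    name, sep, value = payload.partition(b" ")
    if name != b"want-ref":
        return payload
    # Any trailing LF stays attached to the value and is preserved.
    return name + sep + rewrite_incoming(value, peer_id)
```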

Once the command has been sent to the subprocess `gitd` MUST execute the
following pseudocode to rewrite the `wanted-refs` section of the response:

[source]
----
loop
    let next_line = read_pkt_line_from_subprocess()
    if next_line is PKT-LINE("wanted-refs")
        send_to_ssh_client(next_line)
        loop
            let next_ref = read_pkt_line_from_subprocess()
            if next_ref is delimiter_packet
                send_to_ssh_client(delimiter_packet)
                break
            let rewritten = rewrite(next_ref) <1>
            send_to_ssh_client(rewritten)
    else if next_line is flush_packet
        send_to_ssh_client(next_line)
        break
    else
        send_to_ssh_client(next_line)
----
<1> The `wanted-ref` line has the form `obj-id SP refname`. Rewriting this
    means rewriting the refname according to the outgoing rewrite rule.

Once this loop is complete the command handling is complete.

[appendix]
[[appendix_bad_ref_layout,Appendix A: Ref Layout Mismatch]]
== Ref Layout Mismatch

Why do we need to do ref rewriting? Imagine a `gitd` running at `127.0.0.1:9999`
which does everything specified here (specifically wrapping git commands and
calling them in a monorepo with a `--namespace` argument) but which _does not_ 
rewrite refs. Given such a `gitd` the following URL will provide all refs under
a given namespace

[source]
----
ssh://127.0.0.1:9999/rad:git:<encoded namespace>
----

We can then create remotes like this:

[source]
----
[remote "collaborator"]
	url = ssh://127.0.0.1:9999/rad:git:<urn>
	fetch = +refs/remotes/<peer id>/heads/*:refs/remotes/collaborator/*
----

`git fetch` will do the right thing here and fetch all the remote branches into
`refs/remotes/collaborator/*`. Unfortunately commands which reference a
particular branch or tag will not do the right thing. For example, `git fetch
collaborator mybranch` will attempt to fetch `refs/heads/mybranch`, which
doesn't exist. This is due to the following behaviour described in the git fetch
docs <<git-fetch-docs>>.

[quote]
When `git fetch` is run with explicit branches and/or tags to fetch on the
command line, e.g. `git fetch origin master`, the ``<refspec>``s given on the
command line determine what are to be fetched (e.g. `master` in the example,
which is a short-hand for `master:`, which in turn means "fetch the master
branch but I do not explicitly say what remote-tracking branch to update with it
from the command line"), and the example command will fetch only the master
branch. The `remote.<repository>.fetch` values determine which remote-tracking
branch, if any, is updated. When used in this way, the
`remote.<repository>.fetch` values do not have any effect in deciding what gets
fetched (i.e. the values are not used as refspecs when the command-line lists
refspecs); they are only used to decide where the refs that are fetched are
stored by acting as a mapping.

This behaviour doesn't appear to be configurable: there is no way to tell git
that `git fetch <remote> <branch>` should fetch
`refs/remotes/<peer id>/heads/<branch>` and likewise no way to say that
`git fetch <remote> tag <tag>` should fetch `refs/remotes/<peer id>/tags/<tag>`.
However, we do control the `gitd` process, so we can make `gitd` rewrite refs to
achieve the same thing.


[bibliography]
== References

* [[[git-fetch-docs]]] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches
* [[[RFC2119]]] https://www.rfc-editor.org/rfc/rfc2119
* [[[RFC8174]]] https://www.rfc-editor.org/rfc/rfc8174
* [[[git-protocol-v1]]] https://git-scm.com/docs/pack-protocol
* [[[git-protocol-common]]] https://git-scm.com/docs/protocol-common
* [[[git-protocol-v2]]] https://www.git-scm.com/docs/protocol-v2
* [[[git-protocol-capability-advertisment]]] https://git-scm.com/docs/protocol-v2#_capability_advertisement
* [[[git-protocol-symref-capability]]] https://git-scm.com/docs/protocol-capabilities#_symref
* [[[git-protocol-reference-update-request]]] https://git-scm.com/docs/pack-protocol#_reference_update_request_and_packfile_transfer
* [[[ssh-protocol-publickey]]] https://datatracker.ietf.org/doc/html/rfc4252#section-7
* [[[ssh-protocol-exec-request]]] https://datatracker.ietf.org/doc/html/rfc4254#section-6.5
* [[[git-upload-pack-bad-namespace]]] https://lore.kernel.org/git/CD2XNXHACAXS.13J6JTWZPO1JA@schmidt/
-- 
2.37.0

Re: [PATCH v2 1/1] gitd RFC

Details
Message ID
<CLS3UK8NZY06.2IT3GSYSVH07F@haptop>
In-Reply-To
<20220729094719.46453-2-alex@memoryandthought.me> (view parent)
Could you remove the trailing whitespace that's showing up? :)

Something I think we're missing here is the use of server
options. Should we mention that they MAY be used in the future to
allow for custom gitd behaviour?

On Fri Jul 29, 2022 at 10:47 AM IST, Alex Good wrote:
> Signed-off-by: Alex Good <alex@memoryandthought.me>
> ---
>  docs/rfc/0704-gitd.adoc | 470 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 470 insertions(+)
>  create mode 100644 docs/rfc/0704-gitd.adoc
>
> diff --git a/docs/rfc/0704-gitd.adoc b/docs/rfc/0704-gitd.adoc
> new file mode 100644
> index 00000000..eac519d7
> --- /dev/null
> +++ b/docs/rfc/0704-gitd.adoc
> @@ -0,0 +1,470 @@
> += RFC: Gitd
> +Alex Good <alex@memoryandthought.me>;
> ++
> +:revdate: 2022-06-27
> +:revremark: draft
> +:toc: preamble
> +:stem:
> +
> +* Author: {author}
> +* Date: {revdate}
> +* Status: {revremark}
> +
> +== Motivation
> +
> +Users are used to working with remote git repositories using the git CLI suite.
> +By implementing a git server which proxies the monorepo we can enable users to
> +interact with Link identities using standard git tooling.
> +
> +== Overview
> +
> +The local view of the network is available in the monorepo as specified in 
> +xref:./0001-identity_resolution.adoc#namespacing[Namespacing].

personal nit: I know we colloquially refer to the storage as the
monorepo, but I've been trying to call it "Link storage" or
"radicle-link storage" so as not to bring any bias/confusion of what
people usually call monorepos.

wdyt?

> +
> +To achieve transparent interaction with git remotes which point at radicle
> +projects we expose an SSH server which the git protocol understands and which
> +performs two functions:
> +
> +1. Updating the local peers signed refs on push to a particular URN 
> +2. Exposing remote peers refs in a manner compatible with the ref layout git
> +   expects so that git commands such as `git fetch <remote> tag <tag>` work as
> +   expected.
> +
> +We achieve this by implementing an SSH server which responds to git requests
> +such that remotes of the form
> +
> +[source]
> +----
> +ssh://<host>/<urn>.git
> +----
> +
> +will work as expected. SSH URLs in git are handled by connecting to the server
> +and making an `exec` request (<<ssh-protocol-exec-request>>) for either 
> +
> +* `git-upload-pack <url>` in the case of fetching
> +* `git-receive-pack <url>` in the case of pushing
> +
> +The `gitd` SSH server intercepts these and forwards them to a subprocess which
> +runs either `git-upload-pack` or `git-receive-pack` in the monorepo with an
> +additional `--namespace <urn>` so that only refs from the namespace in question
> +are exposed. `gitd` then intercepts protocol messages running over the proxied
> +standard input and output channels and rewrites refs so that the refs for each
> +individual peer under the URN are in the conventional layout git expects. See
> +<<appendix_bad_ref_layout>> for why the ref rewriting is necessary.
> +
> +In order to update the signed refs on pushes, `gitd` obtains the signing key
> +from a running SSH agent.
> +
> +== Terminology and Conventions
> +
> +The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
> +"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
> +"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119>>
> +and <<RFC8174>> when, and only when, they appear in all capitals, as shown here.
> +
> +== Gitd SSH Interface
> +
> +The gitd server exposes an SSH server. Connections to the SSH server MUST be
> +authenticated using <<ssh-protocol-publickey>>. Once connected and authenticated
> +the SSH server accepts `exec` requests <<ssh-protocol-exec-request>> with a
> +command of either:
> +
> +* `git-upload-pack [<peer>@]rad:git:<urn>.git`
> +* `git-receive-pack <urn>.git`

Something I'd find useful for re-reading this, and probably new people
coming to this, is stating why `<peer>` is included one way and not
the other.

I always have to do some mental gymnastics as to which way `fetch` and
`push` are going for `upload` vs `receive`. I believe this is because
I'm thinking in terms of the `git` porcelain commands -- so if I'm
pushing, I naturally think I'm uploading but of course it's the server
that's *receiving*.

> +
> +Where `<peer>` is the base32-z encoded bytes of a peer ID and `<urn>` is a
> +base32-z encoded link URN. If the command does not match either of these
> +patterns `gitd` MUST respond with a `SSH_MESSAGE_CHANNEL_CLOSE` message. The
> +`gitd` server MAY first send a UTF-8 encoded string describing the error as an
> +extended data message with a `data_type_code` of `1`.
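
To make the two accepted command shapes concrete, here is a rough Python sketch of the matching step. Everything named here (`parse_exec_command`, `ExecRequest`, the regex) is hypothetical and not part of the RFC. It assumes the `rad:git:` prefix on both commands, as in the section headings further down, treats base32-z loosely as lowercase alphanumerics, and ignores any shell quoting a real git client may put around the path.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: base32-z is approximated as lowercase alphanumerics.
_EXEC = re.compile(
    r"^(?P<service>git-upload-pack|git-receive-pack) "
    r"(?:(?P<peer>[a-z0-9]+)@)?rad:git:(?P<urn>[a-z0-9]+)\.git$"
)


class ExecRequest(NamedTuple):
    service: str
    peer: Optional[str]  # only ever present for git-upload-pack
    urn: str


def parse_exec_command(command: str) -> Optional[ExecRequest]:
    """Return the parsed request, or None if gitd should close the channel."""
    m = _EXEC.match(command)
    if m is None:
        return None
    # The receive-pack form in the RFC carries no peer ID.
    if m["service"] == "git-receive-pack" and m["peer"] is not None:
        return None
    return ExecRequest(m["service"], m["peer"], m["urn"])
```

A `None` result corresponds to the case where `gitd` responds with `SSH_MESSAGE_CHANNEL_CLOSE`.
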
> +
> +The `gitd` server then invokes one of the following commands and proxies stdin,
> +stdout and stderr through to the subprocess. The proxied stdin and stdout are
> +subject to <<ref-rewriting>>.
> +
> +=== `git-upload-pack [peer@]rad:git:<urn>.git`
> +
> +There are two versions of this command due to older versions of
> +`git-upload-pack` not handling namespaces correctly, see
> +<<git-upload-pack-bad-namespace>>.
> +
> +==== `git --version >= 2.34.0`
> +
> +The invoked command MUST be
> +
> +[source,bash]
> +----
> +git upload-pack \
> +    --namespace <urn> \
> +    -c transfer.hiderefs=refs/remotes \
> +    -c transfer.hiderefs=refs/remotes/rad \
> +    -c transfer.hiderefs=refs/remotes/cobs \
> +    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
> +----
> +<1> This line is repeated for each visible ref in each remote in the namespace.
> +A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
> +everything except the `rad` and `cobs` category of refs under the remote.
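
The "visible ref" definition reads more easily as a predicate. A minimal Python sketch of how I understand it, with hypothetical helper names (`is_visible`, `unhide_args`) and assuming refs are given relative to the namespace:

```python
from typing import Iterable, List


def is_visible(ref: str) -> bool:
    """True for refs/remotes/<remote>/<category>/... unless category is rad or cobs."""
    parts = ref.split("/")
    if len(parts) < 5 or parts[:2] != ["refs", "remotes"]:
        return False
    return parts[3] not in ("rad", "cobs")


def unhide_args(refs: Iterable[str]) -> List[str]:
    """Build the repeated `-c uploadpack.hiderefs=!^<ref>` arguments from callout <1>."""
    args: List[str] = []
    for ref in refs:
        if is_visible(ref):
            args += ["-c", f"uploadpack.hiderefs=!^{ref}"]
    return args
```
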
> +
> +==== `git --version < 2.34.0`
> +
> +The invoked command MUST be
> +
> +[source,bash]
> +----
> +git upload-pack \
> +    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes \ <1>
> +    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/rad \
> +    -c transfer.hiderefs=refs/namespaces/<urn>/refs/remotes/cobs \
> +    -c uploadpack.hiderefs=!^$UNHIDDEN <2>
> +----
> +<1> Note that in contrast to the invocation for `git >= 2.34.0` we must include
> +the `refs/namespaces/<urn>` prefix. Here the `<urn>` component is the base32-z
> +encoded URN.
> +<2> This line is repeated for each visible ref in each remote in the namespace.
> +A visible ref is one which matches `refs/remotes/<remote>/^(rad|cobs)`. I.e.
> +everything except the `rad` and `cobs` category of refs under the remote.
> +
> +=== `git-receive-pack rad:git:<urn>.git`
> +
> +The invoked command MUST be
> +
> +[source,bash]
> +----
> +git upload-pack \
> +    --namespace <urn> \
> +    -c transfer.hiderefs=refs/remotes \
> +    -c transfer.hiderefs=refs/remotes/rad \
> +    -c transfer.hiderefs=refs/remotes/cobs \
> +    -c uploadpack.hiderefs=!^$UNHIDDEN <1>
> +----

Three things:
1. This says `upload-pack`
2. Shouldn't the versioning matter here too? Or perhaps it wouldn't if
it's receive-pack because it's coming from the working copy?
3. There's no explanation corresponding to `<1>` here.

> +
> +When the server receives an `exec` request for a `git-receive-pack` request it
> +MUST ensure that the authenticated public key for the request is the public key
> +corresponding to the signing key of the monorepo it proxies. If the public key
> +does not match the server MUST respond with an `SSH_MESSAGE_CHANNEL_CLOSE` and
> +MAY first send a UTF-8 encoded string describing the error as an extended data
> +message with a `data_type_code` of `1`.
> +
> +Once the subprocess has completed `gitd` MUST attempt to update the signed refs
> +for the namespace in question. To do this `gitd` attempts to retrieve a key from
> +the SSH agent running at `$SSH_AUTH_SOCK`. If this is not possible then `gitd`
> +MUST report an error as an extended data message with a `data_type_code` of `1`.
> +
> +=== Environment variables
> +
> +If the client issues a channel request of type `"env"` before sending an `exec`
> +request then `gitd` MUST store the associated name and value and pass those
> +values into the environment of invoked subprocesses for that channel.
> +
> +
> +[#ref-rewriting]
> +== Peer URLs and ref rewriting
> +
> +Once the `gitd` has started a git subprocess and is proxying data from the SSH
> +client to the subprocess then the remaining responsibility of `gitd` is to
> +intercept the git protocol messages running over the proxied streams and rewrite
> +some refs. Concretely, if the URL that was passed to the `exec` command was of
> +the form `<peer>@rad:git:<urn>.git` (it contains a peer ID) then `gitd` MUST
> +rewrite refs as follows, otherwise `gitd` MUST NOT rewrite refs.
> +
> +=== Rewrite Rules
> +
> +In abstract the rewriting `gitd` must perform is one of the following rules:
> +
> +* The incoming rule :: When sending data to the `git` subprocess if the incoming
> +  (_from_ the `SSH` client) ref matches `refs/<remainder>` it MUST be rewritten
> +  to `refs/remotes/<peer_id>/<remainder>` before passing to the `git` subprocess
> +* The outgoing rule :: When receiving data from the `git` subprocess, if the
> +  outgoing (_to_ the `SSH` client) ref matches
> +  `refs/remotes/<peer_id>/<remainder>` it MUST be rewritten to `refs/<remainder>`.
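
For what it's worth, the two rules are small enough to write down directly. A minimal Python sketch (function names are mine, and refs that do not match the pattern are passed through unchanged, which is an assumption the RFC does not spell out):

```python
def rewrite_incoming(ref: str, peer_id: str) -> str:
    """Incoming rule: refs/<remainder> -> refs/remotes/<peer_id>/<remainder>."""
    if ref.startswith("refs/"):
        return f"refs/remotes/{peer_id}/{ref[len('refs/'):]}"
    return ref  # assumption: non-matching refs are left untouched


def rewrite_outgoing(ref: str, peer_id: str) -> str:
    """Outgoing rule: refs/remotes/<peer_id>/<remainder> -> refs/<remainder>."""
    prefix = f"refs/remotes/{peer_id}/"
    if ref.startswith(prefix):
        return "refs/" + ref[len(prefix):]
    return ref  # assumption: refs of other peers are left untouched
```

The two functions are inverses for any ref under the given peer, which is the property the rest of the rewriting relies on.
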
> +
> +The following sections specify specifically what parts of the git protocol
> +messages must be rewritten for each command.

nit: if you're specifying I don't think you need to say you're
specifically doing so :)

> +
> +=== Upload pack
> +
> +After starting the `git-upload-pack` subprocess `gitd` intercepts the first
> +PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
> +pass the line through verbatim to the `SSH` client and proceed according to
> +<<protocol-v2-rewriting>>.
> +
> +If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
> +through verbatim to the `SSH` client and continue as per
> +<<v1-reference-discovery-rewriting>>.
> +
> +If the first line is neither of the above then it is the first line of reference
> +discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.
> +
> +Once the reference discovery step is complete all remaining input and output is
> +proxied without modification.
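
The first-line dispatch depends on pkt-line framing (a four-hex-digit length prefix that counts itself, with `0000` as flush and `0001` as delimiter). A small Python sketch of the framing and the three-way decision above; the helper names and the `FLUSH`/`DELIM` sentinels are assumptions for illustration only:

```python
import io

FLUSH = object()  # sentinel for the "0000" flush packet
DELIM = object()  # sentinel for the "0001" delimiter packet


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix the payload with its length (including the 4 header bytes) in hex."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_line(stream):
    """Read one packet; return its payload bytes, FLUSH, or DELIM."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream ended inside a pkt-line header")
    length = int(header, 16)
    if length == 0:
        return FLUSH
    if length == 1:
        return DELIM
    return stream.read(length - 4)


def classify_first_line(payload) -> str:
    """Decide which rewriting path applies to the subprocess output."""
    if payload == b"version 2\n":
        return "v2"
    if payload == b"version 1\n":
        return "v1"
    return "v1-implicit"  # already the first line of reference discovery


# Example: a v2 banner followed by a flush packet.
stream = io.BytesIO(encode_pkt_line(b"version 2\n") + b"0000")
first = classify_first_line(read_pkt_line(stream))
second = read_pkt_line(stream)
```
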
> +
> +=== Receive Pack
> +
> +After starting the `git-receive-pack` subprocess `gitd` intercepts the first
> +PKT-LINE of output. If the line is `PKT-LINE("version 2" LF)` then `gitd` MUST
> +pass the line through verbatim to the `SSH` client and proceed according to
> +<<protocol-v2-rewriting>>.
> +
> +If the first line is `PKT-LINE("version 1" LF)` then `gitd` MUST pass the line
> +through verbatim to the `SSH` client and continue as per
> +<<v1-reference-discovery-rewriting>>.
> +
> +If the first line is neither of the above then it is the first line of reference
> +discovery and `gitd` MUST proceed as per <<v1-reference-discovery-rewriting>>.
> +
> +Once reference discovery is complete the `SSH` client process will send
> +reference update requests as per <<git-protocol-reference-update-request>>.
> +`gitd` MUST execute the following pseudocode:
> +
> +[source]
> +----
> +loop 
> +    let next_line = read_pkt_line_from_client()
> +    if next_line is flush packet
> +        send_to_subprocess(flush_packet)
> +        break
> +    else
> +        if next_line is command <1>
> +            rewritten = <rewrite refname in command according to incoming rule>
> +        else
> +            rewritten = next_line
> +        send_to_subprocess(rewritten)
> +----
> +<1> A command is a packet line which matches `<oid> SP <oid> SP name`
> +
> +Once this loop is complete `gitd` MUST proxy all further input and output
> +without modification.
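
A sketch of the "rewrite refname in command" step from the loop above, in Python. It assumes 40-hex-digit SHA-1 object IDs and that the first command may carry a NUL-separated capability list, which must be preserved as is; `rewrite_command` is a hypothetical name.

```python
import re

# <old-oid> SP <new-oid> SP <name>, optionally followed by NUL capabilities / LF.
_COMMAND = re.compile(rb"^([0-9a-f]{40}) ([0-9a-f]{40}) ([^\0\n]+)(.*)$", re.DOTALL)


def rewrite_command(payload: bytes, peer_id: str) -> bytes:
    """Apply the incoming rule to the refname of a reference update command."""
    m = _COMMAND.match(payload)
    if m is None:
        return payload  # not a command: pass through unchanged
    old, new, name, rest = m.groups()
    if name.startswith(b"refs/"):
        name = b"refs/remotes/" + peer_id.encode() + b"/" + name[len(b"refs/"):]
    return old + b" " + new + b" " + name + rest
```
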
> +
> +[#v1-reference-discovery-rewriting]
> +=== V1 Reference Discovery Rewriting
> +
> +In both `git-upload-pack` and `git-receive-pack` the subprocess begins by
> +outputting all the references it knows about as per the grammar under "Reference
> +Discovery" in <<git-protocol-v1>> which is repeated verbatim here:
> +
> +[source]
> +----
> +  advertised-refs  =  *1("version 1")
> +		      (no-refs / list-of-refs)
> +		      *shallow
> +		      flush-pkt
> +
> +  no-refs          =  PKT-LINE(zero-id SP "capabilities^{}"
> +		      NUL capability-list)
> +
> +  list-of-refs     =  first-ref *other-ref
> +  first-ref        =  PKT-LINE(obj-id SP refname
> +		      NUL capability-list)
> +
> +  other-ref        =  PKT-LINE(other-tip / other-peeled)
> +  other-tip        =  obj-id SP refname
> +  other-peeled     =  obj-id SP refname "^{}"
> +
> +  shallow          =  PKT-LINE("shallow" SP obj-id)
> +
> +  capability-list  =  capability *(SP capability)
> +  capability       =  1*(LC_ALPHA / DIGIT / "-" / "_")
> +  LC_ALPHA         =  %x61-7A
> +----
> +
> +`gitd` starts by parsing the first line. The ref in the first line MUST be
> +rewritten as per the outgoing rewrite rule. If there is a `symref` capability in
> +the `capabilities` list (<<git-protocol-symref-capability>>) then `gitd` MUST
> +rewrite the ref in the `symref` as per the outgoing rewrite rule. This rewritten
> +packet line must then be sent to the `SSH` client.
> +
> +Once this first line is complete `gitd` MUST execute the following algorithm
> +
> +[source]
> +----
> +loop 
> +    let next_line = read_pkt_line_from_subprocess()
> +    if next_line is flush packet
> +        send_to_ssh_client(flush_packet)
> +        break
> +    else
> +        if next_line is other-ref
> +            rewritten = <rewrite refname in next_line according to outgoing rule>
> +        else
> +            rewritten = next_line
> +        send_to_ssh_client(rewritten)
> +----
> +
> +Once this loop terminates the reference discovery step is complete.
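
The first advertised line is the fiddly one, since the refname and the `symref` capability both need the outgoing rule. A Python sketch of how I read it, assuming the `symref=<source>:<target>` capability format and that the line always contains the NUL separator as the grammar requires; the function names are hypothetical.

```python
def rewrite_outgoing(ref: bytes, peer: bytes) -> bytes:
    """Outgoing rule on raw bytes: strip the refs/remotes/<peer>/ prefix."""
    prefix = b"refs/remotes/" + peer + b"/"
    if ref.startswith(prefix):
        return b"refs/" + ref[len(prefix):]
    return ref


def rewrite_first_ref(payload: bytes, peer: bytes) -> bytes:
    """Rewrite `obj-id SP refname NUL capability-list` for the SSH client."""
    head, _, caps = payload.partition(b"\0")
    oid, _, refname = head.partition(b" ")
    new_caps = []
    for cap in caps.rstrip(b"\n").split(b" "):
        if cap.startswith(b"symref="):
            src, _, target = cap[len(b"symref="):].partition(b":")
            # Both sides go through the rule; a source such as HEAD is unaffected.
            cap = b"symref=" + rewrite_outgoing(src, peer) + b":" + rewrite_outgoing(target, peer)
        new_caps.append(cap)
    trailer = b"\n" if caps.endswith(b"\n") else b""
    return oid + b" " + rewrite_outgoing(refname, peer) + b"\0" + b" ".join(new_caps) + trailer
```
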
> +
> +[#protocol-v2-rewriting]
> +=== Protocol v2
> +
> +Protocol v2 is defined in <<git-protocol-v2>> in terms of commands which are
> +sent by the client (the `SSH` client here) to the server (the subprocess). The
> +grammar in <<git-protocol-v2>> is repeated verbatim here:
> +
> +[source]
> +----
> +request = empty-request | command-request
> +empty-request = flush-pkt
> +command-request = command
> +    capability-list
> +    delim-pkt
> +    command-args
> +    flush-pkt
> +command = PKT-LINE("command=" key LF)
> +command-args = *command-specific-arg
> +----
> +
> +While the client has an open connection to `gitd`, `gitd` MUST attempt to
> +read the next `command` `PKT-LINE` from the `SSH` client. For each command:
> +
> +* If the `command` is `ls-refs` then proceed according to
> +  <<protocol-v2-ls-refs>>
> +* If the `command` is `fetch` then proceed according to <<protocol-v2-fetch>>
> +* Otherwise `gitd` MUST read the remainder of the command and pass the whole
> +  `command-request` through to the subprocess. `gitd` MUST then read from the
> +  subprocess until a flush packet is read, passing everything through to the
> +  `SSH` client
> +
> +[#protocol-v2-ls-refs]
> +==== `ls-refs`
> +
> +`gitd` MUST parse the command arguments of the `ls-refs` command. For each
> +`ref-prefix` argument `gitd` MUST rewrite the ref according to the incoming
> +rewrite rule. Once this rewriting is complete the entire command MUST be passed
> +to the subprocess. 
> +
> +The subprocess will now respond with the following:
> +
> +[source]
> +----
> +output = *ref
> +  flush-pkt
> +obj-id-or-unborn = (obj-id | "unborn")
> +ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF)
> +ref-attribute = (symref | peeled)
> +symref = "symref-target:" symref-target
> +peeled = "peeled:" obj-id
> +----
> +
> +`gitd` MUST read from the subprocess until a flush packet is received executing
> +the following pseudocode
> +
> +[source]
> +----
> +loop
> +    let next_line = read_pkt_line_from_subprocess()
> +    if next_line is flush
> +        send_to_ssh_client(next_line)
> +        break
> +    if next_line is ref
> +        rewritten = PKT_LINE(obj-id-or-unborn SP rewrite(refname) SP rewrite(attributes) LF) <1> <2>
> +    else
> +        rewritten = next_line
> +    send_to_ssh_client(rewritten)
> +----
> +<1> `rewrite(refname)` means rewrite `refname` according to the outgoing rewrite
> +    rule
> +<2> `rewrite(attributes)` means for each attribute in the attributes, if the
> +    attribute is a `symref` then rewrite `symref-target` according to the outgoing
> +    rewrite rule
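
The per-line rewrite in callouts <1> and <2> could look roughly like this in Python. This is a sketch only: the function names are mine, and it assumes attributes are single space-separated tokens as in the grammar above.

```python
def rewrite_outgoing(ref: bytes, peer: bytes) -> bytes:
    """Outgoing rule on raw bytes: strip the refs/remotes/<peer>/ prefix."""
    prefix = b"refs/remotes/" + peer + b"/"
    if ref.startswith(prefix):
        return b"refs/" + ref[len(prefix):]
    return ref


def rewrite_ls_refs_line(payload: bytes, peer: bytes) -> bytes:
    """Rewrite `obj-id-or-unborn SP refname *(SP ref-attribute) LF`."""
    fields = payload.rstrip(b"\n").split(b" ")
    if len(fields) < 2:
        return payload  # not a ref line: pass through unchanged
    out = [fields[0], rewrite_outgoing(fields[1], peer)]
    for attr in fields[2:]:
        if attr.startswith(b"symref-target:"):
            target = attr[len(b"symref-target:"):]
            attr = b"symref-target:" + rewrite_outgoing(target, peer)
        out.append(attr)  # peeled:<obj-id> attributes are kept as is
    trailer = b"\n" if payload.endswith(b"\n") else b""
    return b" ".join(out) + trailer
```
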
> +
> +[#protocol-v2-fetch]
> +==== `fetch`
> +
> +`gitd` MUST parse the command arguments of the fetch command. For each argument,
> +if the argument name is `want-ref` then the argument value MUST be rewritten
> +according to the incoming rewrite rule, otherwise the argument must be left as
> +is. Once this rewriting is complete the command MUST be passed to the
> +subprocess.
> +
> +Once the command has been sent to the subprocess `gitd` MUST execute the
> +following pseudocode to rewrite the `wanted-refs` section of the response:
> +
> +[source]
> +----
> +loop
> +    let next_line = read_pkt_line_from_subprocess()
> +    if next_line is PKT-LINE("wanted-refs")
> +        send_to_ssh_client(next_line)
> +        loop
> +            let next_ref = read_pkt_line_from_subprocess()
> +            if next_ref is delimiter_packet
> +                send_to_ssh_client(delimiter_packet)
> +                break
> +            let rewritten = rewrite(next_ref) <1>
> +            send_to_ssh_client(rewritten)
> +    else if next_line is flush_packet
> +        send_to_ssh_client(next_line)
> +        break
> +    else
> +        send_to_ssh_client(next_line)
> +----
> +<1> The `wanted-ref` argument has the form `obj-id SP refname`. Rewriting this
> +    means rewriting the refname according to the outgoing rewrite rule.
> +
> +Once this loop is complete the command handling is complete.
> +
> +[appendix]
> +[[appendix_bad_ref_layout,Appendix A: Ref Layout Mismatch]]
> +== Ref Layout Mismatch
> +
> +Why do we need to do ref rewriting? Imagine a `gitd` running at `127.0.0.1:9999`
> +which does everything specified here (specifically wrapping git commands and
> +calling them in a monorepo with a `--namespace` argument) but which _does not_ 
> +rewrite refs. Given such a `gitd` the following URL will provide all refs under
> +a given namespace
> +
> +[source]
> +----
> +ssh://127.0.0.1:9999/rad:git:<encoded namespace>
> +----
> +
> +We can then create remotes like this:
> +
> +[source]
> +----
> +[remote "collaborator"]
> +	url = ssh://127.0.0.1:9999/rad:git:<urn>
> +	fetch = +refs/remotes/<peer id>/heads/*:refs/remotes/collaborator/*
> +----
> +
> +`git fetch` will do the right thing here and fetch all the remote branches into
> +`refs/remotes/collaborator/*`. Unfortunately commands which reference a
> +particular branch or tag will not do the right thing. For example, `git fetch
> +collaborator mybranch` will attempt to fetch `refs/heads/<mybranch>`, which
> +doesn't exist. This is due to the following lines from the git fetch docs
> +<<git-fetch-docs>>.
> +
> +[quote]
> +When `git fetch` is run with explicit branches and/or tags to fetch on the
> +command line, e.g. `git fetch origin master`, the ``<refspec>``s given on the
> +command line determine what are to be fetched (e.g. `master` in the example,
> +which is a short-hand for `master:`, which in turn means "fetch the master
> +branch but I do not explicitly say what remote-tracking branch to update with it
> +from the command line"), and the example command will fetch only the master
> +branch. The `remote.<repository>.fetch` values determine which remote-tracking
> +branch, if any, is updated. When used in this way, the
> +`remote.<repository>.fetch` values do not have any effect in deciding what gets
> +fetched (i.e. the values are not used as refspecs when the command-line lists
> +refspecs); they are only used to decide where the refs that are fetched are
> +stored by acting as a mapping.
> +
> +This behaviour doesn't appear to be configurable: there is no way to tell git
> +that `git fetch <remote> <branch>` should fetch
> +`refs/remotes/<peer id>/refs/heads/<branch>` and likewise no way to say that
> +`git fetch <remote> tag <tag>` should fetch
> +`refs/remotes/<peer id>/refs/tags/<tag>`. However, we do control the `gitd`
> +process, so we can make `gitd` rewrite refs to achieve the same thing.
> +
> +
> +[bibliography]
> +== References
> +
> +* [[[git-fetch-docs]]] https://git-scm.com/docs/git-fetch#_configured_remote_tracking_branches
> +* [[[RFC2119]]] https://www.rfc-editor.org/rfc/rfc2119
> +* [[[RFC8174]]] https://www.rfc-editor.org/rfc/rfc8174
> +* [[[git-protocol-v1]]] https://git-scm.com/docs/pack-protocol
> +* [[[git-protocol-common]]] https://git-scm.com/docs/protocol-common
> +* [[[git-protocol-v2]]] https://www.git-scm.com/docs/protocol-v2
> +* [[[git-protocol-capability-advertisment]]] https://git-scm.com/docs/protocol-v2#_capability_advertisement
> +* [[[git-protocol-symref-capability]]] https://git-scm.com/docs/protocol-capabilities#_symref
> +* [[[git-protocol-reference-update-request]]] https://git-scm.com/docs/pack-protocol#_reference_update_request_and_packfile_transfer
> +* [[[ssh-protocol-publickey]]] https://datatracker.ietf.org/doc/html/rfc4252#section-7
> +* [[[ssh-protocol-exec-request]]] https://datatracker.ietf.org/doc/html/rfc4254#section-6.5
> +* [[[git-upload-pack-bad-namespace]]] https://lore.kernel.org/git/CD2XNXHACAXS.13J6JTWZPO1JA@schmidt/
> -- 
> 2.37.0