~radicle-link/dev

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
18 3

[PATCH] Add an RFC for a patch data structure

Details
Message ID
<2e5259fc2e7ddfae5598653317c88595e4314dc7.1634304773.git.alex@memoryandthought.me>
DKIM signature
missing
Download raw message
Patch: +341 -0
This RFC reviews the link teams experiences with mailing list
development and proposes a patch data structure which should be feasible
for distributed development based on Radicle.

This is currently in a very preliminary state. I'm not sure about a lot
of the conclusions or reasoning. My objective is to get to a data
structure that I can start building vim plugins and CI tooling on top
of. Please criticise heavily.

---
 docs/rfc/0700-patch-data-structure.adoc | 341 ++++++++++++++++++++++++
 1 file changed, 341 insertions(+)
 create mode 100644 docs/rfc/0700-patch-data-structure.adoc

diff --git a/docs/rfc/0700-patch-data-structure.adoc b/docs/rfc/0700-patch-data-structure.adoc
new file mode 100644
index 00000000..8da6e327
--- /dev/null
+++ b/docs/rfc/0700-patch-data-structure.adoc
@@ -0,0 +1,341 @@
= RFC: A data structure for patches
:author: @alexjg
:revate: 2021-10-12
:revmark: draft
:toc:
:source-highlighter: highlight.js

== Motivation

We've been using a mailing list for managing contributions for the last month.
The email based workflow has a number of attractive aspects to it - not least of
which is that it more naturally fits the distributed nature of the Radicle
network - but it depends heavily on a detailed understanding of Git and email,
requires a lot of fairly manual work, and exposes many opportunities for human
error.

In this RFC we introduce a data structure which models the workflow of patch
based contribution as seen on a mailing list. By defining this data more
precisely we hope to make building tooling to work with this workflow easier,
which should make patch based workflows easier, more accessible, and more fun.

We start by introducing the things we learned from the patch based workflow
we've been using on the mailing list. We then imagine how we would like to
improve the tooling available to do this. Finally we propose a data structure
which supports these improved tools.

NOTE: The intention is that ultimately this data structure would be realised as a
collaborative object and distributed along with the radicle monorepo. However,
from the perspective of most tooling the storage of the data could be seen as an
implementation detail so we avoid talking about it in this RFC.

== The mailing list workflow

The most salient thing about the mailing list based workflow we adopted is that
there is no central version of the code which is considered current. Instead
each week we nominate one of the core contributers of the project to be "the
maintainer". The maintainer sends an email to the mailing list at the beginning
of their cycle stating where they are publishing their `"master"` branch of the
project. Throughout the week the maintainer merges patches and pushes to this
location. In principle there is no reason this maintainer couldn't be a machine
(modulo some complicated email parsing logic) but so far they've been a human.

In this workflow then, there are two roles: core contributors, and the lead
maintainer. Core contributors submit patches via email and review patches via
email. The lead maintainer integrates these patches, ensures that tests pass,
and merges patches to a `"master"` branch at a published location.

Patches are "append only" in the sense that because they are sent by email there
is no way to amend a patch (in contrast to GitHub PRs). Instead when one wants
to update a patch in response to feedback you email a new version of that patch
in reply to the initial patch - this is known as a re-roll.

=== The good and the bad

The first thing everyone notices is that there is a lot of work to do to get
your tools set up. You need an email client which supports threading and plain
text. You probably also need to setup a bunch of tagging and filtering systems
to manage the large number of emails you will receive. On top of this there is a
lot of manual work in using `git format-patch` and `git send-email` correctly so
that they produce nice threaded email, and in including the correct tags so that
people know where to get your code. This is table stakes, once set up I think
these are the takeaways:

The Good::

    * Surfacing commit messages and diffs on a per commit basis encourages
      contributors to organise their commits in order to make them more
      comprehensible
    * Re-rolls make it explicit when a patch has changed, with some explanation as
      to what the change was. Contrast with GitHub PRs which silently change under
      you
    * Not having to switch context (out of the terminal into the browser) makes
      reviewing small changes very pleasant (but see below re. larger changes).

The Bad::

    * Managing the various in flight patches and integrating with email requires
      a deep understanding of git and of email and lots of setting up of command
      line mail clients
    * There's a lot of manual work in creating and submitting patches, it's very
      easy to get this wrong and flood everyone's inbox
    * Managing large reviews is quite painful. Keeping track of open discussions
      is complicated. For small patches one can just review in your mail client.
      But for larger patches you typically check the code out and examine the
      changes. At this point it is very tedious to keep switching from code to
      email client and back.

My takeaway is the the underlying model of immutable patches with well crafted
commits and commit messages and nested discussion threads is a better model
for contribution than GitHub's PR; but the current tools are absurdly hard to
use. I believe that this is because email is a difficult medium to work with
programmatically. On that basis, let's imagine what good tools might look like.

== Better tools for patches

Good tools don't require you to leave your workbench. Most developers work in a
particular context - an editor, a terminal, a browser based IDE, there are many.
We must design tools which can easily be adapted to all these contexts. Email is
not that.

To motivate this discussion I am going to imagine a few different contexts which
must be supported by the approach we choose:

* An editor plugin (substitute your favourite editor or IDE here)
* Upstream and/or browser based code review tools
* Automated maintainers (i.e. CI)
* A bridge to GitHub PRs or GitLab MRs

All of these must be possible with very little domain knowledge on the part of
the application developer - that is to say, if you know how to write a web app,
you should be able to write a browser based code review tool which interoperates
with a VS code plugin.

=== In the editor

Most of the work of submitting a patch is in creating a neat commit trail where
each commit is reasonably self contained and commit messages do a good job of
explaining what has happened. One can imagine editor tools which help with this,
but they are mainly about how you use git, not about patches in particular.

The interesting work in the editor is when you're reviewing code. The editor is
a natural place to do this for many developers because it's where their tools
for navigating the codebase are most sophisticated. I imagine a plugin which
allows me to list currently outstanding patches and on selecting a patch it
allows me to see the files which are changed and to navigate through the diff of
that change in the editor. The diff is displayed in the editor as a split screen
with comments on the patch available inline with the diff. I can write comments
directly in my editor. This means all your normal tools for jumping to
definitions, syntax highlighting etc. are available. 

Approving a patch should just be a question of adding a flag to the patch.

=== In Upstream

Upstream is able to show a list of outstanding patches with a similar diff and
conversation UI to GitHub. Creating a patch is also similar to GitHub.

=== Automated maintainers

An automated maintainer doesn't need to be able to review code, or submit
patches. But they do need to be able to unambiguously see what the new code for
a particular patch is, run tests, and report the results of those tests to other
maintainers. They also need to be able to merge the patch when some number of
approvals (and other constraints) has been reached.

=== Bridges

Ideally we want this to be a two way bridge. There is a machine somewhere which
provides a bridge to a centralised code collab service. This machine has a
maintainer key to the project, and also has API access to GitHub.  Building
bridges is a whole other project, but we want to ensure that it is at least
feasible.

Whenever a PR is submitted on GitHub the bridge creates a patch in the Radicle
repository with itself as the author, but with additional details that allow
Radicle applications to know that it is a bridged patch. The bridge creates a
tag in it's own remote for each PR and attaches that to the patch.

Every time the PR changes on GitHub due to code being pushed the bridge creates
a new tag and updates the Radicle patch.

When a user comments on radicle, the bridge adds a comment to the github PR.
When a user comments on GitHub, the bridge adds a comment to the patch.

== The patch data structure

Our design goals are:

* Preserve the ability to submit a patch from anywhere that speaks git
* Represent multiple versions of a patch as separate series of commits
* Allow posting comments that reference particular parts of the code 
* Allow flexible mechanisms for approving or rejecting patches
* Be easy to integrate into existing tools and workflows
* Allow automated tools to run against a patch, report results, and perform
  merges

At a high level then we are imagining a patch as a series of versions, each
version superceding a previous version. Each version of a patch has a set of
commits, one or more authors, a cover letter, a set of comment trees (more on
this shortly) and a set of approval or rejection votes. The patch in its
entirety has a status: open or rejected. If the patch is merged there
is a reference to the version which was merged and the commit which merged it.
If the patch is rejected there is an optional reason why. A patch also has a
target branch.

=== Identifying commits

Mailing lists transport the code you are submitting directly in the email. This
allows people to submit code without hosting a repository somewhere and it also
makes it easy to read and reply to the code using anything that speaks email. We
may not want to take quite the same approach.

In practice we have published the code for a patch at a well known network
location (identified by a `Published-At` trailer in the patch cover letter) and
typically use that location to pull code rather than using `git am` to apply
code from the patch email. This suggests that much of the time it is 
unnecessary to ship the code directly in the patch, instead the patch can just
contain a start and end commit, and a network location of a git repository where
those commits can be found. Tooling can then pull this code to the local
repository and display a diff. This raises two questions; firstly how do people
who don't want to run a publicly available git server submit patches, secondly
what happens if the git repository goes away?

To the first point we can allow people to submit patches which actually contain
the commits in question using `git bundle`. We don't do this by default because
whatever repliaction mechanism we use for patches we would like to avoid
replicating the code using that mechanism.

To the latter point, we can imagine that an automated maintainer may be running
which automatically fetches the contents of any new patch, imports them into
radicle and adds a URN containing the commits to the patch. The automated
maintainer can continue to host these commits for archival purposes so that even
if the orignal URL goes away the patch is still available.

=== Patch status

// TODO: I am very unsure of this, more thought needed

The open or closed state of a patch exists purely to allow maintainers to filter
out patches which have been reviewed and rejected in the UI. The merge state of
a patch is independent of the open or closed state and is represented by
references to merge commits in the patch version. This is because the lifecycle
of the patch metadata is separate to the lifecycle of the repository(ies).
Different maintainers could merge the patch into their remotes at different
times. Therefore merges are represented as a reference to a commit and a remote,
each patch version can have multiple merges associated with it. Figuring out the
merge status is then a question for the local tooling - is the merge commit
on the target branch in our local repository?


=== Comments

Comments on a patch may reference code in the patch. Comments are also tree
like. That is, we allow arbitrary depth threading - in contrast to GitHub's flat
comment hierarchy.

To allow code references a comment may refer to code that it is commenting on by
including a commit, blob hash, and line number. A comment also has a parent
comment which it is replying to.

=== Automated tools

Much of the work of an automated maintainer is made possible just by identifying
the code and patch statuses unambiguously. However, there is the question of how
automated tooling can report the results of test runs. For this purpose we
propose that each patch version allow storing arbitrary JSON along with the core
attributes specified in this RFC. This allows CI tools to store test results and
other metadata for each version.

=== Specification

We specify the patch data structure in two forms. First the Rust data structures
we would use to represent a patch. Second a JSON schema which all
representations of a patch must be isomorphic to.

==== Rust

[source,rust]
----
struct Patch {
    authors: Vec<Author>,
    status: PatchStatus,
    versions: Vec<PatchVersion>,
}

enum Author {
    Urn(Urn),
    Email(String),
    Other(String),
}

enum PatchStatus {
    Open,
    Rejected{reason: String},
}


struct PatchVersion {
    version_number: u64,
    cover_letter: Text,
    date: chrono::DateTime<Utc>,
    code: CommitRange,
    comments: Vec<CommentTree>,
    ext: HashMap<String, serde_json::Value>,
    merges: Vec<Merge>,
    votes: HashMap<Urn, Vote>
}

enum Vote {
    Approve,
    Reject,
}

struct CommitRange {
    location: CodeLocation,
    start_commit: radicle_git_ext::Oid,
    end_commit: radicle_git_ext::Oid,
}

enum CodeLocation {
    Bundle(Vec<u8>),
    Git(url::Url),
    Radicle(Urn),
}

struct CommentTree {
    comment: Comment,
    children: Vec<CommentTree>
}

struct Comment {
    author: Urn,
    text: Text,
    code_reference: Option<CodeReference>,
    time: chrono::DateTime<Utc>,
}

struct CodeReference {
    commit: radicle_git_ext::Oid,
    blob: radicle_git_ext::Oid,
    lines: LineRange,
}

struct LineRange {
    start: u64,
    end: Option<u64>,
}

enum Text {
    Plain(String),
    Markdown(String),
    AsciiDoc(String),
    Other{raw: String, mime_type: String},
}

struct Merge {
    remote: String,
    time: chrono::DateTime<Utc>
    commit: radicle_git_ext::Oid,
}
----
-- 
2.33.0
Details
Message ID
<CAH_DpYRdPUYEUmtGHk6FX_4DYbq1xw+QS8d_xK4PWNzcrfxNnQ@mail.gmail.com>
In-Reply-To
<2e5259fc2e7ddfae5598653317c88595e4314dc7.1634304773.git.alex@memoryandthought.me> (view parent)
DKIM signature
missing
Download raw message
Published-At: https://github.com/alexjg/radicle-link/tree/patches/rfc-patch-data-structure%2Fv1
Details
Message ID
<CF03JV351O47.3ATSL1YJUYG6D@haptop>
In-Reply-To
<2e5259fc2e7ddfae5598653317c88595e4314dc7.1634304773.git.alex@memoryandthought.me> (view parent)
DKIM signature
pass
Download raw message
I would like to note here that I am PSYCHED!

On Fri Oct 15, 2021 at 2:35 PM IST, Alex Good wrote:
> +=== Patch status
> +
> +// TODO: I am very unsure of this, more thought needed
> +
> +The open or closed state of a patch exists purely to allow maintainers
> to filter
> +out patches which have been reviewed and rejected in the UI. The merge
> state of
> +a patch is independent of the open or closed state and is represented
> by
> +references to merge commits in the patch version. This is because the
> lifecycle
> +of the patch metadata is separate to the lifecycle of the
> repository(ies).
> +Different maintainers could merge the patch into their remotes at
> different
> +times. Therefore merges are represented as a reference to a commit and
> a remote,
> +each patch version can have multiple merges associated with it.
> Figuring out the
> +merge status is then a question for the local tooling - is the merge
> commit
> +on the target branch in our local repository?

I think something that will be annoying is that for a tool to determine
if a patch is closed or open, it would have to load the COB and look at
the status. But maybe this can be fixed a layer up -- for example in the
monorepo we name the accepted/rejected references and so we can filter
the refs using those special names.

Also if a patch series is merged you shouldn't really expect to see a
new version, right? You could, however, imagine seeing new comments
being added in case someone found something strange post-merge. But
you're right it can be merged in multiple places, many of which might be
a canonical view, some of which might not be.

> +==== Rust
> +
> +[source,rust]
> +----
> +struct Patch {
> + authors: Vec<Author>,
> + status: PatchStatus,
> + versions: Vec<PatchVersion>,
> +}
> +
> +enum Author {
> + Urn(Urn),
> + Email(String),
> + Other(String),
> +}
> +
> +enum PatchStatus {
> + Open,
> + Rejected{reason: String},
> +}

Can we "Accept" patches? :)

Also, how does a Patch become Rejected? What if it's merged in one or
more remotes?

> +struct PatchVersion {
> + version_number: u64,
> + cover_letter: Text,
> + date: chrono::DateTime<Utc>,
> + code: CommitRange,
> + comments: Vec<CommentTree>,
> + ext: HashMap<String, serde_json::Value>,
> + merges: Vec<Merge>,
> + votes: HashMap<Urn, Vote>
> +}

Shouldn't there be a "target" reference? I think that's something you
mentioned above. I wonder if the target might also want to include the
base commit saying what commit the author(s) of the series were basing
their work off. I'm thinking this because I could say, "Hey I'm basing
my work off `hynsroshala/heads/main` but that might have changed once
you have received my patch. Could be a way to detect "needs
merge/rebase".

> +enum CodeLocation {
> + Bundle(Vec<u8>),
> + Git(url::Url),
> + Radicle(Urn),
> +}

Does this include the path to the reference in the case of Radicle and
Git?

[PATCH] RFC: patches

Details
Message ID
<20211109143906.42539-1-kim@eagain.st>
In-Reply-To
<2e5259fc2e7ddfae5598653317c88595e4314dc7.1634304773.git.alex@memoryandthought.me> (view parent)
DKIM signature
pass
Download raw message
Patch: +271 -0
Another take, early draft.

Signed-off-by: Kim Altintop <kim@eagain.st>
---
Published-At: https://git.sr.ht/~kim/radicle-link/refs/patches/rfc/patches/v1

As promised, this tries to avoid much of the blurb for now. It is nevertheless
somewhat lengthy, and also doesn't actually diff against Alex' earlier attempt
(even though it borrows some of its ideas), so I am submitting it as a fresh
patch.


 docs/rfc/0700-patches.adoc | 271 +++++++++++++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100644 docs/rfc/0700-patches.adoc

diff --git a/docs/rfc/0700-patches.adoc b/docs/rfc/0700-patches.adoc
new file mode 100644
index 00000000..6757f78a
--- /dev/null
+++ b/docs/rfc/0700-patches.adoc
@@ -0,0 +1,271 @@
= RFC: Patches
Kim Altintop <kim@eagain.st>; Alexander J. Good <alex@memoryandthought.me>;
:source-highlighter: highlight.js

:revdate: 2021-11-09
:revremark: draft
:toc:
:toc-placement: preamble

* Date: {revdate}
* Status: {revremark}

== Motivation

**BLURB**

== Overview

Commit objects in `git` define the order in which changes are applied to the
state of the repository. This is a fairly curious choice for a distributed
version control system, as different orderings will alter the identity (hash) of
commits, and so two different histories may appear disjoint even though they
describe the same state. The conventional way to integrate changes while
avoiding to alter their identity is to use merge commits (ie. commits with more
than one parent), which only partially addresses the issue because the merge
commit may itself introduce new changes (eg. by resolving conflicts). This means
that later changes which have the merge commit in their ancestry may indeed
depend on it, which can lead to confusing topologies like "criss-cross merges"
when histories are attempted to be integrated with each other concurrently.
Additionally, it is not intuitive how to undo a merge operation without
rewriting the history (see <<undo-merge>>).

In other words, the distribution model of `git` implies _consensus_ on the
commit ordering of some mainline branch. Typically this is achieved by deferring
to a single authoritative human integrator (a maintainer), or a shared
repository granting write access to more than one maintainers.

Both are not entirely satisfying for a system which aims to support
_decentralised_ collaboration: it is fairly common for open source projects to
extend maintainer authority to a group of people. If we need to resort to shared
git servers to enable a practical workflow for these, the case for
decentralisation becomes rather weak.

Interestingly, the task of integrating concurrent changes into an authoritative
history can be automated, provided that checking the correctness of the result
can also be automated with high confidence. This has been popularised through
<<bors>>, created for use by the Rust programming language project. In order to
make this approach work in a decentralised setting (ie. without a central server
running the `bors` bot), we need to eliminate nondeterminism (to the extent
possible).

== Specification

=== Collaborative Object

[source,rust]
----
/// Tag hash
type Revision = Oid;

struct Patch {
    // Head is the most recent revision
    revisions: NonEmpty<Revision>,
    target: Refname,
    cover_letter: Option<Text>,
    comments: Vec<Tree<Comment>>,
    // Denotes who is expected to review. We may make this more granular by
    // specifying who MUST review and who SHOULD review. Maybe for the
    // enterprise edition.
    reviewers: Vec<Person>,
    reviews: Vec<Review>,
    tries: Vec<Try>,
}

/// Possibly automated attempt to merge a `Revision` of the `Path` onto some
/// base commit and then verify the result using the build definition of the
/// project.
struct Try {
    /// Revision that was attempted.
    rev: Revision,
    /// Resulting head commit (possibly a merge commit).
    head: Oid,
    /// The merge base used.
    base: Oid,
    result: Either<Conflict, BuildResult>,
    /// May be a real person or a robot.
    conducted_by: Urn,
}

/// Merge failed.
struct Conflict;

enum BuildResult {
    Pass { output: Url },
    Fail { output: Url }
}

struct Person {
    name: String,
    email: String,
    urn: Option<Urn>,
}

struct Tree<T> {
    root: T,
    children: Vec<Tree<T>>,
}

struct Review {
    by: Person,
    verdict: Vote,
    at: DateTime<Utc>,
}

enum Vote {
    Veto,
    NeedsWork,
    Lgtm,
}

struct Comment {
    author: Urn,
    content: Text,
    rel: Rel,
    time: DateTime<Utc>,
}

struct Text {
    content: String,
    mime: ContentType,
}

enum ContentType {
    Plain,
    Markdown,
    AsciiDoc,
}

struct Rel {
    rev: Revision,
    loc: Option<SourceLoc>,
}

struct SourceLoc {
    blob: Oid,
    line: Option<Range<usize>>,
}
----

=== Patch Revisions

For the purpose of this document, a "patch" is conceptually similar to a "topic
branch": a series of one or more commits made on top of some commit within the
ancestry graph of the project's main branch.

To submit a patch for consideration, a tag object ("annotated tag") is created
and anchored at a ref of the form:

    refs/patches/<cob id>/<sequence number>

Where `<cob id>` is the object id of the corresponding `patch` collaborative
object (cob), and `<sequence number>` is a monotonic counter local to the
submitting peer id. Note that the tag is _not_ anchored under the `refs/tags`
hierarchy by default, because this is treated specially by the `git` UI.
Instead, it shall be possible to publish a patch revision as a regular tag _post
hoc_.

The tag's name (the `tag` header) shall be of the form:

    patches/<cob id>/<peer id>/<sequence number>

This structure should be preserved when publishing the tag to a non-`link` tree
(ie. `refs/tags/patches/<cob id>/<peer id>/<sequence number>`).

The tag's message shall contain metadata which allow it to be correlated even if
only the tag object is known, for example:

    Radicle-Cob: <cob id>
    Radicle-Author: <urn>
    Radicle-Peer: <peer id>
    Depends-On: <other cob id>
    ...

The tag _may_ be cryptographically signed, again with the intention to make it
pubishable externally. The signing key _must_ be subject to some external PKI,
and  _must not_ be the `link` device key (nb. the tag is implicitly signed by
the device key via the `signed_refs`).

=== UI

A `patch` cob may be edited using whatever tooling is available, but it can not
exist without at least one patch revision. Thus, to create a `patch` cob, the
submitter must push the commit series to the `link` repo, which shall create the
corresponding cob implicitly.

GitHub users are likely to prefer a familiar workflow, so for them we propose a
dedicated CLI tool which is parametrised by a local topic and a base branch (or
guesses them), and creates both the initial cob and corresponding revision tag.
It may prompt to open a text editor for the cover letter. Once committed, the
tool shall prompt the user to set the upstream for their local branch to

    refs/patches/<cob id>

[NOTE]
Alternatively, we may be able to alias <cob id> with the local branch name.

Subsequent pushes to this upstream ref shall create a new revision tag and
update the cob accordingly.

[NOTE]
Users familiar with the Gerrit Code Review system <<gerrit>>, or more generally
"branchless workflows", will find this procedure rather cumbersome. It is the
hope of the author to provide a UI similar to Gerrit's magical `refs/for` ref,
such that almost all interactions can be carried out using `git push` (cf.
<<gerrit-push>>). Note, however, that this implies to establish a stable patch
id (`Change-Id`), which is somewhat invasive.

=== Mechanical Maintainer

The mechanical maintainer (mm) is envisioned as a `bors`-like program, which
applies patches to their target branch, and verifies the result by running some
build specification (either locally or by triggering a web service). Unlike
`bors` proper, `mm` must produce identical `git` histories on independent runs.
To achieve this, `mm` proceeds as follows:

- collect `patch` cobs for the desired `target`
- filter out the ones whose most recent `revision` is reachable from the tip of
  `target`
- evaluate the review votes, and drop patches which are not deemed to be
  accepted
- sort the remaining revisions topologically
- drop patches which depend on not-yet-accepted ones
- sort the remaining revisions according to the timestamp of the most recent
  approving vote
- apply the revisions in this order to the `target` by creating merge commits
  with a fixed author / commiter (eg. "Mechanical Maintainer mm@radicle") and
  the timestamp of the most recent approving vote. The revision tag must be
  included in the commit as a merge-tag header.
- if a merge fails, update the corresponding cob, prune any revisions which may
  depend on the failed one, and retry with the resulting list (if not empty)
- run the build, collect the result, and update the corresponding cobs
- if the build passes, publish the new `target`
- if the build fails, update the head cob, undo the merge, and retry with the
  remaining history.
- optionally, send out side-channel notifications such as email, irc, slack

=== Further Research

- It might be more practical to explicitly "close" a `patch`, recording the oid
  of the merge commit
- Is there a more compact format for archiving closed patches? We will
  accumulate a lot of refs.
- Depending on the identity the `mm` had assumed, the resulting `target` may not
  be considered a maintainer branch. That is, explicit action is required to
  reset to the `target`. This may or may not be desirable.
- We cannot assume the build to be deterministic, so perhaps the `mm` should
  have two phases, one which tries to integrate, and another one which examines
  the build results from multiple independent runs, and merges based on a
  threshold policy.
- The reachability computations to order patches are not entirely trivial,
  potentially some of the (dependency) information can be pre-computed and
  stored in the cobs.


[bibliography]
== References

* [[[undo-merge]]] https://kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt
* [[[bors]]] https://github.com/graydon/bors
* [[[gerrit]]] https://gerritcodereview.com
* [[[gerrit-push]]] https://gerrit-review.googlesource.com/Documentation/user-upload.html#_git_push
--
2.33.1

Re: [PATCH] RFC: patches

Details
Message ID
<20211109154325.GE5024@schmidt.localdomain>
In-Reply-To
<20211109143906.42539-1-kim@eagain.st> (view parent)
DKIM signature
pass
Download raw message
> +Kim Altintop <kim@eagain.st>; Alexander J. Good <alex@memoryandthought.me>;

SCNR

> +[NOTE]
> +Users familiar with the Gerrit Code Review system <<gerrit>>, or more generally
> +"branchless workflows", will find this procedure rather cumbersome. It is the
> +hope of the author to provide a UI similar to Gerrit's magical `refs/for` ref,
> +such that almost all interactions can be carried out using `git push` (cf.
> +<<gerrit-push>>). Note, however, that this implies to establish a stable patch
> +id (`Change-Id`), which is somewhat invasive.

The hope is "in the future", and the change-id is per-commit (which makes it
invasive).

Re: [PATCH] RFC: patches

Details
Message ID
<20211110081256.GB8845@schmidt.localdomain>
In-Reply-To
<20211109143906.42539-1-kim@eagain.st> (view parent)
DKIM signature
pass
Download raw message
> +- apply the revisions in this order to the `target` by creating merge commits
> +  with a fixed author / commiter (eg. "Mechanical Maintainer mm@radicle") and
> +  the timestamp of the most recent approving vote. The revision tag must be
> +  included in the commit as a merge-tag header.

I think I've omitted the crucial part here: a GPG signature on the tag object
itself is fairly useless, unless there is some external PKI. But, because we
preserve the tag as a mergetag header, it would be meaningful to record
signatures over its hash as trailers in the merge commit. That is:

    Reviewed-By: R. E. Viewer <reviewer@example.com> SIGNATURE

where SIGNATURE is BASE58(SIGN(peer.device_key, tag.hash)).

We'd need to be careful though to make clear that those trailers do **not**
apply to the merge commit itself, because the merge operation could've altered
the tree as seen by the reviewers.

Re: [PATCH] RFC: patches

Details
Message ID
<CFMUEL3ASBWP.15IAFRF6T1GOZ@haptop>
In-Reply-To
<20211110081256.GB8845@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Wed Nov 10, 2021 at 7:12 AM GMT, Kim Altintop wrote:
> > +- apply the revisions in this order to the `target` by creating merge commits
> > +  with a fixed author / commiter (eg. "Mechanical Maintainer mm@radicle") and
> > +  the timestamp of the most recent approving vote. The revision tag must be
> > +  included in the commit as a merge-tag header.
>
> I think I've omitted the crucial part here: a GPG signature on the tag
> object
> itself is fairly useless, unless there is some external PKI. But,
> because we
> preserve the tag as a mergetag header, it would be meaningful to record
> signatures over its hash as trailers in the merge commit. That is:
>
> Reviewed-By: R. E. Viewer <reviewer@example.com> SIGNATURE
>
> where SIGNATURE is BASE58(SIGN(peer.device_key, tag.hash)).
>
> We'd need to be careful though to make clear that those trailers do
> **not**
> apply to the merge commit itself, because the merge operation could've
> altered
> the tree as seen by the reviewers.

Is the SIGNATURE here from the mm or from a person? If the latter, how
do we get from that trailer to their verification key?

Re: [PATCH] RFC: patches

Details
Message ID
<20211111113542.GB12311@schmidt.localdomain>
In-Reply-To
<CFMUEL3ASBWP.15IAFRF6T1GOZ@haptop> (view parent)
DKIM signature
pass
Download raw message
On Thu, 11 Nov 2021 09:28:38 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> On Wed Nov 10, 2021 at 7:12 AM GMT, Kim Altintop wrote:
> > > +- apply the revisions in this order to the `target` by creating merge commits
> > > +  with a fixed author / commiter (eg. "Mechanical Maintainer mm@radicle") and
> > > +  the timestamp of the most recent approving vote. The revision tag must be
> > > +  included in the commit as a merge-tag header.
> >
> > I think I've omitted the crucial part here: a GPG signature on the tag
> > object
> > itself is fairly useless, unless there is some external PKI. But,
> > because we
> > preserve the tag as a mergetag header, it would be meaningful to record
> > signatures over its hash as trailers in the merge commit. That is:
> >
> > Reviewed-By: R. E. Viewer <reviewer@example.com> SIGNATURE
> >
> > where SIGNATURE is BASE58(SIGN(peer.device_key, tag.hash)).
> >
> > We'd need to be careful though to make clear that those trailers do
> > **not**
> > apply to the merge commit itself, because the merge operation could've
> > altered
> > the tree as seen by the reviewers.
>
> Is the SIGNATURE here from the mm or from a person? If the latter, how
> do we get from that trailer to their verification key?

An `mm` does not necessarily need to have its own identity, but it could. But
yeah, the trailer would also need to include the URN of the signer.

Re: [PATCH] RFC: patches

Details
Message ID
<CFMVUMA14YX7.2I11FX8IUQU9L@haptop>
In-Reply-To
<20211111113542.GB12311@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Thu Nov 11, 2021 at 10:35 AM GMT, Kim Altintop wrote:
> On Thu, 11 Nov 2021 09:28:38 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > On Wed Nov 10, 2021 at 7:12 AM GMT, Kim Altintop wrote:
> > > > +- apply the revisions in this order to the `target` by creating merge commits
> > > > +  with a fixed author / commiter (eg. "Mechanical Maintainer mm@radicle") and
> > > > +  the timestamp of the most recent approving vote. The revision tag must be
> > > > +  included in the commit as a merge-tag header.
> > >
> > > I think I've omitted the crucial part here: a GPG signature on the tag
> > > object
> > > itself is fairly useless, unless there is some external PKI. But,
> > > because we
> > > preserve the tag as a mergetag header, it would be meaningful to record
> > > signatures over its hash as trailers in the merge commit. That is:
> > >
> > > Reviewed-By: R. E. Viewer <reviewer@example.com> SIGNATURE
> > >
> > > where SIGNATURE is BASE58(SIGN(peer.device_key, tag.hash)).
> > >
> > > We'd need to be careful though to make clear that those trailers do
> > > **not**
> > > apply to the merge commit itself, because the merge operation could've
> > > altered
> > > the tree as seen by the reviewers.
> >
> > Is the SIGNATURE here from the mm or from a person? If the latter, how
> > do we get from that trailer to their verification key?
>
> An `mm` does not necessarily need to have its own identity, but it
> could. But
> yeah, the trailer would also need to include the URN of the signer.

Would the `mm` just have some sort of signing key then, without an
associated identity?

Re: [PATCH] RFC: patches

Details
Message ID
<20211111114353.GD12311@schmidt.localdomain>
In-Reply-To
<CFMVUMA14YX7.2I11FX8IUQU9L@haptop> (view parent)
DKIM signature
pass
Download raw message
On Thu, 11 Nov 2021 10:36:35 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> > > Is the SIGNATURE here from the mm or from a person? If the latter, how
> > > do we get from that trailer to their verification key?
> >
> > An `mm` does not necessarily need to have its own identity, but it
> > could. But
> > yeah, the trailer would also need to include the URN of the signer.
>
> Would the `mm` just have some sort of signing key then, without an
> associated identity?

Maybe. It could be that it has the same identity as you, who is running the
program. Or it could be that it has its own identity. Or it could be that there
is a botnet which manages its own key delegations, and to whose root key(s) a
project delegates to (with a "mechanical maintainer" role).

Re: [PATCH] RFC: patches

Details
Message ID
<CFMWFGGO5RET.1STN6ZFBGPNE2@haptop>
In-Reply-To
<20211111114353.GD12311@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Thu Nov 11, 2021 at 10:43 AM GMT, Kim Altintop wrote:
> On Thu, 11 Nov 2021 10:36:35 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > > > Is the SIGNATURE here from the mm or from a person? If the latter, how
> > > > do we get from that trailer to their verification key?
> > >
> > > An `mm` does not necessarily need to have its own identity, but it
> > > could. But
> > > yeah, the trailer would also need to include the URN of the signer.
> >
> > Would the `mm` just have some sort of signing key then, without an
> > associated identity?
>
> Maybe. It could be that it has the same identity as you, who is running
> the
> program. Or it could be that it has its own identity. Or it could be
> that there
> is a botnet which manages its own key delegations, and to whose root
> key(s) a
> project delegates to (with a "mechanical maintainer" role).

Ok, makes sense b'.'

Re: [PATCH] RFC: patches

Details
Message ID
<CFQHF3QFO02B.391FPPHQL0BKU@haptop>
In-Reply-To
<20211109143906.42539-1-kim@eagain.st> (view parent)
DKIM signature
pass
Download raw message
This is looking good. I dropped some thoughts & questions down :)

On Tue Nov 9, 2021 at 2:39 PM GMT, Kim Altintop wrote:
> Another take, early draft.
>
> Signed-off-by: Kim Altintop <kim@eagain.st>
> ---
> Published-At:
> https://git.sr.ht/~kim/radicle-link/refs/patches/rfc/patches/v1
>
> As promised, this tries to avoid much of the blurb for now. It is
> nevertheless
> somewhat lengthy, and also doesn't actually diff against Alex' earlier
> attempt
> (even though it borrows some of its ideas), so I am submitting it as a
> fresh
> patch.
>
>
> docs/rfc/0700-patches.adoc | 271 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 271 insertions(+)
> create mode 100644 docs/rfc/0700-patches.adoc
>
> diff --git a/docs/rfc/0700-patches.adoc b/docs/rfc/0700-patches.adoc
> new file mode 100644
> index 00000000..6757f78a
> --- /dev/null
> +++ b/docs/rfc/0700-patches.adoc
> @@ -0,0 +1,271 @@
> += RFC: Patches
> +Kim Altintop <kim@eagain.st>; Alexander J. Good
> <alex@memoryandthought.me>;
> +:source-highlighter: highlight.js
> +
> +:revdate: 2021-11-09
> +:revremark: draft
> +:toc:
> +:toc-placement: preamble
> +
> +* Date: {revdate}
> +* Status: {revremark}
> +
> +== Motivation
> +
> +**BLURB**
> +
> +== Overview
> +
> +Commit objects in `git` define the order in which changes are applied
> to the
> +state of the repository. This is a fairly curious choice for a
> distributed
> +version control system, as different orderings will alter the identity
> (hash) of
> +commits, and so two different histories may appear disjoint even though
> they
> +describe the same state. The conventional way to integrate changes
> while
> +avoiding to alter their identity is to use merge commits (ie. commits
> with more
> +than one parent), which only partially addresses the issue because the
> merge
> +commit may itself introduce new changes (eg. by resolving conflicts).
> This means
> +that later changes which have the merge commit in their ancestry may
> indeed
> +depend on it, which can lead to confusing topologies like "criss-cross
> merges"
> +when histories are attempted to be integrated with each other
> concurrently.
> +Additionally, it is not intuitive how to undo a merge operation without
> +rewriting the history (see <<undo-merge>>).
> +
> +In other words, the distribution model of `git` implies _consensus_ on
> the
> +commit ordering of some mainline branch. Typically this is achieved by
> deferring
> +to a single authoritative human integrator (a maintainer), or a shared
> +repository granting write access to more than one maintainers.
> +
> +Both are not entirely satisfying for a system which aims to support
> +_decentralised_ collaboration: it is fairly common for open source
> projects to
> +extend maintainer authority to a group of people. If we need to resort
> to shared
> +git servers to enable a practical workflow for these, the case for
> +decentralisation becomes rather weak.
> +
> +Interestingly, the task of integrating concurrent changes into an
> authoritative
> +history can be automated, provided that checking the correctness of the
> result
> +can also be automated with high confidence. This has been popularised
> through
> +<<bors>>, created for use by the Rust programming language project. In
> order to
> +make this approach work in a decentralised setting (ie. without a
> central server
> +running the `bors` bot), we need to eliminate nondeterminism (to the
> extent
> +possible).
> +
> +== Specification
> +
> +=== Collaborative Object
> +
> +[source,rust]
> +----
> +/// Tag hash
> +type Revision = Oid;
> +
> +struct Patch {
> + // Head is the most recent revision
> + revisions: NonEmpty<Revision>,

Is this equivalent to the v1, v2, etc. that we have been using for the
email-flow?

> + target: Refname,
> + cover_letter: Option<Text>,
> + comments: Vec<Tree<Comment>>,
> + // Denotes who is expected to review. We may make this more granular
> by
> + // specifying who MUST review and who SHOULD review. Maybe for the
> + // enterprise edition.
> + reviewers: Vec<Person>,

Sounds like something I read on the Jane Street blog ;)

> + reviews: Vec<Review>,
> + tries: Vec<Try>,
> +}
> +
> +/// Possibly automated attempt to merge a `Revision` of the `Path` onto
> some
> +/// base commit and then verify the result using the build definition
> of the
> +/// project.
> +struct Try {
> + /// Revision that was attempted.
> + rev: Revision,
> + /// Resulting head commit (possibly a merge commit).
> + head: Oid,
> + /// The merge base used.
> + base: Oid,
> + result: Either<Conflict, BuildResult>,
> + /// May be a real person or a robot.
> + conducted_by: Urn,
> +}
> +
> +/// Merge failed.
> +struct Conflict;
> +
> +enum BuildResult {
> + Pass { output: Url },
> + Fail { output: Url }
> +}
> +
> +struct Person {
> + name: String,
> + email: String,
> + urn: Option<Urn>,
> +}
> +
> +struct Tree<T> {
> + root: T,
> + children: Vec<Tree<T>>,
> +}
> +
> +struct Review {
> + by: Person,
> + verdict: Vote,
> + at: DateTime<Utc>,
> +}
> +
> +enum Vote {
> + Veto,
> + NeedsWork,
> + Lgtm,
> +}
> +
> +struct Comment {
> + author: Urn,
> + content: Text,
> + rel: Rel,
> + time: DateTime<Utc>,
> +}
> +
> +struct Text {
> + content: String,
> + mime: ContentType,
> +}
> +
> +enum ContentType {
> + Plain,
> + Markdown,
> + AsciiDoc,
> +}
> +
> +struct Rel {
> + rev: Revision,
> + loc: Option<SourceLoc>,
> +}
> +
> +struct SourceLoc {
> + blob: Oid,
> + line: Option<Range<usize>>,
> +}
> +----
> +
> +=== Patch Revisions
> +
> +For the purpose of this document, a "patch" is conceptually similar to
> a "topic
> +branch": a series of one or more commits made on top of some commit
> within the
> +ancestry graph of the project's main branch.
> +
> +To submit a patch for consideration, a tag object ("annotated tag") is
> created
> +and anchored at a ref of the form:
> +
> + refs/patches/<cob id>/<sequence number>
> +
> +Where `<cob id>` is the object id of the corresponding `patch`
> collaborative
> +object (cob), and `<sequence number>` is a monotonic counter local to
> the
> +submitting peer id. Note that the tag is _not_ anchored under the
> `refs/tags`
> +hierarchy by default, because this is treated specially by the `git`
> UI.
> +Instead, it shall be possible to publish a patch revision as a regular
> tag _post
> +hoc_.
> +
> +The tag's name (the `tag` header) shall be of the form:
> +
> + patches/<cob id>/<peer id>/<sequence number>
> +
> +This structure should be preserved when publishing the tag to a
> non-`link` tree
> +(ie. `refs/tags/patches/<cob id>/<peer id>/<sequence number>`).
> +
> +The tag's message shall contain metadata which allow it to be
> correlated even if
> +only the tag object is known, for example:
> +
> + Radicle-Cob: <cob id>
> + Radicle-Author: <urn>
> + Radicle-Peer: <peer id>
> + Depends-On: <other cob id>
> + ...
> +
> +The tag _may_ be cryptographically signed, again with the intention to
> make it
> +pubishable externally. The signing key _must_ be subject to some
> external PKI,
> +and _must not_ be the `link` device key (nb. the tag is implicitly
> signed by
> +the device key via the `signed_refs`).

Will we be defining if we're going to be cryptographically signing in
the final RFC?

> +
> +=== UI
> +
> +A `patch` cob may be edited using whatever tooling is available, but it
> can not
> +exist without at least one patch revision. Thus, to create a `patch`
> cob, the
> +submitter must push the commit series to the `link` repo, which shall
> create the
> +corresponding cob implicitly.
> +
> +GitHub users are likely to prefer a familiar workflow, so for them we
> propose a
> +dedicated CLI tool which is parametrised by a local topic and a base
> branch (or
> +guesses them), and creates both the initial cob and corresponding
> revision tag.
> +It may prompt to open a text editor for the cover letter. Once
> committed, the
> +tool shall prompt the user to set the upstream for their local branch
> to
> +
> + refs/patches/<cob id>
> +
> +[NOTE]
> +Alternatively, we may be able to alias <cob id> with the local branch
> name.
> +
> +Subsequent pushes to this upstream ref shall create a new revision tag
> and
> +update the cob accordingly.
> +
> +[NOTE]
> +Users familiar with the Gerrit Code Review system <<gerrit>>, or more
> generally
> +"branchless workflows", will find this procedure rather cumbersome. It
> is the
> +hope of the author to provide a UI similar to Gerrit's magical
> `refs/for` ref,
> +such that almost all interactions can be carried out using `git push`
> (cf.
> +<<gerrit-push>>). Note, however, that this implies to establish a
> stable patch
> +id (`Change-Id`), which is somewhat invasive.

Wouldn't the `<cob id>` be stable? Or is the issue that it won't be
created until the first tag is created -- because the revision is
included in the metadata?

> +
> +=== Mechanical Maintainer
> +
> +The mechanical maintainer (mm) is envisioned as a `bors`-like program,
> which
> +applies patches to their target branch, and verifies the result by
> running some
> +build specification (either locally or by triggering a web service).
> Unlike
> +`bors` proper, `mm` must produce identical `git` histories on
> independent runs.
> +To achieve this, `mm` proceeds as follows:
> +
> +- collect `patch` cobs for the desired `target`
> +- filter out the ones whose most recent `revision` is reachable from
> the tip of
> + `target`
> +- evaluate the review votes, and drop patches which are not deemed to
> be
> + accepted
> +- sort the remaining revisions topologically
> +- drop patches which depend on not-yet-accepted ones
> +- sort the remaining revisions according to the timestamp of the most
> recent
> + approving vote
> +- apply the revisions in this order to the `target` by creating merge
> commits
> + with a fixed author / commiter (eg. "Mechanical Maintainer
> mm@radicle") and
> + the timestamp of the most recent approving vote. The revision tag must
> be
> + included in the commit as a merge-tag header.
> +- if a merge fails, update the corresponding cob, prune any revisions
> which may
> + depend on the failed one, and retry with the resulting list (if not
> empty)
> +- run the build, collect the result, and update the corresponding cobs
> +- if the build passes, publish the new `target`
> +- if the build fails, update the head cob, undo the merge, and retry
> with the
> + remaining history.
> +- optionally, send out side-channel notifications such as email, irc,
> slack
> +
> +=== Further Research
> +
> +- It might be more practical to explicitly "close" a `patch`, recording
> the oid
> + of the merge commit
> +- Is there a more compact format for archiving closed patches? We will
> + accumulate a lot of refs.

Could we represent a closed patch as another cob but it's payload simply
points to the original id? It's another ref but it's lightweight in the
payload.

> +- Depending on the identity the `mm` had assumed, the resulting
> `target` may not
> + be considered a maintainer branch. That is, explicit action is
> required to
> + reset to the `target`. This may or may not be desirable.
> +- We cannot assume the build to be deterministic, so perhaps the `mm`
> should
> + have two phases, one which tries to integrate, and another one which
> examines
> + the build results from multiple independent runs, and merges based on
> a
> + threshold policy.
> +- The reachability computations to order patches are not entirely
> trivial,
> + potentially some of the (dependency) information can be pre-computed
> and
> + stored in the cobs.
> +
> +
> +[bibliography]
> +== References
> +
> +* [[[undo-merge]]]
> https://kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.txt
> +* [[[bors]]] https://github.com/graydon/bors
> +* [[[gerrit]]] https://gerritcodereview.com
> +* [[[gerrit-push]]]
> https://gerrit-review.googlesource.com/Documentation/user-upload.html#_git_push
> --
> 2.33.1

Re: [PATCH] RFC: patches

Details
Message ID
<20211116132042.GD10220@schmidt.localdomain>
In-Reply-To
<CFQHF3QFO02B.391FPPHQL0BKU@haptop> (view parent)
DKIM signature
pass
Download raw message
On Mon, 15 Nov 2021 16:08:44 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> > +=== Collaborative Object
> > +
> > +[source,rust]
> > +----
> > +/// Tag hash
> > +type Revision = Oid;
> > +
> > +struct Patch {
> > + // Head is the most recent revision
> > + revisions: NonEmpty<Revision>,
>
> Is this equivalent to the v1, v2, etc. that we have been using for the
> email-flow?

Yep. Except that the CRDT needs to magically figure out the most recent
revision: if we don't want to restrict this to only a single submitter, the
sequence numbers need to be counted relative to the peer id. That is, if you
submit a patch v1, and I want to propose some amendment to it, then we end up
with:

    YOU/refs/patches/<cob id>/1
    ME/refs/patches/<cob id>/1

but as per the causal history of the CRDT, I'm imagining that ME is considered
more recent. Otherwise, we'd need to embed some kind of mini vector clock into
the sequence number.

> > + target: Refname,
> > + cover_letter: Option<Text>,
> > + comments: Vec<Tree<Comment>>,
> > + // Denotes who is expected to review. We may make this more granular
> > by
> > + // specifying who MUST review and who SHOULD review. Maybe for the
> > + // enterprise edition.
> > + reviewers: Vec<Person>,
>
> Sounds like something I read on the Jane Street blog ;)

Ya, I was more looking at Gerrit respectively mailing list flows: both have the
notion of a CC: to assign reviewers. Maybe some projects would also pick
reviewers randomly, and not be limited to the "maintainers". But this can
obviously be made infinitely complicated.

> > +The tag _may_ be cryptographically signed, again with the intention to
> > make it
> > +pubishable externally. The signing key _must_ be subject to some
> > external PKI,
> > +and _must not_ be the `link` device key (nb. the tag is implicitly
> > signed by
> > +the device key via the `signed_refs`).
>
> Will we be defining if we're going to be cryptographically signing in
> the final RFC?

Maybe it's actually fine to include device key signatures, because the link
identity scheme provides a PKI of sorts (and has the nice property of
"survivable key compromise").

What I would like to make difficult, however, is to re-use GPG or SSH keys as
link device keys, or vice versa. Because those might be anchored in some
external PKI, there is only an estimated .001 fraction of the population which
would be able to reason about a situation in which such a shared key gets
compromised.

> > +[NOTE]
> > +Users familiar with the Gerrit Code Review system <<gerrit>>, or more
> > generally
> > +"branchless workflows", will find this procedure rather cumbersome. It
> > is the
> > +hope of the author to provide a UI similar to Gerrit's magical
> > `refs/for` ref,
> > +such that almost all interactions can be carried out using `git push`
> > (cf.
> > +<<gerrit-push>>). Note, however, that this implies to establish a
> > stable patch
> > +id (`Change-Id`), which is somewhat invasive.
>
> Wouldn't the `<cob id>` be stable? Or is the issue that it won't be
> created until the first tag is created -- because the revision is
> included in the metadata?

Yes, but the id needs to be embedded in the commits you make in your working
copy. Gerrit's magic is that you only ever `git push HEAD:refs/for/master`, and
by looking at the Change-Id it can correlate commits with reviews.

That's very, very nice, but people may need to get over their aesthetic
preferences. Also, they need to install a `commit-msg` hook into their local
repo.

> > +- Is there a more compact format for archiving closed patches? We will
> > + accumulate a lot of refs.
>
> Could we represent a closed patch as another cob but it's payload simply
> points to the original id? It's another ref but it's lightweight in the
> payload.

I'm not sure how you mean that -- would that be something like
`cobs/patches/closed/ID`?

My thinking is that we may not want to keep around 16k refs for historical
purposes. If it would be the case that "closed" patches cannot be reopened later
(but maybe referenced), we could archive their trees into a single archive cob.

Re: [PATCH] RFC: patches

Details
Message ID
<CFR94JXZS297.25LTYL60NSGLI@haptop>
In-Reply-To
<20211116132042.GD10220@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Tue Nov 16, 2021 at 12:20 PM GMT, Kim Altintop wrote:
> On Mon, 15 Nov 2021 16:08:44 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > > +=== Collaborative Object
> > > +
> > > +[source,rust]
> > > +----
> > > +/// Tag hash
> > > +type Revision = Oid;
> > > +
> > > +struct Patch {
> > > + // Head is the most recent revision
> > > + revisions: NonEmpty<Revision>,
> >
> > Is this equivalent to the v1, v2, etc. that we have been using for the
> > email-flow?
>
> Yep. Except that the CRDT needs to magically figure out the most recent
> revision: if we don't want to restrict this to only a single submitter,
> the
> sequence numbers need to be counted relative to the peer id. That is, if
> you
> submit a patch v1, and I want to propose some amendment to it, then we
> end up
> with:
>
> YOU/refs/patches/<cob id>/1
> ME/refs/patches/<cob id>/1
>
> but as per the causal history of the CRDT, I'm imagining that ME is
> considered
> more recent. Otherwise, we'd need to embed some kind of mini vector
> clock into
> the sequence number.

Why would ME not increment the versioning to 2 in this case? At least
then it can be logically tracked. However, this doesn't preclude the
case where I might not have seen your re-roll and I increment to 2. We
end up back in the same boat. LWW is where my thoughts initially jump
to, but that would need a vector clock as you've also mentioned.

>
> > > + target: Refname,
> > > + cover_letter: Option<Text>,
> > > + comments: Vec<Tree<Comment>>,
> > > + // Denotes who is expected to review. We may make this more granular
> > > by
> > > + // specifying who MUST review and who SHOULD review. Maybe for the
> > > + // enterprise edition.
> > > + reviewers: Vec<Person>,
> >
> > Sounds like something I read on the Jane Street blog ;)
>
> Ya, I was more looking at Gerrit respectively mailing list flows: both
> have the
> notion of a CC: to assign reviewers. Maybe some projects would also pick
> reviewers randomly, and not be limited to the "maintainers". But this
> can
> obviously be made infinitely complicated.

Ah I see :) I like the idea to be honest. It's a shame GitHub groups all
reviewers together, when really I tagged other people just because I
thought they might be vaguely interested in the changes.

> > > +The tag _may_ be cryptographically signed, again with the intention to
> > > make it
> > > +pubishable externally. The signing key _must_ be subject to some
> > > external PKI,
> > > +and _must not_ be the `link` device key (nb. the tag is implicitly
> > > signed by
> > > +the device key via the `signed_refs`).
> >
> > Will we be defining if we're going to be cryptographically signing in
> > the final RFC?
>
> Maybe it's actually fine to include device key signatures, because the
> link
> identity scheme provides a PKI of sorts (and has the nice property of
> "survivable key compromise").
>
> What I would like to make difficult, however, is to re-use GPG or SSH
> keys as
> link device keys, or vice versa. Because those might be anchored in some
> external PKI, there is only an estimated .001 fraction of the population
> which
> would be able to reason about a situation in which such a shared key
> gets
> compromised.
>

"or vice versa" -- so we shouldn't use the link device key for the git
ssh-signing? Or am I misunderstanding that?

> > > +[NOTE]
> > > +Users familiar with the Gerrit Code Review system <<gerrit>>, or more
> > > generally
> > > +"branchless workflows", will find this procedure rather cumbersome. It
> > > is the
> > > +hope of the author to provide a UI similar to Gerrit's magical
> > > `refs/for` ref,
> > > +such that almost all interactions can be carried out using `git push`
> > > (cf.
> > > +<<gerrit-push>>). Note, however, that this implies to establish a
> > > stable patch
> > > +id (`Change-Id`), which is somewhat invasive.
> >
> > Wouldn't the `<cob id>` be stable? Or is the issue that it won't be
> > created until the first tag is created -- because the revision is
> > included in the metadata?
>
> Yes, but the id needs to be embedded in the commits you make in your
> working
> copy. Gerrit's magic is that you only ever `git push
> HEAD:refs/for/master`, and
> by looking at the Change-Id it can correlate commits with reviews.
>
> That's very, very nice, but people may need to get over their aesthetic
> preferences. Also, they need to install a `commit-msg` hook into their
> local
> repo.

Could we consider our own tooling here? Something like `git rad-patch`
or whatever. Or even make it an extension of the `rad` exe, so `rad
patch`.

> > > +- Is there a more compact format for archiving closed patches? We will
> > > + accumulate a lot of refs.
> >
> > Could we represent a closed patch as another cob but it's payload simply
> > points to the original id? It's another ref but it's lightweight in the
> > payload.
>
> I'm not sure how you mean that -- would that be something like
> `cobs/patches/closed/ID`?
>
> My thinking is that we may not want to keep around 16k refs for
> historical
> purposes. If it would be the case that "closed" patches cannot be
> reopened later
> (but maybe referenced), we could archive their trees into a single
> archive cob.

Hmmm ya, I like that archive idea! And is there a reason we would avoid
having a flag in the patch cob itself?

Re: [PATCH] RFC: patches

Details
Message ID
<20211116154533.GH10220@schmidt.localdomain>
In-Reply-To
<CFR94JXZS297.25LTYL60NSGLI@haptop> (view parent)
DKIM signature
pass
Download raw message
On Tue, 16 Nov 2021 13:51:28 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> On Tue Nov 16, 2021 at 12:20 PM GMT, Kim Altintop wrote:
> > On Mon, 15 Nov 2021 16:08:44 +0000 "Fintan Halpenny"
> > <fintan.halpenny@gmail.com> wrote:
> > > > +=== Collaborative Object
> > > > +
> > > > +[source,rust]
> > > > +----
> > > > +/// Tag hash
> > > > +type Revision = Oid;
> > > > +
> > > > +struct Patch {
> > > > + // Head is the most recent revision
> > > > + revisions: NonEmpty<Revision>,
> > >
> > > Is this equivalent to the v1, v2, etc. that we have been using for the
> > > email-flow?
> >
> > Yep. Except that the CRDT needs to magically figure out the most recent
> > revision: if we don't want to restrict this to only a single submitter,
> > the
> > sequence numbers need to be counted relative to the peer id. That is, if
> > you
> > submit a patch v1, and I want to propose some amendment to it, then we
> > end up
> > with:
> >
> > YOU/refs/patches/<cob id>/1
> > ME/refs/patches/<cob id>/1
> >
> > but as per the causal history of the CRDT, I'm imagining that ME is
> > considered
> > more recent. Otherwise, we'd need to embed some kind of mini vector
> > clock into
> > the sequence number.
>
> Why would ME not increment the versioning to 2 in this case? At least
> then it can be logically tracked. However, this doesn't preclude the
> case where I might not have seen your re-roll and I increment to 2. We
> end up back in the same boat. LWW is where my thoughts initially jump
> to, but that would need a vector clock as you've also mentioned.

Yeah right, there could be arbitrarily many concurrent rerolls of different
people. But since they would also update the cob, the CRDT would either pick
one, or require a human to say what the most recent one is. I'll leave that to
Alex to teach us how it works :)

> > > > +The tag _may_ be cryptographically signed, again with the intention to
> > > > make it
> > > > +pubishable externally. The signing key _must_ be subject to some
> > > > external PKI,
> > > > +and _must not_ be the `link` device key (nb. the tag is implicitly
> > > > signed by
> > > > +the device key via the `signed_refs`).
> > >
> > > Will we be defining if we're going to be cryptographically signing in
> > > the final RFC?
> >
> > Maybe it's actually fine to include device key signatures, because the
> > link
> > identity scheme provides a PKI of sorts (and has the nice property of
> > "survivable key compromise").
> >
> > What I would like to make difficult, however, is to re-use GPG or SSH
> > keys as
> > link device keys, or vice versa. Because those might be anchored in some
> > external PKI, there is only an estimated .001 fraction of the population
> > which
> > would be able to reason about a situation in which such a shared key
> > gets
> > compromised.
> >
>
> "or vice versa" -- so we shouldn't use the link device key for the git
> ssh-signing? Or am I misunderstanding that?

The Mythical Real Cryptographer might have reservations about key reuse in
general, that is using the device key for transport layer security as well as
identity signing as well as signing arbitrary commits.

Let's assume those three things are fine, because it is easy to rotate a device
key. What I would like to avoid though is that people stick their device key
into their GPG keyring and upload to keybase or ethereum or whatever. Or use the
SSH key using which they can gain root access to their company's kubernetes
cluster as a device key.

> > > > +[NOTE]
> > > > +Users familiar with the Gerrit Code Review system <<gerrit>>, or more
> > > > generally
> > > > +"branchless workflows", will find this procedure rather cumbersome. It
> > > > is the
> > > > +hope of the author to provide a UI similar to Gerrit's magical
> > > > `refs/for` ref,
> > > > +such that almost all interactions can be carried out using `git push`
> > > > (cf.
> > > > +<<gerrit-push>>). Note, however, that this implies to establish a
> > > > stable patch
> > > > +id (`Change-Id`), which is somewhat invasive.
> > >
> > > Wouldn't the `<cob id>` be stable? Or is the issue that it won't be
> > > created until the first tag is created -- because the revision is
> > > included in the metadata?
> >
> > Yes, but the id needs to be embedded in the commits you make in your
> > working
> > copy. Gerrit's magic is that you only ever `git push
> > HEAD:refs/for/master`, and
> > by looking at the Change-Id it can correlate commits with reviews.
> >
> > That's very, very nice, but people may need to get over their aesthetic
> > preferences. Also, they need to install a `commit-msg` hook into their
> > local
> > repo.
>
> Could we consider our own tooling here? Something like `git rad-patch`
> or whatever. Or even make it an extension of the `rad` exe, so `rad
> patch`.

Sure yes. I just expect to violate peoples aesthetic sensibility wrt how a git
commit history must look like. What's that Change-Id crap? Oh what merge
commits?! We use trunks based devlopers over here!

> > > > +- Is there a more compact format for archiving closed patches? We will
> > > > + accumulate a lot of refs.
> > >
> > > Could we represent a closed patch as another cob but it's payload simply
> > > points to the original id? It's another ref but it's lightweight in the
> > > payload.
> >
> > I'm not sure how you mean that -- would that be something like
> > `cobs/patches/closed/ID`?
> >
> > My thinking is that we may not want to keep around 16k refs for
> > historical
> > purposes. If it would be the case that "closed" patches cannot be
> > reopened later
> > (but maybe referenced), we could archive their trees into a single
> > archive cob.
>
> Hmmm ya, I like that archive idea! And is there a reason we would avoid
> having a flag in the patch cob itself?

To not have the refs grow unboundedly. It's probably different with "Issues",
because one might want to reopen them.

Re: [PATCH] RFC: patches

Details
Message ID
<CFSRON026I4B.3CPM2FYXTN72N@haptop>
In-Reply-To
<20211116154533.GH10220@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Tue Nov 16, 2021 at 2:45 PM GMT, Kim Altintop wrote:
> On Tue, 16 Nov 2021 13:51:28 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > On Tue Nov 16, 2021 at 12:20 PM GMT, Kim Altintop wrote:
> > > On Mon, 15 Nov 2021 16:08:44 +0000 "Fintan Halpenny"
> > > <fintan.halpenny@gmail.com> wrote:
> > > > > +The tag _may_ be cryptographically signed, again with the intention to
> > > > > make it
> > > > > +pubishable externally. The signing key _must_ be subject to some
> > > > > external PKI,
> > > > > +and _must not_ be the `link` device key (nb. the tag is implicitly
> > > > > signed by
> > > > > +the device key via the `signed_refs`).
> > > >
> > > > Will we be defining if we're going to be cryptographically signing in
> > > > the final RFC?
> > >
> > > Maybe it's actually fine to include device key signatures, because the
> > > link
> > > identity scheme provides a PKI of sorts (and has the nice property of
> > > "survivable key compromise").
> > >
> > > What I would like to make difficult, however, is to re-use GPG or SSH
> > > keys as
> > > link device keys, or vice versa. Because those might be anchored in some
> > > external PKI, there is only an estimated .001 fraction of the population
> > > which
> > > would be able to reason about a situation in which such a shared key
> > > gets
> > > compromised.
> > >
> >
> > "or vice versa" -- so we shouldn't use the link device key for the git
> > ssh-signing? Or am I misunderstanding that?
>
> The Mythical Real Cryptographer might have reservations about key reuse
> in
> general, that is using the device key for transport layer security as
> well as
> identity signing as well as signing arbitrary commits.
>
> Let's assume those three things are fine, because it is easy to rotate a
> device
> key. What I would like to avoid though is that people stick their device
> key
> into their GPG keyring and upload to keybase or ethereum or whatever. Or
> use the
> SSH key using which they can gain root access to their company's
> kubernetes
> cluster as a device key.

Right, I think what the application teams are doing is associating other
keys inside the Person. I don't think there's been any thought about
exporting device keys outside of the Radicle domain.

> > > > > +[NOTE]
> > > > > +Users familiar with the Gerrit Code Review system <<gerrit>>, or more
> > > > > generally
> > > > > +"branchless workflows", will find this procedure rather cumbersome. It
> > > > > is the
> > > > > +hope of the author to provide a UI similar to Gerrit's magical
> > > > > `refs/for` ref,
> > > > > +such that almost all interactions can be carried out using `git push`
> > > > > (cf.
> > > > > +<<gerrit-push>>). Note, however, that this implies to establish a
> > > > > stable patch
> > > > > +id (`Change-Id`), which is somewhat invasive.
> > > >
> > > > Wouldn't the `<cob id>` be stable? Or is the issue that it won't be
> > > > created until the first tag is created -- because the revision is
> > > > included in the metadata?
> > >
> > > Yes, but the id needs to be embedded in the commits you make in your
> > > working
> > > copy. Gerrit's magic is that you only ever `git push
> > > HEAD:refs/for/master`, and
> > > by looking at the Change-Id it can correlate commits with reviews.
> > >
> > > That's very, very nice, but people may need to get over their aesthetic
> > > preferences. Also, they need to install a `commit-msg` hook into their
> > > local
> > > repo.
> >
> > Could we consider our own tooling here? Something like `git rad-patch`
> > or whatever. Or even make it an extension of the `rad` exe, so `rad
> > patch`.
>
> Sure yes. I just expect to violate peoples aesthetic sensibility wrt how
> a git
> commit history must look like. What's that Change-Id crap? Oh what merge
> commits?! We use trunks based devlopers over here!

We just need to teach them to set fire to themselves.

> > > > > +- Is there a more compact format for archiving closed patches? We will
> > > > > + accumulate a lot of refs.
> > > >
> > > > Could we represent a closed patch as another cob but it's payload simply
> > > > points to the original id? It's another ref but it's lightweight in the
> > > > payload.
> > >
> > > I'm not sure how you mean that -- would that be something like
> > > `cobs/patches/closed/ID`?
> > >
> > > My thinking is that we may not want to keep around 16k refs for
> > > historical
> > > purposes. If it would be the case that "closed" patches cannot be
> > > reopened later
> > > (but maybe referenced), we could archive their trees into a single
> > > archive cob.
> >
> > Hmmm ya, I like that archive idea! And is there a reason we would avoid
> > having a flag in the patch cob itself?
>
> To not have the refs grow unboundedly. It's probably different with
> "Issues",
> because one might want to reopen them.

Couldn't the flag prohibit (modulo concurrenct edits) any additions to
the patch if it's marked as closed? Basically a max bound gate.

Re: [PATCH] RFC: patches

Details
Message ID
<20211118100348.GD30665@schmidt.localdomain>
In-Reply-To
<CFSRON026I4B.3CPM2FYXTN72N@haptop> (view parent)
DKIM signature
pass
Download raw message
On Thu, 18 Nov 2021 08:36:42 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> > To not have the refs grow unboundedly. It's probably different with
> > "Issues",
> > because one might want to reopen them.
>
> Couldn't the flag prohibit (modulo concurrenct edits) any additions to
> the patch if it's marked as closed? Basically a max bound gate.

Sure yes, but 1000 patches leave 1000 cob refs * tracked peers * reroll tags.

Re: [PATCH] RFC: patches

Details
Message ID
<CFSSB5LO5GQK.3SCYJ1LWO7LJY@haptop>
In-Reply-To
<20211118100348.GD30665@schmidt.localdomain> (view parent)
DKIM signature
pass
Download raw message
On Thu Nov 18, 2021 at 9:03 AM GMT, Kim Altintop wrote:
> On Thu, 18 Nov 2021 08:36:42 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > > To not have the refs grow unboundedly. It's probably different with
> > > "Issues",
> > > because one might want to reopen them.
> >
> > Couldn't the flag prohibit (modulo concurrenct edits) any additions to
> > the patch if it's marked as closed? Basically a max bound gate.
>
> Sure yes, but 1000 patches leave 1000 cob refs * tracked peers * reroll
> tags.

I think I've gotten lost on the way here. Is that not an issue without
even considering "closing" patches? Or are you getting at the fact that
we'd like GC closed patches and archive them away to get back storage
space?

Re: [PATCH] RFC: patches

Details
Message ID
<20211118101500.GH30665@schmidt.localdomain>
In-Reply-To
<CFSSB5LO5GQK.3SCYJ1LWO7LJY@haptop> (view parent)
DKIM signature
pass
Download raw message
On Thu, 18 Nov 2021 09:06:06 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> Or are you getting at the fact that we'd like GC closed patches and archive
> them away to get back storage space?

Yes exactly
Reply to thread Export thread (mbox)