~radicle-link/dev

Core protocol scope

From: Alexis Sellier <alexis@radicle.foundation>
Message-ID: <rTbVAfR8mHmzNYdSHl4r2Hl9hh5EgCv_9SuHoSiMS4HhHJiWSB1v0noahMMhjTKUuvIO82sFrWZLVnXDjJhSAWMLC7-LpJFqjc3fQEnYje0=@radicle.foundation>
Hey there,

As discussed on the phone earlier today, I think it would be a good idea to have a little discussion around protocol scope, to figure out which features should be expected to be part of the protocol vs. built by application developers. This would encourage developers to contribute to the protocol in the former case rather than duplicate work.

Through developing the org node, we've made several additions to the core functionality. Some of these are identical to what we had in the old seed node, some are new. For each feature I'll explain whether I think it would make sense to have the functionality present in the protocol proper. Let's lay it out here:

# Org node functionality currently implemented

* Keeping an up to date local HEAD and /heads/master for each project

This is what allows `git clone` to work correctly, as otherwise cloning results in a "detached HEAD" state. I don't see a huge benefit in having this functionality available in the protocol.
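
For reference, what the org node does boils down to pointing HEAD at a branch, roughly like this (a minimal sketch -- the `--git-dir` path and the target ref are placeholders, not the exact namespaced layout we use):

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Point HEAD of the (bare) storage at a concrete branch so that
    // `git clone` checks something out instead of ending up detached.
    // Both the --git-dir path and the target ref are placeholders.
    let status = Command::new("git")
        .args([
            "--git-dir",
            "/path/to/monorepo",
            "symbolic-ref",
            "HEAD",
            "refs/heads/master",
        ])
        .status()?;
    assert!(status.success(), "git symbolic-ref failed");
    Ok(())
}
```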

* Auto-tracking peers

This allows one to specify a list of trusted peers which are always tracked. We do this by listening for "Uninteresting" Put messages. I think the current way we do it is not too bad, though I can also imagine this being part of the core protocol functionality. I lean towards *not* having this in the core protocol, provided that the current method of listening for "uninteresting" gossip messages is actually meant to be used this way.
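
In pseudo-Rust, the shape of what we do is roughly this (the event types are made up for illustration; they are not librad's actual gossip API):

```rust
use std::collections::HashSet;

// Stand-ins for librad's gossip event types -- purely illustrative, the
// real API looks different.
#[allow(dead_code)]
enum PutResult {
    Applied,
    Uninteresting,
}

struct Put {
    urn: String,
    peer: String,
    result: PutResult,
}

/// Auto-track gossip coming from peers we trust, even when the local node
/// considered the announcement "uninteresting" (i.e. the URN wasn't tracked yet).
fn on_gossip(event: Put, trusted: &HashSet<String>, track: &mut impl FnMut(&str, &str)) {
    if let PutResult::Uninteresting = event.result {
        if trusted.contains(&event.peer) {
            track(&event.urn, &event.peer);
        }
    }
}

fn main() {
    let trusted = HashSet::from(["trusted-peer-id".to_string()]);
    let mut track = |urn: &str, peer: &str| println!("now tracking {} via {}", urn, peer);
    on_gossip(
        Put {
            urn: "rad:git:example".into(),
            peer: "trusted-peer-id".into(),
            result: PutResult::Uninteresting,
        },
        &trusted,
        &mut track,
    );
}
```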

* Fetch retrying

When a project is tracked, we attempt to fetch it. If this fails, we keep trying to fetch it every minute or so. This retry behavior would, I feel, benefit from being part of the protocol, if it isn't already. Basically, it would be nice if, once a project+peer is tracked, radicle-link kept trying to fetch it until it succeeds. Application developers would then only need to create a tracking relationship and that's it.
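
The loop itself is trivial; schematically (a generic sketch, not the actual org-node code):

```rust
use std::{thread, time::Duration};

/// Keep retrying a fallible fetch until it succeeds, sleeping in between.
/// `try_fetch` stands in for whatever performs the actual fetch via librad.
fn fetch_until_success<E>(mut try_fetch: impl FnMut() -> Result<(), E>, interval: Duration) {
    while try_fetch().is_err() {
        thread::sleep(interval);
    }
}

fn main() {
    let mut attempts = 0;
    // Short interval for the example; the org node waits on the order of a minute.
    fetch_until_success(
        || {
            attempts += 1;
            if attempts < 3 { Err("peer unreachable") } else { Ok(()) }
        },
        Duration::from_millis(100),
    );
    println!("fetched after {} attempts", attempts);
}
```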

* Finding a peer for a project

The first time we attempt to track a project, a suitable peer needs to be found. We do this by calling `providers()` and iterating over the peers until one of them succeeds. This is necessary because you can't just "track a project": you need at least one peer id to establish a tracking relationship (right?). This is ok, though not ideal. Ideally, we could keep some state in storage that says "intent to track project X", which is really the intent to track the maintainer of that project. I'm not sure if this is worth adding to the protocol though, since the protocol doesn't really track projects per se.
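
For illustration, what we do today amounts to the following (a sketch, with placeholder types standing in for whatever `providers()` actually returns):

```rust
/// Try candidate peers in turn until a fetch from one of them succeeds,
/// returning that peer. `providers` and `fetch_from` stand in for the real
/// librad calls.
fn first_working_provider<P, E>(
    providers: impl IntoIterator<Item = P>,
    mut fetch_from: impl FnMut(&P) -> Result<(), E>,
) -> Option<P> {
    providers.into_iter().find(|peer| fetch_from(peer).is_ok())
}

fn main() {
    let peers = vec!["peer-a", "peer-b", "peer-c"];
    let found = first_working_provider(peers, |peer: &&str| {
        // Pretend only peer-b actually has the project.
        if *peer == "peer-b" { Ok(()) } else { Err("no such urn") }
    });
    println!("tracking via {:?}", found);
}
```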

--

# Features NOT part of the org-node which we are thinking of adding

* Periodic announcement

If nodes are not online at the same time, it's easy for a node to miss an announcement of a new ref. An easy, though noisy, way to mitigate this is to periodically announce your refs, so as to trigger a fetch from any interested peer. The ability to stay in sync despite missing the first gossip announcement should really be part of the protocol, whether by announcements or by some other mechanism, e.g. pulling periodically, or having some kind of DHT to check the "latest state".

* Storage ref diff

If new refs are pushed while a node is down, it's important that when the node comes back up, it announces these new refs so that peers can stay in sync. Upstream does this by doing a ref diff on startup. This is an acceptable solution, but it would benefit from being handled in the protocol. I'm not saying the protocol should do a diff on startup, but rather that it should gracefully handle the case of refs being changed while it's offline.
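
In essence the diff is just a comparison between a persisted snapshot and what's currently in storage, something like this (simplified sketch using plain maps instead of the real storage API):

```rust
use std::collections::HashMap;

/// Compare a persisted snapshot of refs (name -> object id) against the
/// current state of storage and return the refs that changed or appeared,
/// i.e. the ones worth announcing after coming back online.
fn changed_refs(
    previous: &HashMap<String, String>,
    current: &HashMap<String, String>,
) -> Vec<(String, String)> {
    current
        .iter()
        .filter(|(name, oid)| previous.get(*name) != Some(*oid))
        .map(|(name, oid)| (name.clone(), oid.clone()))
        .collect()
}

fn main() {
    let previous = HashMap::from([("refs/heads/master".to_string(), "aaa111".to_string())]);
    let current = HashMap::from([
        ("refs/heads/master".to_string(), "bbb222".to_string()),
        ("refs/heads/next".to_string(), "ccc333".to_string()),
    ]);
    for (name, oid) in changed_refs(&previous, &current) {
        println!("announce {} -> {}", name, oid);
    }
}
```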

* Local state verification

When things go wrong, it's sometimes hard to know whether it's because your local state is corrupt/incomplete, or because something is wrong with the peer you are talking to. One thing that would be useful for applications is to check the storage on startup and verify that everything is in order: e.g. that refs are signed, and that the state is internally consistent (the signed refs match the refs pointed to by the branches, etc.). This is something that can be implemented by app developers, but again I think it would benefit from being part of the protocol, since it depends on low-level implementation details.

--

That's about it for now; at least these are the important things in the short term. I think some of the above belongs in the protocol, but not all of it. Perhaps there are also ways to make the above a non-issue, since in the end a lot of this revolves around *being in sync and keeping your peers in sync*, which admits a lot of different approaches.

I'm curious to hear what you all think of this, and what could potentially belong in the protocol. I can imagine us allocating some time towards solving these problems directly in the protocol if it aligns with the protocol roadmap.

Thanks

From: Kim Altintop
Message-ID: <20211111093451.GB7438@schmidt.localdomain>
In-Reply-To: <rTbVAfR8mHmzNYdSHl4r2Hl9hh5EgCv_9SuHoSiMS4HhHJiWSB1v0noahMMhjTKUuvIO82sFrWZLVnXDjJhSAWMLC7-LpJFqjc3fQEnYje0=@radicle.foundation>
Thanks Alexis.

I'll reorder your headlines as they apply to pending protocol features.

> * Keeping an up to date local HEAD and /heads/master for each project

I believe some tooling Fintan wrote would actually set HEAD. As it is completely
arbitrary what HEAD points to -- and technically not even delegate trees need to
agree -- I think it is preferable to let clients mutate it however they like. I
would even say that `default_branch` could be removed from the identity payload:
if you want to reach cryptographically verifiable consensus on what the default
branch is, then well, use a consensus system.

> * Local state verification

I have such a thing in the repl3 WIP topic[0], although it is somewhat
tentative. It also just accumulates suspicious things to the left of a
`Result`, so it is up to the caller to decide what the severity is.
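
To give a flavour of the shape (illustrative only, not the actual repl3 code):

```rust
/// Illustrative stand-ins -- not the actual `link-replication` types.
#[derive(Debug)]
#[allow(dead_code)]
enum Suspicion {
    UnsignedRef { name: String },
    SigrefsMismatch { name: String },
}

#[derive(Debug)]
struct Verified; // marker for "storage passed the hard checks"

/// Run the hard invariants, accumulating merely-suspicious findings on the
/// side. The caller decides whether any `Suspicion` is fatal for its use case.
fn verify_storage() -> (Vec<Suspicion>, Result<Verified, String>) {
    let mut suspicions = Vec::new();
    // ... inspect refs, signatures, sigrefs vs. branch tips ...
    suspicions.push(Suspicion::UnsignedRef { name: "refs/heads/scratch".into() });
    (suspicions, Ok(Verified))
}

fn main() {
    let (suspicions, result) = verify_storage();
    for s in &suspicions {
        eprintln!("warning: {:?}", s);
    }
    match result {
        Ok(Verified) => println!("storage is internally consistent"),
        Err(e) => eprintln!("storage is broken: {}", e),
    }
}
```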

> * Finding a peer for a project
> * Auto-tracking peers

The idea of tracking a project without knowing its delegations is being worked
on, and likely to land soon [1], [2]. With that, "passive" replication should
just work without any event listening: the URN is marked as interesting, so
ingress gossip pertaining to it would trigger fetches.

Actively finding a "provider" peer on the network is not yet being worked on. It
was, however, already handwaved in the original spec draft. I wouldn't expect a
better design to emerge, but some details will need to be fleshed out. The idea
is two-fold:

1. Peers could designate stable network addresses as being known mirrors of
   their tree. These could be link nodes or regular git hosts.

   The open questions here are:

   - where to put this data?

     It clearly doesn't need to be in the identity document (because it can have
     peer scope, so doesn't require "consensus").

   - what do we expect of the structure of the mirror repo?

     In theory, the entire namespace could be mirrored to somewhere else, but it
     may also be that some platforms impose restrictions on the refs namespaces
     they allow. On the other hand, any repo which doesn't comply to the link
     layout could still be useful for populating the object database.

2. Instead of flooding the network with want messages, peers should maintain a
   "provider cache" even for URNs they are not directly interested in.

   IPFS does a similar thing, which amounts to proactively populating the
   Kademlia k-buckets (which in our case would be based on gossip). I think the
   datastructure is basically the right one, so if you want to help out, a nice
   k-buckets implementation would be useful.

   The open question is how to go about the liveness checking -- Kademlia
   assumes a PING is rather cheap, but that doesn't work when you need to
   perform a crypto handshake before. It might be that we can afford to just
   keep a lot of QUIC connections open (for which we send keepalive frames
   anyway), but someone should go and collect some data on this.
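
To make the k-buckets point concrete, the shape would be roughly the following (a toy sketch with tiny ids and no liveness handling, not tied to any link types):

```rust
/// Peers are bucketed by the bit length of the XOR distance to our own id,
/// each bucket holds at most K entries, and -- as in Kademlia -- long-lived
/// entries are preferred over newcomers when a bucket is full.
const K: usize = 20;
const ID_BITS: usize = 64; // toy id size; real peer ids are much larger

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct PeerId(u64);

struct KBuckets {
    local: PeerId,
    buckets: Vec<Vec<PeerId>>, // index = bucket, front = oldest entry
}

impl KBuckets {
    fn new(local: PeerId) -> Self {
        Self { local, buckets: vec![Vec::new(); ID_BITS] }
    }

    fn bucket_index(&self, peer: PeerId) -> usize {
        let distance = self.local.0 ^ peer.0;
        // More leading zeros == closer; bucket 0 holds the closest peers.
        (ID_BITS - 1).saturating_sub(distance.leading_zeros() as usize)
    }

    /// Insert a freshly seen peer. If its bucket is full, the newcomer is
    /// dropped -- a real implementation would PING the oldest entry first.
    fn insert(&mut self, peer: PeerId) {
        if peer == self.local {
            return;
        }
        let idx = self.bucket_index(peer);
        let bucket = &mut self.buckets[idx];
        if let Some(pos) = bucket.iter().position(|p| *p == peer) {
            let existing = bucket.remove(pos);
            bucket.push(existing); // refresh: move to the back (most recent)
        } else if bucket.len() < K {
            bucket.push(peer);
        }
    }
}

fn main() {
    let mut table = KBuckets::new(PeerId(0b1010));
    table.insert(PeerId(0b1011));
    table.insert(PeerId(0b0001));
    for (i, b) in table.buckets.iter().enumerate().filter(|(_, b)| !b.is_empty()) {
        println!("bucket {}: {:?}", i, b);
    }
}
```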

> * Fetch retrying
> * Periodic announcement
> * Storage ref diff

These are awful workarounds, and should really be removed entirely.

Instead, what should happen is what I refer to as "grafting" (because it is
conceptually similar to GRAFT from the epidemic broadcast trees paper): whenever
a peer suspects that it has missed gossip, it should eagerly attempt to "sync".

As it turns out, there are some delicate concurrency considerations, so the
current proposal is to simply initiate it explicitly by a. fetching, and b.
sending a "request pull" RPC[3]. When it is not known whether the remote end
actually has the URN we're trying to sync, there is already an API[4] to issue
an RPC[5] which returns a bloomfilter-like datastructure of the URNs the other
end has. Unless we're trying to sync a very large number of URNs, this can be
used to efficiently trim down the number of RTTs.
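
Schematically, the element-wise probing looks like this (a plain fingerprint set stands in for the real filter, and the hasher is not stable across processes -- illustration only):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the "bloomfilter-like" structure: a set of
/// 16-bit fingerprints of the URNs the remote end has. False positives are
/// possible (two URNs may collide), false negatives are not -- which is all
/// we need in order to skip pointless round trips. A real implementation
/// would use a stable, keyed hash rather than `DefaultHasher`.
fn fingerprint(urn: &str) -> u16 {
    let mut h = DefaultHasher::new();
    urn.hash(&mut h);
    h.finish() as u16
}

/// Probe the remote's filter element-wise, keeping only the URNs the other
/// end (probably) has.
fn worth_requesting<'a>(want: &'a [String], remote: &HashSet<u16>) -> Vec<&'a String> {
    want.iter()
        .filter(|urn| remote.contains(&fingerprint(urn.as_str())))
        .collect()
}

fn main() {
    let remote: HashSet<u16> = ["rad:git:one", "rad:git:two"]
        .into_iter()
        .map(fingerprint)
        .collect();
    let want = vec!["rad:git:two".to_string(), "rad:git:three".to_string()];
    for urn in worth_requesting(&want, &remote) {
        println!("request-pull {}", urn);
    }
}
```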

This stuff is not particularly difficult to implement and wire up (modulo the
awkwardness of message passing due to async), so patches welcome.

Of course, the remaining question is when do we sync, and with whom? There are
different occasions, some of which are easier to implement than others:

- during startup, with the bootstrap nodes
- during startup, but additionally with the top-k relevant providers from the
  provider cache (which have been persisted to disk)
- when gossip is received for a "peerlessly tracked" URN, with the sender
- during a period of time, with every newly connected peer


HTH


[0]: https://git.sr.ht/~kim/radicle-link/tree/patches/repl3/wip2/item/link-replication/src/validation.rs
[1]: https://lists.sr.ht/~radicle-link/dev/%3Cc2aae5c561e6c1d3b75394de8d570209%40xla.is%3E
[2]: https://lists.sr.ht/~radicle-link/dev/%3C20211028121845.13983-1-fintan.halpenny%40gmail.com%3E
[3]: https://lists.sr.ht/~radicle-link/dev/%3C20210930112525.110401-1-kim%40eagain.st%3E#%3C20211004121701.GB13332@schmidt.localdomain%3E
[4]: https://github.com/radicle-dev/radicle-link/blob/master/librad/src/net/peer.rs#L233
[5]: https://github.com/radicle-dev/radicle-link/blob/master/librad/src/net/protocol/tincans.rs#L191

From: Fintan Halpenny <fintan.halpenny@gmail.com>
Message-ID: <CFMU9IPEAKZC.3LOQGA78JUR1R@haptop>
In-Reply-To: <20211111093451.GB7438@schmidt.localdomain>
On Thu Nov 11, 2021 at 8:34 AM GMT, Kim Altintop wrote:
> Thanks Alexis.
>
> I'll reorder your headlines as they apply to pending protocol features.
>
> > * Keeping an up to date local HEAD and /heads/master for each project
>
> I believe some tooling Fintan wrote would actually set HEAD. As it is
> completely arbitrary what HEAD points to -- and technically not even
> delegate trees need to agree -- I think it is preferable to let clients
> mutate it however they like. I would even say that `default_branch` could
> be removed from the identity payload: if you want to reach
> cryptographically verifiable consensus on what the default branch is,
> then well, use a consensus system.

I believe that was only for the working copy setup. Are you keeping
these refs up to date in the monorepo, Alexis?

> > * Finding a peer for a project
> > * Auto-tracking peers
>
> The idea of tracking a project without knowing its delegations is being
> worked on, and likely to land soon [1], [2]. With that, "passive"
> replication should just work without any event listening: the URN is
> marked as interesting, so ingress gossip pertaining to it would trigger
> fetches.

I'm curious to know if you were aware of this RFC, Alexis? And if not,
could we do something better to promote these somewhere to help with
awareness?

From: Thomas Scholtes <thomas@axiom.fm>
Message-ID: <024d07eb870f7be03eaaf7b118699d6b49285108.camel@axiom.fm>
In-Reply-To: <20211111093451.GB7438@schmidt.localdomain>
Alexis’s points match what we are dealing with on Upstream’s side. Kim’s outline
would seem to address the issues from our perspective. In particular, I’m happy
to see that tracking will make it easier to initially find a project and
eliminate a large chunk of the workarounds.

The biggest priority for us is grafting to make replication more reliable. 

On Thu, 2021-11-11 at 09:34 +0100, Kim Altintop wrote:
> Of course, the remaining question is when do we sync, and with whom? There are
> different occasions, some of which are easier to implement than others:
> 
> - during startup, with the bootstrap nodes
> - during startup, but additionally with the top-k relevant providers from the
>   provider cache (which have been persisted to disk)
> - when gossip is received for a "peerlessly tracked" URN, with the sender
> - during a period of time, with every newly connected peer
> 

I’m experimenting with these and additional approaches on top of librad. We’ll
hopefully discover what improves the reliability the most and this can inform
grafting.

I will also give the URN interrogation API a spin and see if we can successfully
integrate it.

From: Alexis Sellier <alexis@radicle.foundation>
Message-ID: <kmwx-VGdJCUB1LNrIZxTrDLPu8gMqJtwld2MZGwRH-Bizp_QfrUCiFQZ9zm-7E4ArjADZMLCuXeVuZJV_sDgmNjP3du0Ze-luH0-BQ2XoPg=@radicle.foundation>
In-Reply-To: <20211111093451.GB7438@schmidt.localdomain>
Thanks for clarifying, Kim!

> use a consensus system.

Yeah, actually we've been thinking about this, and it's fairly easy to do if you just look
at what has been pushed to the org node.

> > * Local state verification
>
> I have such a thing in the repl3 WIP topic[0], although it is somewhat
> tentative.

That's great!

> > * Finding a peer for a project
> > * Auto-tracking peers
>
> The idea of tracking a project without knowing its delegations is being
> worked on, and likely to land soon [1], [2]. With that, "passive"
> replication should just work without any event listening: the URN is
> marked as interesting, so ingress gossip pertaining to it would trigger
> fetches.

Super. I'm somewhat surprised that this is being worked on already, as I thought it
might not pair well with the current tracking design, but good to hear!

> Actively finding a "provider" peer on the network is not yet being worked
> on. It was, however, already handwaved in the original spec draft. I
> wouldn't expect a better design to emerge, but some details will need to
> be fleshed out. The idea is two-fold:
> ...

This is interesting, and has quite a bit of surface area. I do think some kind
of DHT would help here, but I'm not sure how important this feature is going
to be, given the other "discovery" mechanism at the application layer that we
are building. Let's see.

> > * Fetch retrying
> > * Periodic announcement
> > * Storage ref diff
>
> These are awful workarounds, and should really be removed entirely.
>
> Instead, what should happen is what I refer to as "grafting" (because it
> is conceptually similar to GRAFT from the epidemic broadcast trees paper):
> whenever a peer suspects that it has missed gossip, it should eagerly
> attempt to "sync".
>
> As it turns out, there are some delicate concurrency considerations, so
> the current proposal is to simply initiate it explicitly by a. fetching,
> and b. sending a "request pull" RPC[3]. When it is not known whether the
> remote end actually has the URN we're trying to sync, there is already an
> API[4] to issue an RPC[5] which returns a bloomfilter-like datastructure
> of the URNs the other end has. Unless we're trying to sync a very large
> number of URNs, this can be used to efficiently trim down the number of
> RTTs.

Interesting. It seems like this is the responsibility of the application, though?

Side question: does the bloom filter protect the privacy of the sender's inventory?
That seems like a nice benefit of using a bloom filter, besides the efficiency.

> This stuff is not particularly difficult to implement and wire up (modulo
> the awkwardness of message passing due to async), so patches welcome.

Ah, so by "wire up" you mean it would actually be part of the protocol? The reason
this is important, I think, is that whether a peer "may have missed" gossip seems
hard for the application to know.

> Of course, the remaining question is when do we sync, and with whom? There
> are different occasions, some of which are easier to implement than others:
>
> * during startup, with the bootstrap nodes

Yeah, startup with bootstrap nodes seems like a no-brainer.

> * during startup, but additionally with the top-k relevant providers from
>   the provider cache (which have been persisted to disk)
> * when gossip is received for a "peerlessly tracked" URN, with the sender
> * during a period of time, with every newly connected peer

Syncing with newly connected peers also makes sense to me.

I realize I have also forgotten to mention something which I don't see covered here, at least
not in a satisfying way: when the local user issues a file-system push via `git push rad`, the
node may or may not be running. In the case where it isn't running, it seems like the
announcement could happen when the node connects to other peers, via this grafting mechanism.
But in the case where the node is online, I'd expect a mechanism to "detect" the new ref and
announce only that ref to all connected peers. Is there a plan for how this might work?

Thanks

From: Alexis Sellier <alexis@radicle.foundation>
Message-ID: <Tst4BsOAGft11_YVEe3tkilP1L-Rt3-E-pH9EZxU1Wqu0L76QG4bVQ5ESjuiNKcyf3uqgg7j5GdqbMD7lD9bqmgDw5jY-7PDtE7Q5Sbb4aM=@radicle.foundation>
In-Reply-To: <CFMU9IPEAKZC.3LOQGA78JUR1R@haptop>
I wasn't aware of this RFC! I've noticed you have an `announce` list now, which I've
subscribed to. I think mentioning new RFCs in those emails might do the trick, as well
as announcing when they get merged -- same as the Rust weekly newsletter[0] does.

[0]: https://this-week-in-rust.org/blog/2021/11/10/this-week-in-rust-416/

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, November 11th, 2021 at 10:22, Fintan Halpenny <fintan.halpenny@gmail.com> wrote:

> On Thu Nov 11, 2021 at 8:34 AM GMT, Kim Altintop wrote:
>
> > Thanks Alexis.
> >
> > I'll reorder your headlines as they apply to pending protocol features.
> >
> > > * Keeping an up to date local HEAD and /heads/master for each project
> >
> > I believe some tooling Fintan wrote would actually set HEAD. As it is
> > completely arbitrary what HEAD points to -- and technically not even
> > delegate trees need to agree -- I think it is preferable to let clients
> > mutate it however they like. I would even say that `default_branch`
> > could be removed from the identity payload: if you want to reach
> > cryptographically verifiable consensus on what the default branch is,
> > then well, use a consensus system.
>
> I believe that was only for the working copy setup. Are you keeping
> these refs up to date in the monorepo, Alexis?
>
> > > * Finding a peer for a project
> > > * Auto-tracking peers
> >
> > The idea of tracking a project without knowing its delegations is being
> > worked on, and likely to land soon [1], [2]. With that, "passive"
> > replication should just work without any event listening: the URN is
> > marked as interesting, so ingress gossip pertaining to it would trigger
> > fetches.
>
> I'm curious to know if you were aware of this RFC, Alexis? And if not,
> could we do something better to promote these somewhere to help with
> awareness?

From: Fintan Halpenny <fintan.halpenny@gmail.com>
Message-ID: <CFQBZ4VP56SM.1MB1E4H23LCVW@haptop>
In-Reply-To: <Tst4BsOAGft11_YVEe3tkilP1L-Rt3-E-pH9EZxU1Wqu0L76QG4bVQ5ESjuiNKcyf3uqgg7j5GdqbMD7lD9bqmgDw5jY-7PDtE7Q5Sbb4aM=@radicle.foundation>
On Mon Nov 15, 2021 at 11:46 AM GMT, Alexis Sellier wrote:
> I wasn't aware of this RFC! I've noticed you have an `announce` list now,
> which I've subscribed to. I think mentioning new RFCs in those emails
> might do the trick, as well as announcing when they get merged -- same as
> the Rust weekly newsletter[0] does.
>
> [0]: https://this-week-in-rust.org/blog/2021/11/10/this-week-in-rust-416/

Yup, that makes sense to me! It might be nice to x-post those to
radicle.community as well :)

From: Kim Altintop
Message-ID: <20211115141136.GC96379@schmidt.localdomain>
In-Reply-To: <kmwx-VGdJCUB1LNrIZxTrDLPu8gMqJtwld2MZGwRH-Bizp_QfrUCiFQZ9zm-7E4ArjADZMLCuXeVuZJV_sDgmNjP3du0Ze-luH0-BQ2XoPg=@radicle.foundation>
On Mon, 15 Nov 2021 11:43:01 +0000 Alexis Sellier <alexis@radicle.foundation> wrote:
> > Actively finding a "provider" peer on the network is not yet being
> > worked on. It was, however, already handwaved in the original spec
> > draft. I wouldn't expect a better design to emerge, but some details
> > will need to be fleshed out. The idea is two-fold:
> > ...
>
> This is interesting, and has quite a bit of surface area. I do think some
> kind of DHT would help here, but I'm not sure how important this feature
> is going to be, given the other "discovery" mechanism at the application
> layer that we are building. Let's see.

A proper DHT does not make sense for link, because peers replicate based on
interest -- there is no "bitswap" economy, so there is also no way to hash a
piece of data to peers which are supposed to offer a replica of it. Such a thing
is complementary to link, and the best way to take advantage of it today is to
publish the `main` tree (or whatever) on IPFS.

The providers feature as described is rather essential to link, however, because
a. it would tend to terminate the random walk early (and so reduce gossip
overhead), and b. it favours uptime (just like Kademlia). In the presence of
NATs, gossip may not actually get through, so we need to favour routes which
improve the probability.

> > > * Fetch retrying
> > > * Periodic announcement
> > > * Storage ref diff
> >
> > These are awful workarounds, and should really be removed entirely.
> >
> > Instead, what should happen is what I refer to as "grafting" (because it
> > is conceptually similar to GRAFT from the epidemic broadcast trees
> > paper): whenever a peer suspects that it has missed gossip, it should
> > eagerly attempt to "sync".
> >
> > As it turns out, there are some delicate concurrency considerations, so
> > the current proposal is to simply initiate it explicitly by a. fetching,
> > and b. sending a "request pull" RPC[3]. When it is not known whether the
> > remote end actually has the URN we're trying to sync, there is already
> > an API[4] to issue an RPC[5] which returns a bloomfilter-like
> > datastructure of the URNs the other end has. Unless we're trying to sync
> > a very large number of URNs, this can be used to efficiently trim down
> > the number of RTTs.
>
> Interesting. It seems like this is the responsibility of the application, though?

Some cases are better determined by the application, some can be automated. If
you're referring to the filtering specifically, I think it will require some
experimentation to find the right parameters: you need to probe the filter
element-wise and on top of that issue the corresponding RPCs. There is also a
limit to the number of fetches you can queue / execute concurrently.

So this becomes prohibitive with a large number of URNs to sync simultaneously,
but we need to find out what "large number" is approximately.


> Side question: does the bloom filter protect the privacy of the inventory of the sender of the
> bloom filter? Seems like a nice benefit of using a bloom filter, besides the efficiency.

In the sense that it is just a bunch of `u16`s, yes. The receiver needs to know
the preimage(s) to probe against the filter.

> > This stuff is not particularly difficult to implement and wire up
> > (modulo the awkwardness of message passing due to async), so patches
> > welcome.
>
> Ah, so by "wire up" you mean it would actually be part of the protocol?
> The reason this is important, I think, is that whether a peer "may have
> missed" gossip seems hard for the application to know.

It is both I think. In the sense that applications will most definitely offer
some kind of "refresh button", or start a sync job after being offline for a
long time without blocking the main thread or whatever.

> I realize I have also forgotten to mention something which I don't see covered here, at least
> not in a satisfying way: when the local user issues a file-system push via `git push rad`, the
> node may or may not be running. In the case where it isn't running, it seems like the
> announcement could happen when the node connects to other peers, via this grafting mechanism.
> But in the case where the node is online, I'd expect a mechanism to "detect" the new ref and
> announce only that ref to all connected peers. Is there a plan for how this might work?

This was supposed to be solved by rfc0682 "Application Architecture", although
I'm not sure how much momentum this has at the moment.

The problem is essentially that coupling the lifecycles of a push-enabled git
server and the p2p node would require the equivalent of process management at
the application level, which is quite a pain in the buttocks. It would also
require being able to run a push-enabled git server in-process, but that's a
different
problem. We thus figured that we would be better off leaving process management
to the system(d). Which means that we need some kind of message passing between
processes.

So what we need is for the p2p node (eg. `linkd`) to accept some kind of event
on a domain socket, which can be triggered from a `post-receive` hook or
equivalent.
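
On the node side, roughly (socket path, line format and `announce` are all made up to illustrate the shape; the hook side then only needs to write a single line to that socket):

```rust
use std::io::{BufRead, BufReader};
use std::os::unix::net::UnixListener;

// Sketch of the node side: accept one-line events of the form
// "<urn> <refname> <oid>" on a domain socket. In reality this would feed
// the node's announcement machinery rather than print.
fn announce(urn: &str, refname: &str, oid: &str) {
    println!("would announce {} {} -> {}", urn, refname, oid);
}

fn main() -> std::io::Result<()> {
    let _ = std::fs::remove_file("/tmp/linkd.sock");
    let listener = UnixListener::bind("/tmp/linkd.sock")?;
    for stream in listener.incoming() {
        let reader = BufReader::new(stream?);
        for line in reader.lines() {
            let line = line?;
            let mut parts = line.split_whitespace();
            if let (Some(urn), Some(refname), Some(oid)) =
                (parts.next(), parts.next(), parts.next())
            {
                announce(urn, refname, oid);
            }
        }
    }
    Ok(())
}
```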

From: Kim Altintop
Message-ID: <20211115141521.GE96379@schmidt.localdomain>
In-Reply-To: <024d07eb870f7be03eaaf7b118699d6b49285108.camel@axiom.fm>
On Mon, 15 Nov 2021 12:17:34 +0100 Thomas Scholtes <thomas@axiom.fm> wrote:
> The biggest priority for us is grafting to make replication more reliable.

Ya, I think replication v3 comes even before that. Unfortunately, this is
dragging on as some of the building blocks turn out to be square, while others
are round :/