Here's a very small late-night RFC patch for storage hooks, which
hopefully addresses our requirements in a platform-independent, low-config
way.
Alex Good (1):
Add an RFC for storage hooks
docs/rfc/0703-storage-hooks.adoc | 75 ++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
create mode 100644 docs/rfc/0703-storage-hooks.adoc
--
2.35.1
Yeah, so I think there are two levels here:
1) a generalized way for a process to listen to changes to the monorepo
2) a way for a process to listen to changes made by linkd
For our use case, (2) is definitely enough, since in our controlled environment (seed node),
we know that monorepo modifications can only happen via `git-server` and `linkd`.
So all we need is to know when `linkd` has made a change we're not aware of.
Maybe you can explain what exactly the benefit is of knowing only about
modifications made by linkd? Who would be listening for these events? And why
does it not matter that such a program is never notified of modifications made
by the git-server?
Conversely, I would not be willing to support an architecture where a consumer
has to subscribe to 2..* event sockets, and somehow coalesce events of the same
type. Not only is this annoying to implement, it also violates basic
architecture principles: it's none of the consumer's business to have
expectations about where a particular event type originated from.
I would question whether (1) is a good idea at all; rather, I'd be curious about use cases for it.
I'd imagine that if we had (2), modifications could be made through linkd, and whatever event system
linkd has could then be used.
linkd is currently fairly constrained in scope: it's basically an RPC
wrapper around librad::net::Peer and is only really interested in the
network. I think the expectation is that many applications which are not
interested in networking will want to directly write to storage (the
requirement to always go through a running daemon is part of what makes
IPFS so annoying to use, IMO). The question then is whether (2) is still
sufficient in this case; maybe it is. The main use cases I can think of are
interfaces which may want to show you if tracking configs have changed
or interfaces which want to show live updates to collaborative objects
as you make them (e.g. I add a comment via the CLI and it appears in my
editor).
One last point about (2): if we wanted to use it more generally, e.g. with multiple writers
and readers, I guess we'd need to consider routing all modifications of the monorepo state
through linkd, which may not be practical...
------- Original Message -------
> I agree with the "single source of events" principle. The reason we only need linkd to be a
That's not what I said. It's about the source of the event being opaque to the
consumer.
> source of events is that we are making the other on-disk modifications, so we are already
> aware of them. At the moment this happens inside a `post-receive` hook installed in the
> monorepo. If we move to librad's `request-pull` though, I guess *all* such modifications
> (updates to code refs) will go through linkd, and then the problem is solved and we just
> have a single code path. Otherwise, if for whatever reason we have to maintain our plain
That's not true in general: linkd needs to work for non-server setups as well,
and so cannot be responsible for all state mutations; it is only running when
the node is participating in gossip. For example, one may want to adjust
trackings using the CLI, and Upstream (or a currently-running linkd, for that
matter) wants to get notified of that. I also assume that the org-node would not
be abandoned, even if it would offload lnk networking to linkd.
This is exactly one of the needs we have for events. Similarly, we would like to
know when a ref in a project is updated. This doesn’t only involve updates from
the gossip layer but also updates made by any kind of `rad push` or `git push
rad` command or other commands manipulating the monorepo.
Basically, this is what Alexis wrote
> git push, then we'd have these two code paths active:
> * user does plain git push -> git-server -> post-receive hook -> update HEAD
> * gossip message arrives in linkd -> `replicate` is called -> event is fired -> some process updates HEAD
------- Original Message -------
On Tuesday, 29.03.2022 at 14:32 +0200, Kim Altintop wrote:
On Monday, 28.03.2022 at 16:03 +0000, Alex Good wrote:
If the URN also includes the path to the ref that has been updated then this is
indeed sufficient for our use case.
I wonder, since we are blocked on this for migrating to replv3 from http-based git-push,
could we then rely solely on the linkd events in our seed nodes? It seems like the more
generic solution will be useful on the desktop, but the linkd based event stream is all
we need on the server. As far as I know there's already something like this via the
gossip::Applied event, though this event is consumable only in-process for now.
For us, there are two ways of interacting with p2p that I can imagine:
1. Use out-of-the-box linkd: this would require the ability to consume linkd events via some type of socket
2. Use a custom Peer loop embedded in some service we build (e.g. how org-node works right now).
The ideal in terms of maintenance burden is (1), but this requires linkd events to be published outside of the process.
What is your question here?
------- Original Message -------
Forgot to put the link
[1]: https://github.com/nodejs/node/blob/f8ca5dfea462d05c4fadd6a935f375a7aa71f8be/lib/dgram.js#L317
It turns out I was actually binding to the wildcard address. Which is a little
pointless, because that socket will receive all packets, including those sent to
the global scope address.
So well, the network interface needs to be known. Which is not really "zero
conf", no matter what method of conf we'd be choosing.
But then, none of the alternatives is, yet each comes with a bunch of other
hassle. I think just running a list of programs a la git hooks might be the
least easy to f*ck up, so I'd prefer that. One thing to consider would be that
those hook executables should probably expect 1.. events on stdin, so we can at
least apply some buffering.
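To make that concrete, here's a rough sketch (in Rust, since that's what we're working in) of what the consuming side of such a hook could look like: read newline-delimited events from stdin, stop at the EOT byte. The framing is an assumption based on the discussion above, not a settled format.

```rust
use std::io::{self, Read};

/// Split a raw stdin buffer into newline-delimited events, stopping at
/// the EOT byte (0x04) that terminates the transmission.
fn split_events(input: &[u8]) -> Vec<String> {
    let end = input
        .iter()
        .position(|&b| b == 0x04)
        .unwrap_or(input.len());
    String::from_utf8_lossy(&input[..end])
        .lines()
        .filter(|l| !l.is_empty())
        .map(str::to_owned)
        .collect()
}

fn main() -> io::Result<()> {
    // NB: this sketch reads until EOF for simplicity; a real hook run by
    // a long-lived notifier would read incrementally and stop as soon as
    // it sees 0x04, since the notifier may keep the pipe open.
    let mut buf = Vec::new();
    io::stdin().read_to_end(&mut buf)?;
    for event in split_events(&buf) {
        eprintln!("got event: {}", event);
    }
    Ok(())
}
```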
So we would specify a directory in storage where people can stick
arbitrary binaries, and then any process which writes to storage must
execute those binaries?
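Roughly, yes. On the notifying side that could look something like the sketch below; the directory layout and error handling are assumptions (on Unix one would presumably also check the executable bit rather than running every regular file):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;
use std::process::{Command, Stdio};

/// Run every file in the given hook directory, feeding it the event
/// payload on stdin and terminating the stream with EOT (0x04).
/// Failures of individual hooks are ignored so one broken hook cannot
/// block the others.
fn notify_hooks(hook_dir: &Path, payload: &[u8]) -> std::io::Result<()> {
    for entry in fs::read_dir(hook_dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        if let Ok(mut child) = Command::new(&path).stdin(Stdio::piped()).spawn() {
            if let Some(stdin) = child.stdin.as_mut() {
                let _ = stdin.write_all(payload);
                let _ = stdin.write_all(&[0x04]); // end of transmission
            }
            let _ = child.wait();
        }
    }
    Ok(())
}
```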
I think we said we'd x-post RFCs between dev and discuss too :)
Ah apologies, didn't realise the patch was also in-reply-to the Peer
Events topic!
Published-at: https://github.com/alexjg/radicle-link/tree/patches/rfc/storage-hooks/v2
Changes
* Include old and new OID in urn_changed hooks
* Include config target (peer id or default) and old and new config OIDs
in tracking change notification
Range diff from v1
1: 5c2987dc ! 1: 04c83e2e Add an RFC for storage hooks
@@ docs/rfc/0703-storage-hooks.adoc (new)
+
+Notifications of changes in the storage are delivered via "hooks", which are
+similar in spirit to git hooks. A hook is an executable placed in
-+`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is one of:
++`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is a directory named one
++of:
+
+* `urn_changed`
+* `tracking_changed`
@@ docs/rfc/0703-storage-hooks.adoc (new)
+
+For each hook type the notifying process MUST iterate over each executable in
+the hook directory and call the executable with the arguments specified in this
-+document. The arguments for a hook are base53-z encoded URNs separated by
-+newline characters.
-+
-+Note that we specifically do not include information about the new state of the
-+URN in the hook payload, this is because that state would not be reliable anyway
-+and so applications should go to disk themselves if they need the new state.
++document.
+
+Hook processes MUST continue to process events until they receive an end of
+transmission character encoded as `0x04`. This allows calling processes to
@@ docs/rfc/0703-storage-hooks.adoc (new)
+=== URN changed hook
+
+Whenever a process makes a change that updates a ref under
-+`refs/namespaces/<URN>/refs` the process MUST invoke the `urn_changed` hooks. The
-+hook argument is the base32-z encoded URN of the identity which has changed.
++`refs/namespaces/<URN>/` the process MUST invoke the `urn_changed` hooks. The
++hook argument is the following:
++
++[source]
++----
++'rad:git' <urn> [<ref path>] SP <old-oid> SP <new-oid> LF
++----
++
++Where
++* `<urn>` is the base32-z encoding of the URN
++* `<ref path>` is the ref in the scope of the URN namespace. I.e. everything
++ after `refs/namespaces/<URN>/`.
++* `<old-oid>` is the OID the ref previously pointed at, this will be the zero OID
++ if the ref is being created
++* `<new-oid>` is the OID the ref now points at, this will be the zero OID
++  if the ref is being deleted
++* `SP` is a single space character
++* `LF` is `\n`
++
++Note that the `ref-path` is optional and if it is empty then the notification
++refers to the entire namespace. Thus detecting newly created URNs is a question
++of waiting for notifications with an empty ref path and a non-zero `new-oid`.
+
+=== Tracking changed hook
+
+Whenever a process updates a ref under `refs/namespaces/<URN>/(default | <peer
+id>)` the process MUST invoke the `tracking_changed` hooks. The hook argument is
-+the base32-z encoded URN of the identity for which tracking has changed.
+
++
++[source]
++----
++'rad:git' <urn> SP <peer-id> SP <old-oid> SP <new-oid> LF
++----
++
++Where
++* `<urn>` is the base32-z encoding of the URN
++* `<peer-id>` is either a peer ID or the string `default`.
++* `<old-oid>` is the OID of the previous tracking entry blob, this will be the zero
++ OID if the tracking entry is being created
++* `<new-oid>` is the OID of the new tracking entry blob, this will be the zero
++ OID if the tracking entry is being deleted
++* `SP` is a single space character
++* `LF` is `\n`
Alex Good (1):
Add an RFC for storage hooks
docs/rfc/0703-storage-hooks.adoc | 105 +++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
create mode 100644 docs/rfc/0703-storage-hooks.adoc
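To make the `urn_changed` payload grammar above concrete, here is a hypothetical parser sketch. It assumes the URN arrives as a single `rad:git:<base32-z>` token and that fields are separated by single spaces, with the optional `<ref path>` distinguished purely by field count; none of this is normative.

```rust
#[derive(Debug, PartialEq)]
struct UrnChanged {
    urn: String,
    ref_path: Option<String>,
    old_oid: String,
    new_oid: String,
}

/// Parse one `urn_changed` payload line of the form
/// `<urn> [<ref path>] SP <old-oid> SP <new-oid>`.
/// Returns None if the field count doesn't match either variant.
fn parse_urn_changed(line: &str) -> Option<UrnChanged> {
    let fields: Vec<&str> = line.trim_end().split(' ').collect();
    match fields.as_slice() {
        // No ref path: the notification refers to the entire namespace.
        [urn, old, new] => Some(UrnChanged {
            urn: urn.to_string(),
            ref_path: None,
            old_oid: old.to_string(),
            new_oid: new.to_string(),
        }),
        // With a ref path scoped to the URN namespace.
        [urn, path, old, new] => Some(UrnChanged {
            urn: urn.to_string(),
            ref_path: Some(path.to_string()),
            old_oid: old.to_string(),
            new_oid: new.to_string(),
        }),
        _ => None,
    }
}
```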
A few thoughts:
I think the wording could be clearer that `urn_changed` and `tracking_changed` are
directories. On the first read through I thought they were the executable names,
because that's how git hooks work.
I understand the reasoning here, though I do think for refs, having the previous hash
and next hash would help. It would also indicate whether something was deleted or created:
I'm minded to agree, see the sibling reply to Kim's email.
0 -> abc # created
abc -> 0 # deleted
abc -> xyz # updated
But this would put more burden on the notifying process. I also think the refname
that is being updated would be useful.
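The created/deleted/updated distinction then falls out of comparing against the zero OID on the consumer side, e.g. (a sketch, assuming 40-character hex OIDs):

```rust
/// The all-zeros OID git uses to denote "no object".
const ZERO_OID: &str = "0000000000000000000000000000000000000000";

#[derive(Debug, PartialEq)]
enum Change {
    Created,
    Deleted,
    Updated,
}

/// Classify a ref change from its old and new OIDs: a zero old OID
/// means the ref was created, a zero new OID means it was deleted,
/// anything else is an update.
fn classify(old_oid: &str, new_oid: &str) -> Change {
    if old_oid == ZERO_OID {
        Change::Created
    } else if new_oid == ZERO_OID {
        Change::Deleted
    } else {
        Change::Updated
    }
}
```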
Not sure I understand this: when would the EOT character be sent? If you want to
terminate the transmission you can just kill the process?
You mean might there be additional directories in the future? I would
say that's likely, do you think it needs to be specified?
I misunderstood the `<hook type>` being the binary. So, it does not
need to be defined in the RFC, but could you give me an idea of what
one of these binaries would do?
This also prompted the question: How are these hooks packaged? Do
people install them and they're placed under the hook type?
Yep, that's the thinking. Not ideal but easier than all the alternatives
unfortunately.
One more thing! I realise our only protocol is currently `git`, but do
you think we should scope the hooks to the protocol itself, i.e. `git`
in this case? So perhaps the dir structure would be:
----
<MONOREPO_DIR>/hooks/<protocol>/<hook type>
----
Although the protocol is in the URN, the parameters include OIDs, which
are the content addresses of git objects, so I'm not 100% sure whether
one or the other makes sense. Just wanted to bring it up :)