~radicle-link/discuss

RFC: Storage Hooks v1 PROPOSED

Here's a very small late night RFC patch for storage hooks, which
hopefully addresses our requirements in a platform-independent, low-config
way.

Alex Good (1):
  Add an RFC for storage hooks

 docs/rfc/0703-storage-hooks.adoc | 75 ++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 docs/rfc/0703-storage-hooks.adoc

-- 
2.35.1
Yeah, so I think there are two levels here:

1) a generalized way for a process to listen to changes to the monorepo
2) a way for a process to listen to changes made by linkd

For our use case, (2) is definitely enough, since in our controlled environment (seed node),
we know that monorepo modifications can only happen via `git-server` and `linkd`.
So all we need is to know when `linkd` has made a change we're not aware of.
I would question whether (1) is a good idea at all, or rather would be curious about use cases for it.
I'd imagine that if we had (2), modifications could be made through linkd, and whatever event system
linkd has could then be used.
One last point about (2): if we wanted to use it more generally, e.g. with multiple writers and readers,
I guess we'd need to consider doing all modifications of the monorepo state through linkd, which
may not be practical...

------- Original Message -------
I agree with the "single source of events" principle. The reason we only need linkd to be a
source of events is that we are making the other on-disk modifications, so we are already
aware of them. At the moment this happens inside a `post-receive` hook installed in the
monorepo. If we move to librad's `request-pull` though, I guess *all* such modifications
(updates to code refs) will go through linkd, and then the problem is solved and we just
have a single code path. Otherwise, if for whatever reason we have to maintain our plain
git push, then we'd have these two code paths active:

* user does plain git push -> git-server -> post-receive hook -> update HEAD
* gossip message arrives in linkd -> `replicate` is called -> event is fired -> some process updates HEAD 


------- Original Message -------
On Tuesday, 29.03.2022 at 14:32 +0200, Kim Altintop wrote:
On Monday, 28.03.2022 at 16:03 +0000, Alex Good wrote:
If the URN also includes the path to the ref that has been updated then this is
indeed sufficient for our use case.
I wonder, since we are blocked on this for migrating to replv3 from http-based git-push,
could we then rely solely on the linkd events in our seed nodes? It seems like the more
generic solution will be useful on the desktop, but the linkd based event stream is all
we need on the server. As far as I know there's already something like this via the
gossip::Applied event, though this event is consumable only in-process for now.

For us, there are two ways of interacting with p2p that I can imagine:
1. Use out-of-the-box linkd: this would require the ability to consume linkd events via some type of socket
2. Use a custom Peer loop embedded in some service we build (eg. how org-node works right now).

The ideal in terms of maintenance burden is (1), but this requires linkd events to be published outside of the process.
------- Original Message -------
Forgot to put the link

[1]: https://github.com/nodejs/node/blob/f8ca5dfea462d05c4fadd6a935f375a7aa71f8be/lib/dgram.js#L317
It turns out I was actually binding to the wildcard address. Which is a little
pointless, because that socket will receive all packets, including those sent to
the global scope address.

So well, the network interface needs to be known. Which is not really "zero
conf", no matter what method of conf we'd be choosing.

But then, none of the alternatives is, yet each comes with a bunch of other
hassle. I think just running a list of programs a la git hooks might be the
least easy to f*ck up, so I'd prefer that. One thing to consider would be that
those hook executables should probably expect 1..n events on stdin, so we can at
least apply some buffering.
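
To make the "list of programs a la git hooks" idea concrete, here is a minimal sketch of what the notifying side could look like. It assumes events are delivered on stdin, one per line, and that a batch is closed with the `0x04` end-of-transmission byte from the RFC. The names `hook_executables`, `encode_batch` and `notify` are made up for illustration and are not part of librad or linkd.

```python
import os
import subprocess
from pathlib import Path
from typing import Iterable, List

EOT = b"\x04"  # end-of-transmission byte named in the RFC


def hook_executables(hook_dir: Path) -> List[Path]:
    """Return the executable regular files in a hook directory, sorted by name."""
    if not hook_dir.is_dir():
        return []
    return sorted(
        p for p in hook_dir.iterdir() if p.is_file() and os.access(p, os.X_OK)
    )


def encode_batch(events: Iterable[str]) -> bytes:
    """Join events as LF-terminated lines and close the batch with EOT."""
    payload = b"".join(event.encode("utf-8") + b"\n" for event in events)
    return payload + EOT


def notify(hook_dir: Path, events: Iterable[str]) -> None:
    """Feed one buffered batch of events to every hook on its stdin.

    Assumption: hooks are best effort, so a failing hook is ignored
    rather than aborting the notifying process.
    """
    payload = encode_batch(events)
    for exe in hook_executables(hook_dir):
        subprocess.run([str(exe)], input=payload, check=False)
```

Buffering then simply means collecting several events before calling `notify` once, instead of spawning every hook for every single event.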

I think we said we'd x-post RFCs between dev and discuss too :)

Published-at: https://github.com/alexjg/radicle-link/tree/patches/rfc/storage-hooks/v2

Changes
* Include old and new OID in urn_changed hooks
* Include config target (peer id or default) and old and new config OIDs
  in tracking change notification

Range diff from v1

1:  5c2987dc ! 1:  04c83e2e Add an RFC for storage hooks
    @@ docs/rfc/0703-storage-hooks.adoc (new)
     +
     +Notifications of changes in the storage are delivered via "hooks", which are
     +similar in spirit to git hooks. A hook is an executable placed in
    -+`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is one of:
    ++`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is a directory named one
    ++of:
     +
     +* `urn_changed`
     +* `tracking_changed`
    @@ docs/rfc/0703-storage-hooks.adoc (new)
     +
     +For each hook type the notifying process MUST iterate over each executable in
     +the hook directory and call the executable with the arguments specified in this
    -+document. The arguments for a hook are base32-z encoded URNs separated by
    -+newline characters.
    -+
    -+Note that we specifically do not include information about the new state of the
    -+URN in the hook payload, this is because that state would not be reliable anyway
    -+and so applications should go to disk themselves if they need the new state.
    ++document. 
     +
     +Hook processes MUST continue to process events until they receive an end of
     +transmission character encoded as `0x04`. This allows calling processes to
    @@ docs/rfc/0703-storage-hooks.adoc (new)
     +=== URN changed hook
     +
     +Whenever a process makes a change that updates a ref under
    -+`refs/namespaces/<URN>/refs` the process MUST invoke the `urn_changed` hooks. The
    -+hook argument is the base32-z encoded URN of the identity which has changed.
    ++`refs/namespaces/<URN>/` the process MUST invoke the `urn_changed` hooks. The
    ++hook argument is the following:
    ++
    ++[source]
    ++----
    ++'rad:git' <urn> [<ref path>] SP <old-oid> SP <new-oid> LF
    ++----
    ++
    ++Where 
    ++* `<urn>` is the base32-z encoding of the URN
    ++* `<ref path>` is the ref in the scope of the URN namespace. I.e. everything
    ++  after `refs/namespaces/<URN>/`. 
    ++* `<old-oid>` is the OID the ref previously pointed at, this will be the zero OID
    ++  if the ref is being created
    ++* `<new-oid>` is the OID the ref now points at, this will be the zero OID
    ++  if the ref is being deleted
    ++* `SP` is a single space character
    ++* `LF` is `\n`
    ++
    ++Note that the `ref-path` is optional and if it is empty then the notification
    ++refers to the entire namespace. Thus detecting newly created URNs is a question
    ++of waiting for notifications with an empty ref path and a non-zero `new-oid`.
     +
     +=== Tracking changed hook
     +
     +Whenever a process updates a ref under `refs/namespaces/<URN>/(default | <peer
     +id>)` the process MUST invoke the `tracking_changed` hooks. The hook argument is
    -+the base32-z encoded URN of the identity for which tracking has changed.
     +
    ++
    ++[source]
    ++----
    ++'rad:git' <urn> SP <peer-id> SP <old-oid> SP <new-oid> LF
    ++----
    ++
    ++Where
    ++* `<urn>` is the base32-z encoding of the URN
    ++* `<peer-id>` is either a peer ID or the string `default`.
    ++* `<old-oid>` is the OID of the previous tracking entry blob, this will be the zero
    ++  OID if the tracking entry is being created
    ++* `<new-oid>` is the OID of the new tracking entry blob, this will be the zero
    ++  OID if the tracking entry is being deleted
    ++* `SP` is a single space character
    ++* `LF` is `\n`


Alex Good (1):
  Add an RFC for storage hooks

 docs/rfc/0703-storage-hooks.adoc | 105 +++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/rfc/0703-storage-hooks.adoc
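
For readers who want to experiment with the v2 `urn_changed` line format, the following is a rough parsing sketch. The grammar does not spell out how the `'rad:git'` prefix, the URN and the optional ref path are joined, so the code assumes the usual string form `rad:git:<id>[/<ref path>]`. That separator choice, and the names `UrnChanged` and `parse_urn_changed`, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

URN_PREFIX = "rad:git:"  # assumption: prefix and id are joined by ':'


@dataclass(frozen=True)
class UrnChanged:
    urn_id: str              # base32-z encoded id, kept as an opaque string
    ref_path: Optional[str]  # None means the whole namespace
    old_oid: str
    new_oid: str


def parse_urn_changed(line: str) -> UrnChanged:
    """Parse one LF-terminated `urn_changed` event line."""
    if not line.endswith("\n"):
        raise ValueError("event must be terminated by LF")
    # Git ref names cannot contain spaces, so the two OIDs can be split
    # off from the right without touching the ref path.
    subject, old_oid, new_oid = line[:-1].rsplit(" ", 2)
    if not subject.startswith(URN_PREFIX):
        raise ValueError("event must start with 'rad:git:'")
    # Assumption: the ref path, if present, follows the id after a '/'.
    urn_id, _, ref_path = subject[len(URN_PREFIX):].partition("/")
    if not urn_id:
        raise ValueError("missing URN id")
    return UrnChanged(urn_id, ref_path or None, old_oid, new_oid)
```

An event without a ref path parses to `ref_path=None`, which matches the RFC's note that an empty ref path refers to the entire namespace.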
A few thoughts:
I think the wording could be clearer that `urn_changed` and `tracking_changed` are
directories. On the first read through I thought they were the executable names,
because that's how git hooks work.
I understand the reasoning here, though I do think for refs, having the previous hash
and next hash would help. It would also indicate whether something was deleted or created:

0 -> abc # created
abc -> 0 # deleted
abc -> xyz # updated

But this would put more burden on the notifying process. I also think the refname
that is being updated would be useful.
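
As a small illustration of the created/deleted/updated mapping above, a consumer could classify a change from the two hashes alone. This is only a sketch: it assumes hex-encoded OIDs where an all-zero string stands for "no object", and the function names are hypothetical.

```python
def is_zero_oid(oid: str) -> bool:
    """True if the OID consists only of '0' characters (any hash length)."""
    return oid != "" and set(oid) == {"0"}


def classify_change(old_oid: str, new_oid: str) -> str:
    """Map an (old, new) OID pair to 'created', 'deleted', 'updated' or 'unchanged'."""
    old_zero = is_zero_oid(old_oid)
    new_zero = is_zero_oid(new_oid)
    if old_zero and new_zero:
        raise ValueError("old and new OID cannot both be zero")
    if old_zero:
        return "created"
    if new_zero:
        return "deleted"
    if old_oid == new_oid:
        return "unchanged"
    return "updated"
```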
Not sure I understand this: when would the EOT character be sent? If you want to
terminate the transmission you can just kill the process?
You mean might there be additional directories in the future? I would
say that's likely, do you think it needs to be specified?
I misunderstood the `<hook type>` being the binary. So, it does not
need to be defined in the RFC, but could you give me an idea of what
one of these binaries would do?

This also prompted the question: How are these hooks packaged? Do
people install them and they're placed under the hook type?

One more thing! I realise our only protocol is currently
`git`, but do you think we should scope the hooks to the protocol
itself, i.e. `git` in this case? So perhaps the dir structure would be:

----
<MONOREPO_DIR>/hooks/<protocol>/<hook type>
----

Although, the protocol is in the URN, BUT, the parameters include OIDs
which are the content addresses of git objects, so I'm not 100% sure
whether one or the other makes sense. Just wanted to bring it up :)

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~radicle-link/discuss/patches/30895/mbox | git am -3

[PATCH 1/1] Add an RFC for storage hooks

Signed-off-by: Alex Good <alex@memoryandthought.me>
---
 docs/rfc/0703-storage-hooks.adoc | 75 ++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 docs/rfc/0703-storage-hooks.adoc

diff --git a/docs/rfc/0703-storage-hooks.adoc b/docs/rfc/0703-storage-hooks.adoc
new file mode 100644
index 00000000..449b8a4e
--- /dev/null
+++ b/docs/rfc/0703-storage-hooks.adoc
@@ -0,0 +1,75 @@
= RFC: Storage Hooks
Alex Good <alex@memoryandthought.me>;
+
:revdate: 2022-04-07
:revremark: draft
:toc: preamble
:stem:

* Author: {author_1}
* Date: {revdate}
* Amended: {ammend_1}
* Status: {revremark}

== Motivation

There may be many processes which are interested in changes made to a link
monorepo. We would like to define a standard way for applications to notify each
other about changes to the monorepo.

== Terminology and Conventions

The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119,
https://www.rfc-editor.org/rfc/rfc2119>> and <<RFC8174,
https://www.rfc-editor.org/rfc/rfc8174>> when, and only when, they appear in all
capitals, as shown here.

== Hooks

Notifications of changes in the storage are delivered via "hooks", which are
similar in spirit to git hooks. A hook is an executable placed in
`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is one of:

* `urn_changed`
* `tracking_changed`

=== Calling a hook

For each hook type the notifying process MUST iterate over each executable in
the hook directory and call the executable with the arguments specified in this
document. The arguments for a hook are base32-z encoded URNs separated by
newline characters.

Note that we specifically do not include information about the new state of the
URN in the hook payload, this is because that state would not be reliable anyway
and so applications should go to disk themselves if they need the new state.

Hook processes MUST continue to process events until they receive an end of
transmission character encoded as `0x04`. This allows calling processes to
start a hook process once and then keep the process running until they need to
notify it again.

== Event Types

There are two event types

* Notifications that the data under a URN has changed in some way
* Notifications that the tracking config for a URN has changed

In both cases notifications are sent as a best effort and applications MUST NOT
assume that the current on-disk state matches the notification.

=== URN changed hook

Whenever a process makes a change that updates a ref under
`refs/namespaces/<URN>/refs` the process MUST invoke the `urn_changed` hooks. The
hook argument is the base32-z encoded URN of the identity which has changed.

=== Tracking changed hook

Whenever a process updates a ref under `refs/namespaces/<URN>/(default | <peer
id>)` the process MUST invoke the `tracking_changed` hooks. The hook argument is
the base32-z encoded URN of the identity for which tracking has changed.

-- 
2.35.1

[PATCH v2 1/1] Add an RFC for storage hooks

Signed-off-by: Alex Good <alex@memoryandthought.me>
---
 docs/rfc/0703-storage-hooks.adoc | 105 +++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)
 create mode 100644 docs/rfc/0703-storage-hooks.adoc

diff --git a/docs/rfc/0703-storage-hooks.adoc b/docs/rfc/0703-storage-hooks.adoc
new file mode 100644
index 00000000..da3ed224
--- /dev/null
+++ b/docs/rfc/0703-storage-hooks.adoc
@@ -0,0 +1,105 @@
= RFC: Storage Hooks
Alex Good <alex@memoryandthought.me>;
+
:revdate: 2022-04-07
:revremark: draft
:toc: preamble
:stem:

* Author: {author_1}
* Date: {revdate}
* Amended: {ammend_1}
* Status: {revremark}

== Motivation

There may be many processes which are interested in changes made to a link
monorepo. We would like to define a standard way for applications to notify each
other about changes to the monorepo.

== Terminology and Conventions

The key words "`MUST`", "`MUST NOT`", "`REQUIRED`", "`SHALL`", "`SHALL NOT`",
"`SHOULD`", "`SHOULD NOT`", "`RECOMMENDED`", "`NOT RECOMMENDED`", "`MAY`", and
"`OPTIONAL`" in this document are to be interpreted as described in <<RFC2119,
https://www.rfc-editor.org/rfc/rfc2119>> and <<RFC8174,
https://www.rfc-editor.org/rfc/rfc8174>> when, and only when, they appear in all
capitals, as shown here.

== Hooks

Notifications of changes in the storage are delivered via "hooks", which are
similar in spirit to git hooks. A hook is an executable placed in
`<MONOREPO_DIR>/hooks/<hook type>`, where `hook type` is a directory named one
of:

* `urn_changed`
* `tracking_changed`

=== Calling a hook

For each hook type the notifying process MUST iterate over each executable in
the hook directory and call the executable with the arguments specified in this
document. 

Hook processes MUST continue to process events until they receive an end of
transmission character encoded as `0x04`. This allows calling processes to
start a hook process once and then keep the process running until they need to
notify it again.

== Event Types

There are two event types

* Notifications that the data under a URN has changed in some way
* Notifications that the tracking config for a URN has changed

In both cases notifications are sent as a best effort and applications MUST NOT
assume that the current on-disk state matches the notification.

=== URN changed hook

Whenever a process makes a change that updates a ref under
`refs/namespaces/<URN>/` the process MUST invoke the `urn_changed` hooks. The
hook argument is the following:

[source]
----
'rad:git' <urn> [<ref path>] SP <old-oid> SP <new-oid> LF
----

Where 
* `<urn>` is the base32-z encoding of the URN
* `<ref path>` is the ref in the scope of the URN namespace. I.e. everything
  after `refs/namespaces/<URN>/`. 
* `<old-oid>` is the OID the ref previously pointed at, this will be the zero OID
  if the ref is being created
* `<new-oid>` is the OID the ref now points at, this will be the zero OID
  if the ref is being deleted
* `SP` is a single space character
* `LF` is `\n`

Note that the `ref-path` is optional and if it is empty then the notification
refers to the entire namespace. Thus detecting newly created URNs is a question
of waiting for notifications with an empty ref path and a non-zero `new-oid`.

=== Tracking changed hook

Whenever a process updates a ref under `refs/namespaces/<URN>/(default | <peer
id>)` the process MUST invoke the `tracking_changed` hooks. The hook argument is


[source]
----
'rad:git' <urn> SP <peer-id> SP <old-oid> SP <new-oid> LF
----

Where
* `<urn>` is the base32-z encoding of the URN
* `<peer-id>` is either a peer ID or the string `default`.
* `<old-oid>` is the OID of the previous tracking entry blob, this will be the zero
  OID if the tracking entry is being created
* `<new-oid>` is the OID of the new tracking entry blob, this will be the zero
  OID if the tracking entry is being deleted
* `SP` is a single space character
* `LF` is `\n`
-- 
2.35.1
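
On the hook side, the "process events until EOT" rule from the "Calling a hook" section could be sketched as below. This is an illustration under assumptions rather than a reference implementation: it assumes events arrive on stdin as LF-terminated lines, treats a closed pipe like EOT, and drops any partial line that is not terminated by LF. The name `read_events` is hypothetical.

```python
from typing import BinaryIO, Iterator

EOT = 0x04  # end-of-transmission byte named in the RFC
LF = 0x0A


def read_events(stream: BinaryIO) -> Iterator[str]:
    """Yield LF-terminated event lines until EOT or end of stream."""
    buf = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:
            return  # the caller closed the pipe
        byte = chunk[0]
        if byte == EOT:
            return  # the caller signalled the end of the transmission
        if byte == LF:
            yield buf.decode("utf-8")
            buf.clear()
        else:
            buf.append(byte)
```

A hook executable would typically drive this with `for event in read_events(sys.stdin.buffer): ...` and go to disk itself for the current state, since notifications are best effort.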