~radicle-link/dev


Transitive trackings and config

Details
Message ID
<20220218140926.GC36711@schmidt.localdomain>
Moving from [0] to having link-tracking evaluate tracking configs to decide
whether to skip some remote refs, I realise we have not discussed what to do
with transitive trackings. Or at least, I can't find it if we did.

Recall:

    Looking at the `remotes` key in the signed refs of direct trackings, a peer
    is supposed to replicate transitive peer-ids, up to a limit. For example:

    ---
    A: { remotes: { B: { C: { D: {}}}}}
    ---

    If peer E tracks peer A, it should replicate A, B, C.
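
A minimal sketch of the rule recalled above, assuming the limit is two levels of `remotes` (inferred from the example, where D is not replicated). The `Remotes` type and both function names are hypothetical stand-ins, not the actual link-tracking or replication API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the nested `remotes` map in a peer's signed refs.
#[derive(Debug, Default, Clone)]
pub struct Remotes(pub BTreeMap<String, Remotes>);

impl Remotes {
    /// Build a linear chain, e.g. `["B", "C"]` becomes `{B: {C: {}}}`.
    pub fn chain(peers: &[&str]) -> Self {
        peers.iter().rev().fold(Remotes::default(), |inner, peer| {
            let mut map = BTreeMap::new();
            map.insert(peer.to_string(), inner);
            Remotes(map)
        })
    }
}

/// Collect the peer ids found in `remotes`, descending at most `depth` levels.
pub fn transitive_peers(remotes: &Remotes, depth: usize) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    collect(remotes, depth, &mut out);
    out
}

fn collect(remotes: &Remotes, depth: usize, out: &mut BTreeSet<String>) {
    if depth == 0 {
        return;
    }
    for (peer, nested) in &remotes.0 {
        out.insert(peer.clone());
        collect(nested, depth - 1, out);
    }
}
```

With A's `remotes` built as `Remotes::chain(&["B", "C", "D"])` and a depth of 2, the result is B and C; together with the direct tracking A itself, that is the set E should replicate.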


However:

    We do not have any config for B and C, unless they also appear as direct
    trackings. We have one for A, though, and we retain the information where B
    and C came from (ie. they may appear multiple times in `remotes`).


So what do we do with those?

From the example above, one might be tempted to say: apply the config of A.
However, B (for example) might also appear in the `remotes` of Z, which has a
much more restrictive (or permissive) config than A.

At some point, we actually created nested remote tracking branches, mirroring
`remotes`, but that didn't fly, so for every peer id we have only one remote
tracking tree and need to decide how to populate it.

What's the solution to this? Find the most permissive config among the direct
trackings? How does one compute which one that is?



[0]: https://lists.sr.ht/~radicle-link/dev/%3C20220218083349.15119-1-kim%40eagain.st%3E
Details
Message ID
<CAH_DpYRK7-M4PHDT2sjjDpx1UDRVX_u1o7vt5=QG4vhci=b8=A@mail.gmail.com>
In-Reply-To
<20220218140926.GC36711@schmidt.localdomain>
On 18/02/22 02:09pm, Kim Altintop wrote:
> What's the solution to this? Find the most permissive config among the direct
> trackings? How does one compute which one that is?

Hmm, to me it seems like we would actually want the _least_ permissive,
on the basis that more restricted tracking configs require more manual
intervention (with the current APIs) and so probably represent a more
significant user intent.

Maybe we also want some concept of specificity here? If we have

    A: {remotes: {B: {C: {D: {}}}}}
    Z: {remotes: {C: {D: {E: {}}}}}

And we're trying to determine the config for C, then it feels like the
config for Z should override that of A?

Is actually computing this a case of enumerating all the direct tracking
configs we have, then loading the tracking graph for each of those
configs, then merging the configs for transitively tracked peers (with
some rule about merging two configs taking into account specificity and
permissivity)? This seems like it might need an index of some kind.
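
One possible reading of such a merge rule, as a sketch only: the direct tracking that lists the target peer at the smallest depth wins (specificity), and ties are broken by permissiveness. `Config` is reduced to a single `data` flag and every name here is hypothetical; none of this is the actual link-tracking API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced tracking config: only the `data` flag is modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Config {
    pub data: bool,
}

/// Hypothetical stand-in for the nested `remotes` map of a direct tracking.
#[derive(Debug, Default, Clone)]
pub struct Remotes(pub BTreeMap<String, Remotes>);

impl Remotes {
    /// Build a linear chain, e.g. `["C", "D"]` becomes `{C: {D: {}}}`.
    pub fn chain(peers: &[&str]) -> Self {
        peers.iter().rev().fold(Remotes::default(), |inner, peer| {
            let mut map = BTreeMap::new();
            map.insert(peer.to_string(), inner);
            Remotes(map)
        })
    }

    /// Shortest depth at which `target` appears (1 = immediate remote).
    pub fn distance(&self, target: &str) -> Option<usize> {
        self.0
            .iter()
            .filter_map(|(peer, nested)| {
                if peer.as_str() == target {
                    Some(1)
                } else {
                    nested.distance(target).map(|d| d + 1)
                }
            })
            .min()
    }
}

/// Pick a config for `target` from the direct trackings that mention it.
/// The closest direct tracking wins; at equal distance the flags are OR-ed,
/// i.e. the more permissive setting is kept.
pub fn resolve(directs: &[(Config, Remotes)], target: &str) -> Option<Config> {
    let mut best: Option<(usize, Config)> = None;
    for (config, remotes) in directs {
        if let Some(d) = remotes.distance(target) {
            best = Some(match best {
                // An earlier direct tracking is strictly closer: keep it.
                Some((bd, bc)) if bd < d => (bd, bc),
                // Same distance: merge permissively.
                Some((bd, bc)) if bd == d => (bd, Config { data: bc.data || config.data }),
                // First hit, or this one is strictly closer.
                _ => (d, *config),
            });
        }
    }
    best.map(|(_, config)| config)
}
```

For the A and Z example, C sits at depth 2 under A and depth 1 under Z, so Z's config is chosen. Replacing `||` with `&&` in the tie-break gives the least-permissive variant instead.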
Details
Message ID
<20220218192009.GB93697@schmidt.localdomain>
In-Reply-To
<CAH_DpYRK7-M4PHDT2sjjDpx1UDRVX_u1o7vt5=QG4vhci=b8=A@mail.gmail.com>
On Fri, 18 Feb 2022 09:28:50 -0800 Alex Good <alex@memoryandthought.me> wrote:
> Hmm, to me it seems like we would actually want the _least_ permissive,
> on the basis that more restricted tracking configs require more manual
> intervention (with the current APIs) and so probably represent a more
> significant user intent.

Well, if A censors issue xyz, which B opened, then I think that's a problem.

> Maybe we also want some concept of specificity here? If we have
>
>     A: {remotes: {B: {C: {D: {}}}}}
>     Z: {remotes: {C: {D: {E: {}}}}}
>
> And we're trying to determine the config for C, then it feels like the
> config for Z should override that of A?

That's an interesting idea. If A and Z are direct trackings, then we imply some
level of trust. On one hand, this can help prevent unwanted content, on the
other hand it could censor wanted data in the grander scheme of things.

I would still say the more permissive of A and Z should be chosen.

> Is actually computing this a case of enumerating all the direct tracking
> configs we have, then loading the tracking graph for each of those
> configs

We do this already -- we need the signed refs for each peer we're going to
fetch, and the remotes are in there.

> then merging the configs for transitively tracked peers (with some rule about
> merging two configs taking into account specificity and permissivity)?

We don't have any configs for transitive trackings, because we don't replicate
the tracking database (which I think is a good thing). So we're looking for an
algorithm to compute the `Policy` from direct trackings plus maybe the distance
to the transitively tracked peer.

> This seems like it might need an index of some kind.

It's a revwalk plus a few blob accesses... hope that'll do for a while.
Details
Message ID
<447BEAC3-3992-4A04-9F91-CDEAD894599C@eagain.st>
In-Reply-To
<20220218192009.GB93697@schmidt.localdomain>

> It's a revwalk

A refwalk actually, haha. But still.
Details
Message ID
<2AA9434A-C34E-4364-8115-E01925B039AE@eagain.st>
In-Reply-To
<447BEAC3-3992-4A04-9F91-CDEAD894599C@eagain.st>
I think the problem is that cobs should not actually be configured per-peer, but per-URN.
Details
Message ID
<20220221080240.GB3206@schmidt.localdomain>
In-Reply-To
<20220218140926.GC36711@schmidt.localdomain>
I thought about this some more, and I would like to drop the feature, at least
for now.

Apart from the described difficulty to tell which config should take precedence,
it is also problematic to allow partial replication in general -- it is just
very difficult then to reason about which data should be propagated and which
should not.

We have so far allowed for the refs described by signed refs to be only
partially present on the remote end, but that has led to cases that are nearly
impossible to diagnose, where refs were not being fetched when they were
expected to be. I'd like to change that by requiring that either all refs in
signed refs are served, or none at all. This makes it impossible for additional
predicates on cobs to have an effect on replication, unless cobs carried a
per-ref signature, which I think is too complicated to implement right now
compared to the benefit of being able to block certain cobs "client side".

We could consider evaluating the config when evaluating cobs locally, so as to
support some kind of "unsubscribe" semantics which would just hide certain data.
But otherwise, the config should just describe whether to fetch only "refs/rad"
(sans "refs/rad/signed_refs"), or everything else, too.
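
A rough sketch of what this reduced config could mean for a fetch, assuming ref names relative to one peer's remote tracking tree and a `Config` with only a `data` flag. The function names are hypothetical, and the `refs/rad/{id, self, ids/*}` set for the `data: false` case is an assumption about which `refs/rad` refs remain.

```rust
use std::collections::BTreeSet;

/// Hypothetical, reduced tracking config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Config {
    pub data: bool,
}

/// Refs to ask for from one peer.
/// `signed_refs` holds the ref names listed in that peer's signed refs.
pub fn wanted_refs(config: Config, signed_refs: &BTreeSet<String>) -> BTreeSet<String> {
    // Always fetched: "refs/rad" without "refs/rad/signed_refs".
    let mut wanted: BTreeSet<String> = ["refs/rad/id", "refs/rad/self", "refs/rad/ids/*"]
        .iter()
        .map(|r| r.to_string())
        .collect();
    if config.data {
        // Everything else, too: the signed refs and all refs they describe.
        wanted.insert("refs/rad/signed_refs".to_string());
        wanted.extend(signed_refs.iter().cloned());
    }
    wanted
}

/// All-or-nothing check: the remote must serve every ref listed in the
/// signed refs, or none of them. A partial set is rejected.
pub fn served_consistently(signed_refs: &BTreeSet<String>, served: &BTreeSet<String>) -> bool {
    let present = signed_refs.intersection(served).count();
    present == 0 || present == signed_refs.len()
}
```

Under this sketch a remote that serves only some of the signed refs fails `served_consistently`, so the fetch can be reported as an error rather than silently skipping refs.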
Details
Message ID
<CI1MVKLE4QBX.1YJWS8P2CSAUW@haptop>
In-Reply-To
<20220221080240.GB3206@schmidt.localdomain>
On Mon Feb 21, 2022 at 7:02 AM GMT, Kim Altintop wrote:
> I thought about this some more, and I would like to drop the feature, at least
> for now.
>
> Apart from the described difficulty to tell which config should take precedence,
> it is also problematic to allow partial replication in general -- it is just
> very difficult then to reason about which data should be propagated and which
> should not.
>
> We have so far allowed for the refs described by signed refs to be only
> partially present on the remote end, but that has led to cases that are nearly
> impossible to diagnose, where refs were not being fetched when they were
> expected to be. I'd like to change that by requiring that either all refs in
> signed refs are served, or none at all. This makes it impossible for additional
> predicates on cobs to have an effect on replication, unless cobs carried a
> per-ref signature, which I think is too complicated to implement right now
> compared to the benefit of being able to block certain cobs "client side".
>
> We could consider evaluating the config when evaluating cobs locally, so as to
> support some kind of "unsubscribe" semantics which would just hide certain data.
> But otherwise, the config should just describe whether to fetch only "refs/rad"
> (sans "refs/rad/signed_refs"), or everything else, too.

Ya, that's a fair point on the case of partial replication.

Initially, I thought the evaluation of cobs locally would feel weird
since you're still replicating data you might not care about but it
seems like a fair compromise. You can either untrack outright or you
get to hide some of the data -- with the bonus of being able to unhide
if you change your mind.

Should we amend the RFC to say that only `data` is considered for
replication filtering?
Details
Message ID
<20220221113128.GF3206@schmidt.localdomain>
In-Reply-To
<CI1MVKLE4QBX.1YJWS8P2CSAUW@haptop>
If we keep the cobs config around, we still need to say what the precedence
rules are.

Where I think it does actually make sense to filter by peer: if I want to
unsubscribe from issue xyz, then it doesn't matter where it came from. If I want
to not unsubscribe, but silence peer C's contributions, then I think that's a
bit dangerous territory. If indeed peer C is a spammer, I could still track them
with `data: false`, so nothing from them comes through.

So that would suggest that the `cobs` filters are stored in the `default` ref,
and only there.
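
To illustrate the idea of per-URN `cobs` filters that only hide data locally, here is a small sketch. `Cob`, `DefaultConfig` and their methods are hypothetical names, and the filter is reduced to a set of unsubscribed object ids; it does not reflect the actual config format.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of a collaborative object change as seen locally.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cob {
    pub id: String,
    pub author: String,
}

/// Hypothetical contents of the `default` config for one URN.
/// Filters live here only, never in per-peer configs.
#[derive(Debug, Default)]
pub struct DefaultConfig {
    unsubscribed: BTreeSet<String>,
}

impl DefaultConfig {
    /// Hide an object, regardless of which peer it came from.
    pub fn unsubscribe(&mut self, cob_id: &str) {
        self.unsubscribed.insert(cob_id.to_string());
    }

    /// Undo `unsubscribe`. The data was replicated all along, so it reappears.
    pub fn resubscribe(&mut self, cob_id: &str) {
        self.unsubscribed.remove(cob_id);
    }

    /// Objects to show locally. Replication is not affected by this filter.
    pub fn visible<'a>(&self, cobs: &'a [Cob]) -> Vec<&'a Cob> {
        cobs.iter()
            .filter(|cob| !self.unsubscribed.contains(&cob.id))
            .collect()
    }
}
```

Because the filter keys on the object id and not on the author, unsubscribing from issue xyz hides it whether it arrived via B or C, and silencing a single peer stays the job of `data: false`.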
Details
Message ID
<CI1NQVWE9IRK.3LI0UA3QR7ATV@haptop>
In-Reply-To
<20220221113128.GF3206@schmidt.localdomain>
On Mon Feb 21, 2022 at 10:31 AM GMT, Kim Altintop wrote:
> If we keep the cobs config around, we still need to say what the precedence
> rules are.
>
> Where I think it does actually make sense to filter by peer: if I want to
> unsubscribe from issue xyz, then it doesn't matter where it came from. If I want
> to not unsubscribe, but silence peer C's contributions, then I think that's a
> bit dangerous territory. If indeed peer C is a spammer, I could still track them
> with `data: false`, so nothing from them comes through.

Right, because `"cobs"` will only be replicated if `data: true` since
they're related to the signed refs, right?

A related question, if C is transitively tracked then we can't ignore
them completely, ie. untrack them, right? So setting `data: false` is
now our mechanism for ignoring them?

> So that would suggest that the `cobs` filters are stored in the `default` ref,
> and only there.

Ya, as you said, it should be per URN rather than per peer. This makes sense!
Details
Message ID
<20220221121212.GJ3206@schmidt.localdomain>
In-Reply-To
<CI1NQVWE9IRK.3LI0UA3QR7ATV@haptop>
On Mon, 21 Feb 2022 10:37:30 +0000 "Fintan Halpenny" <fintan.halpenny@gmail.com> wrote:
> A related question, if C is transitively tracked then we can't ignore
> them completely, ie. untrack them, right? So setting `data: false` is
> now our mechanism for ignoring them?

Ya, that might be a bit odd. I'm not sure. The semantics would be that
`rad/{id, self, ids/*}` are still replicated, which might have some advantages.

We could also store a ban-list directly in the `default` config.
Details
Message ID
<CI1OJ34K0GXR.39N1UDMF88B6O@haptop>
In-Reply-To
<20220221121212.GJ3206@schmidt.localdomain>
On Mon Feb 21, 2022 at 11:12 AM GMT, Kim Altintop wrote:
> On Mon, 21 Feb 2022 10:37:30 +0000 "Fintan Halpenny"
> <fintan.halpenny@gmail.com> wrote:
> > A related question, if C is transitively tracked then we can't ignore
> > them completely, ie. untrack them, right? So setting `data: false` is
> > now our mechanism for ignoring them?
>
> Ya, that might be a bit odd. I'm not sure. The semantics would be that
> `rad/{id, self, ids/*}` are still replicated, which might have some advantages.
>
> We could also store a ban-list directly in the `default` config.

As in a ban-list of peers? In a way, doesn't that land us back in partial
replication?

I feel like replicating the `rad` data makes sense for network
integrity. If they're lying they're ignored anyway.
Details
Message ID
<CAH_DpYRRVWndO+UGkBSDh72Tv3C4BH1q-qbm4bhm3krPrH0gcQ@mail.gmail.com>
In-Reply-To
<20220221080240.GB3206@schmidt.localdomain>
On 21/02/22 08:02am, Kim Altintop wrote:
> We could consider evaluating the config when evaluating cobs locally, so as to
> support some kind of "unsubscribe" semantics which would just hide certain data.
> But otherwise, the config should just describe whether to fetch only "refs/rad"
> (sans "refs/rad/signed_refs"), or everything else, too.

The cobs codebase already does a lot of filtering of changes to do
things like ignore changes with invalid signatures, or to ignore changes
which violate the schema. It would be straightforward to extend that to
ignore things based on tracking configs. Would the tracking configs be
the right place to put this or would we prefer to put it somewhere else?
It isn't really about tracking any more.
Details
Message ID
<20220221140635.GB163900@schmidt.localdomain>
In-Reply-To
<CAH_DpYRRVWndO+UGkBSDh72Tv3C4BH1q-qbm4bhm3krPrH0gcQ@mail.gmail.com>
On Mon, 21 Feb 2022 04:58:57 -0800 Alex Good <alex@memoryandthought.me> wrote:
> On 21/02/22 08:02am, Kim Altintop wrote:
> > We could consider evaluating the config when evaluating cobs locally, so as to
> > support some kind of "unsubscribe" semantics which would just hide certain data.
> > But otherwise, the config should just describe whether to fetch only "refs/rad"
> > (sans "refs/rad/signed_refs"), or everything else, too.
>
> The cobs codebase already does a lot of filtering of changes to do
> things like ignore changes with invalid signatures, or to ignore changes
> which violate the schema. It would be straightforward to extend that to
> ignore things based on tracking configs. Would the tracking configs be
> the right place to put this or would we prefer to put it somewhere else?
> It isn't really about tracking any more.

I don't mind, really, but then we need some other mutable place, one that is
also not visible to replicators.