Hey everyone,
After spending way too long on building an "ultra simple" database
(hey, it was fun!) I've decided, thanks to some gentle thoughts on this
list, to throw it away and rethink how I would architect a version
control system. This is good: I think the design is now both simpler
and cleaner!
https://lua.civboot.org#Package_flux
See the above link. I'd appreciate any and all feedback. I'm going to
(again) be taking a short break before going full-tilt on implementing
the thing, but I'm _really_ excited to build this. It has MANY
benefits, but one of the main ones (that I think Virgil and y'all will
really like) is that you could probably trivially implement the core
functionality with a few bash functions (or FORTH with a working
diff/patch function), though I still think Lua is better for usability
(hey, why not both?)
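To make that concrete, here's a rough sketch (NOT the actual flux
design -- the patches/, snapshot/ and work/ layout is invented just for
illustration) of what "a few bash functions" could look like:

# "commit" diffs the working tree against the previous snapshot and
# stores the result as the next numbered patch; "checkout N" replays
# patches 1..N onto an empty directory
commit() {
  n=$(( $(ls patches | wc -l) + 1 ))          # next patch id
  diff -ruN snapshot work > "patches/$n.p"    # record the change
  rm -rf snapshot && cp -r work snapshot      # roll the snapshot forward
}
checkout() {
  rm -rf work && mkdir work
  for i in $(seq 1 "$1"); do
    patch -d work -p1 -u < "patches/$i.p"     # replay patches in order
  done
}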
Best,
Rett
I like it! It does just about everything I do with git.
I don't have much experience with version control, but the design feels good.
Implementing branches as directories is not just simple; it also makes
it trivial to compare between branches and to jump from one branch to
another.
Maybe git does too much while trying to preserve the illusion of a single
directory for the project?
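For example, assuming each branch really is checked out as a sibling
directory (names here are made up), comparing and switching are
one-liners with standard tools:

diff -ru main my-feature | less     # what differs between the branches
cd ../my-feature                    # "jump" to the other branch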
On Mon, Feb 10, 2025 at 09:43:50PM -0700, Rett Berg wrote:
It looks simple and if it works, it will be great. But I can't help thinking
that there's going to be a showstopper at some point. I haven't spent time
thinking about it much, but it sounds too good to be true.
If I try to remember what was said when git came out, one of the big
problems it solved was efficiently handling merge conflicts. Indeed, I
have bad memories of doing branch merges in SVN, enough that I remember
actively avoiding branches. Today, we take painless branch merging for
granted, but I think there might be significant inherent complexity
involved in this absence of pain.
All that being said, I'm curious to see what you come up with :)
Regards,
Virgil
I work at Google and _everything_ that 100,000+ software engineers
work on in a multi-billion line codebase is continuously rebased onto
a single repository with incrementing change IDs -- there are
literally hundreds of thousands of patches made every day and there
are hardly ever test or build breakages (and when there are, people
know about them quickly and they get fixed quickly). Google does all
development on a single trunk that engineers constantly branch from,
rebase onto, and push to. It's a modern marvel of underlying
simplicity. The SCALE is only possible because of Google's awesome
build and test infrastructure, as well as constant maintenance
operations and a few other factors. I'm not saying it's somehow
trivial for this model to scale, but it definitely CAN scale -- and
for a small project a simple presubmit test suite is probably
sufficient for it to scale to 10+ developers.
> Sidenote: there are a few services which make "release branches" but
> IIUC they are just branches from the root at a change number that
> never get re-merged but can have cherry-picks applied to them.
> Regardless, they are extremely rare and only a few core-infrastructure
> teams use them.
On the git side, branch merging has bitten people as well and now many
teams/companies use a model that requires rebase-only -- which
effectively results in a single linear history.
In my mind this model has been tried and tested. Could you "scale"
this specific software -- where you store all patches as files
appended to a tar file? Probably not -- but then again, that is just
an implementation detail: the API/Architecture is still just a series
of patches and is fundamentally identical to what Google is doing. If
you need to scale to 100+ or 1000+ engineers it's probably worth using
a database with some helpful O(1) or O(log n) algorithms to solve some
of these problems for your root branch. However, _even in this case_
local development could still use the files and tar archives!
Sidenote: I've updated the architecture with a "tags" file for
specifying versions/etc as well as
https://lua.civboot.org#flux.collapse with documentation on how
patches can be collapsed when doing repo maintenance.
Best,
Rett
To concur with Rett here, I think there is a totally coherent software
development model using a tool like Flux which would work at scale and
still be an improvement over tools like CVS/Subversion where merging
branches was awful.
I would propose something even simpler: when rebasing a branch onto
main, you first need to collapse the commits from the branch point
into a single patch (commit) and then apply that at the end. From the
main branch point of view, merging a branch and applying a third-party
patch would look the same, and dealing with conflicts would work about
the same in either case.
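As a sketch (directory names invented), the whole "collapse, then
apply" flow is just standard diff/patch:

diff -ruN branch-point my-branch > collapsed.patch   # one patch for the whole branch
patch -d main -p1 -u < collapsed.patch               # conflicts show up as .rej files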
It's not clear that being able to do a "zipper merge" of two branches
is ever necessary in practice. It can be convenient but it complicates
the mental model, makes conflicts much harder to deal with, can be
confusing to visualize, and requires more code.
I haven't fully understood all the internals of Flux yet, but I think
a simple patch-based system that can commit changes, support
branching, boil commits down into a single commit, and apply commits
from one branch onto another should be sufficient for almost all
useful work.
-- d_m
On Tue, Feb 11, 2025 at 11:32:50AM -0500, d_m wrote:
Well then, you got me all excited there. Go Rett!
Didn't anyone try anything of that sort before?
Regards,
Virgil
Thanks d_m and Virgil
> I would propose something even simpler: when rebasing a branch onto
> main, you first need to collapse the commits from the branch point
> into a single patch (commit) and then apply that at the end. From the
> main branch point of view, merging a branch and applying a third-party
> patch would look the same, and dealing with conflicts would work about
> the same in either case.
I would do it the same way as Google: when working locally you can
chain patches, but when merging you can only ever push a single patch
at a time -- and that patch must completely pass tests before it's
allowed to be merged. Rebasing is done by cherry-picking each of your
branch changes one at a time onto the top of the trunk (requiring
manual resolution for conflicts).
Obviously Google does things a _bit_ differently at scale: I think
each merge request is batched by the affected files and it
concurrently merges multiple changes when there are zero path
conflicts (each is still assigned a unique id). Something like that.
Regardless, the mental model is the same and at small scale with fast
tests the remote branch could LITERALLY test+merge a single patch at a
time.
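As a sketch of how the small-scale version could work (the names here
-- tip/, patches/, id, run-tests.sh -- are all invented), the remote
branch would just do something like:

# accept exactly one candidate patch at a time, and only if tests pass
next=$(( $(cat id) + 1 ))
rm -rf try && cp -r tip try                 # scratch copy of the trunk tip
if patch -d try -p1 -u < incoming.p && ( cd try && ./run-tests.sh ); then
  cp incoming.p "patches/$next.p"           # accepted: it becomes patch $next
  echo "$next" > id
  rm -rf tip && mv try tip                  # the tip moves forward
else
  echo "rejected: rebase onto the new tip, fix, and resubmit"
fi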
Regarding "boil commits down into a single commit", that's pretty much
what https://lua.civboot.org#flux.collapse does, even for the remote
branch. Locally you would have the additional ability to "fold"
multiple patches into a single commit or even modify a patch (then
cherry pick your later patches on top, what's called "evolve" at
Google), since locally nobody is depending on the specific patch ids
or their hashes.
These are all "just features" though. You can add all sorts of nice
features for working with local branches, reverting to prior branch
states, recovering from failed merges, etc. I want a lot of these
features to exist and they should be relatively easy to implement
because of the freedom you can have for your local branches -- if/when
things go horribly wrong you should always be able to just revert to a
timestamped snapshot.
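At its crudest -- purely as an illustration -- that snapshot could
literally just be a dated tarball of the branch directory:

tar czf "../snapshots/$(date +%Y%m%d-%H%M%S).tar.gz" .   # take a snapshot
tar xzf ../snapshots/20250211-110900.tar.gz              # restore one later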
Best,
Rett
(resending... stupid gmail and plain text mode)
> Didn't anyone try anything of that sort before?
I'm not sure -- I've certainly never seen it (except for Google
obviously, but I think the simplicity is hidden behind the gargantuan
scale).
Would certainly be nice to know if something similar exists already!
I'm also thinking of alternative names for flux. Do any of these sound
good to anyone?
* flux: sounds super techy, and is also a "flow"-like thing, which is
what a change list is.
* patches: the mascot could be a cute puppy chewing on a patched-up
scarf around its neck! Refers to it being "just a series of patch
commands".
* pvc: patch version control, a name that is similar in spirit to
cvs/etc while also referring to "PVC pipe", aka fluid flow/etc.
I think patches is cuter but I shy away from it because I'm not sure
what the CLI command would be... pch? pchs? I'm starting to lean
towards "pvc".
Best,
Rett
I've renamed it "PVC" for "Patch Version Control". I also think
"Patches" should be a mascot -- if anyone wants to make a cute puppy
chewing on a PVC pipe (preferably in SVG or similar) I would be
honored.
I've started playing with unix diff / patch, the results are promising:
https://github.com/civboot/civlua/blob/pvc/cmd/pvc/testdata/notes.sh
https://raw.githubusercontent.com/civboot/civlua/refs/heads/pvc/cmd/pvc/testdata/notes.sh
I have some test data in that same directory, and that script (when
PWD=.../civlua/) does the logic I'm planning to implement for the
version control. The main commands are the following.
This below is how you "create" story.txt. I'm intentionally making the
path and label different since when PVC runs them they will be very
different. /dev/null seems to be the best way to mark "was or will be
empty":
diff -N --unified=1 /dev/null story.txt.1 --label=/dev/null \
--label=story.txt \
> patch.story.txt.1
--- /dev/null
+++ story.txt
@@ -0,0 +1,4 @@
+# Story
+This is a story
+about a man
+and his dog.
In the test script, this is merged with an example lua script to form
a single "patch file." You can execute all the commands with something
like:
cmd/pvc/testdata/notes.sh create1 # create patch.1
cmd/pvc/testdata/notes.sh create2 # create patch.2
cmd/pvc/testdata/notes.sh patch1 # initialize .out/pvc/ and apply patch.1
cmd/pvc/testdata/notes.sh patch2 # apply patch.2 to .out/pvc/
So far unix seems to be doing the right thing. I've even implemented
"patch2_1" which reverts the change from patch2 to go back to patch1
(I didn't think this would be so easy!). The basic patch command is:
cat $TD/patch.2 | patch -Nu # apply patch "forward"
cat $TD/patch.2 | patch -Ru # apply patch in "reverse"
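A quick sanity check with the same tools (assuming we're inside the
patched directory, as with the commands above): GNU patch's --dry-run
reports whether a patch would apply without touching anything, and a
forward+reverse round trip should be a no-op:

patch --dry-run -Nu --input=$TD/patch.2 && echo "patch.2 applies cleanly"
cp -r . /tmp/pvc.before                 # snapshot, then round-trip the patch
cat $TD/patch.2 | patch -Nu
cat $TD/patch.2 | patch -Ru
diff -r /tmp/pvc.before . && echo "round trip is clean"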
It's my first time seriously using these tools -- let me know if
anyone knows some tips or tricks to do things "better" or "more
standardized" or what have you. The manpages are awash with "with such
and such a compliance bit" blah blah -- if anyone knows a good way to
make things "just work" I'd love to hear it.
Best
- Rett
So to Virgil's question "has anybody done this" -- these tools (such
as merge) are commonly packaged into the "RCS" package, the docs for
which are here:
https://www.gnu.org/software/rcs/manual/rcs.html
Which state:
> RCS works with versions stored on a single filesystem or machine,
> edited by one person at a time. Other version control systems, such
> as Bazaar (http://www.gnu.org/software/bazaar), CVS, Subversion, and
> Git, support distributed access in various ways. Which is more
> appropriate depends on the task at hand.
After reading through its tutorial I don't think I'm likely to use it
personally.
Apparently CVS[1] is intended to be an "RCS frontend". I downloaded a
5-year-old github mirror[2] and tokei reports that the src/ directory
has 110,000 lines of code, with 267,265 lines in the whole thing. I
have not looked into this, but CVS's Wikipedia page talks about it
being a "server" -- maybe they took that too far and it added bloat?
The RCS docs never mention a (smol) spiritual ancestor. Why does
software, when it grows organically, commonly go from something like
RCS to something like CVS, instead of RCS becoming something like PVC
(my project)? Did something try, but then (for some reason) abandon, a
simpler approach? Why was there so much adoption of CVS without
alternatives being considered first?
Minor updates:
* I tested using the RCS "merge" command for rebase/cherry-pick and it
works -- I believe I can now implement all the features I'm currently
planning (see the three-way merge sketch just after this list).
* I am now using patch --input= instead of passing by stdin. Also I'm
doing diff --unified=0 since context is not necessary for data storage
and merging doesn't use diffs anyway.
* I decided to store patches in a directory hierarchy which allows
only 100 items per level; for instance patch 12,432 might be
.pvc/mybranch/patches/01/24/32/12432.p. This has several benefits:
  * old filesystems had trouble with lots of nodes, and DuskOS _could_
be running on one. 100 items is small enough to be manageable.
  * it makes it much faster to search for "patches near X".
  * you can "upgrade" this directory structure nearly atomically by
just moving the patches/ directory into the next level.
* I actually believe this could scale to be usable for even a
relatively large project -- that's the goal anyway.
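For reference, the cherry-pick/rebase primitive I tested is just
merge(1)'s three-way merge (file names here are placeholders):

# fold the change (base -> theirs) into my copy of the file; overlapping
# edits leave conflict markers and merge exits nonzero
merge mine.txt base.txt theirs.txt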
Best
- Rett
[1]: https://en.wikipedia.org/wiki/Concurrent_Versions_System
[2]: https://github.com/Aalbus-linux/cvs
On Sat, Feb 15, 2025 at 05:29:45AM -0700, Rett Berg wrote:
A big advantage that SVN initially had over CVS was that commits were
atomic across the whole project tree. From what I read of your design
docs, the patches you store would be for the whole tree, not individual
files, right? RCS/CVS are not good inspirations.
As for the patch naming scheme, didn't I read in your design docs that you'd
have a tar to push old patches to? This sounds like a good idea: keep the
latest X patches as files and older ones get stuffed in the tar. Quick access
to patches being "worked on" and convenient archival for the rest. It could be
gzipped.
Regards,
Virgil
Yes, each patch contains diffs for multiple files in the project
directory (aka the "project tree"). Public (aka hosted) patch IDs
should be considered effectively atomic and immutable (though you are
permitted to "collapse" them, per the previous email).
I also really like the tar idea for archival! It allows the
"filesystem as a database" to scale even as the number of patches gets
large, provided folks are typically not checking out old patches (and
even if they are... they are free to untar!)
The way I envision it working is that most projects would probably
host something like the following on an FTP server:
* All patches older than a few months go in the tar (which will be in
e.g. .pvc/main/archive/1-12344.tar, with an accompanying tar.gz for
hosted download) -- the tar name always specifies which patch ID range
it contains.
* Each patch that is more recent will be e.g.
main/patches/01/23/12345.p (the file is in plaintext unidiff format;
see the small path helper sketched below).
* snapshots will be e.g. main/patches/01/23/12345.d (for a directory)
or .../12345.tar.gz (for a snapshot archive), or both.
* patches won't be stored (in main/patches/) for ids covered by
archive/1-12344.tar -- but snapshots of tagged patches (aka released
versions) will typically be stored in tar.gz form -- so you may have
main/patches/00/00/99.tar.gz without an accompanying
main/patches/00/00/99.p file, if 99 is a tagged patch.
* core contributors could host their own "named" branches that they
can push/pull/edit to allow faster sharing and reviews, e.g. rett/,
virgil/, etc.
Something like that anyway!
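As a sketch (the exact digit grouping might still change), mapping a
patch id to its nested path is tiny:

# e.g. patch_path 12345 -> main/patches/01/23/12345.p
#      patch_path 99    -> main/patches/00/00/99.p
# (ids above 999999 would need another directory level)
patch_path() {
  dirs=$(printf '%06d' "$1" | sed 's/..$//; s/../&\//g; s/\/$//')
  echo "main/patches/$dirs/$1.p"
}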
Does anyone know the answer to this question?
https://stackoverflow.com/questions/79441982/unidiff-encode-file-move
If you read the manual page for patch it answers your question.
See the "NOTES FOR PATCH SENDERS" section:
> You can create a file by sending out a diff that compares /dev/null
> or an empty file dated the Epoch (1970-01-01 00:00:00 UTC) to the
> file you want to create. This only works if the file you want to
> create doesn't exist already in the target directory. Conversely, you
> can remove a file by sending out a context diff that compares the
> file to be deleted with an empty file dated the Epoch. The file will
> be removed unless patch is conforming to POSIX and the -E or
> --remove-empty-files option is not given. An easy way to generate
> patches that create and remove files is to use GNU diff's -N or
> --new-file option.
Just tested this using `touch -t '197001010000.00'` and it works.
# set up
mkdir before after test
echo hello world > before/a.txt
cp before/a.txt test/a.txt
cp before/a.txt after/b.txt
ls -l test
# produce patch: dating the "missing" side at the Epoch makes patch
# treat the hunks as a file removal (a.txt) and a file creation (b.txt)
touch -t '197001010000.00' before/b.txt
touch -t '197001010000.00' after/a.txt
diff -u -r before after > test/my.patch
# apply patch
cd test
patch -p1 < my.patch
ls -l
On Sat, Feb 15, 2025 at 10:33:11AM -0700, Rett Berg wrote:
When you came up with your flux/pvc proposal, I hadn't thought at all
about this subject. But now that the proposal is out there, I end up
thinking about it.
One question I ask myself is: how will I publish Dusk OS under this new
system? A whole protocol like git clone is overkill, but so is setting
up rsync over a raw PVC directory.
So I'm thinking: only releases are important, patches are there for historical
purposes once a release has been made.
So I'm imagining a git-less Dusk and I think there would be, for example,
duskos-v17.tar.gz, which contains the v17 release, then
duskos-v17-patches.tar.gz which contains a series of numbered patch files
starting from 1. While v18 is being worked on, "duskos-v17-patches.tar.gz" is
"hot", that is, it changes as I publish patches. Once v18 is released, that
file doesn't change anymore and you're supposed to be able to reconstruct
duskos-v18.tar.gz only from duskos-v17.tar.gz and duskos-v17-patches.tar.gz.
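Roughly (file names invented for illustration, and assuming the patch
headers carry one leading path component), reconstruction would be
nothing more than:

tar xzf duskos-v17.tar.gz
tar xzf duskos-v17-patches.tar.gz   # say it unpacks to patches/001.patch, ...
cd duskos-v17
for p in ../patches/*.patch; do     # zero-padded names keep the glob in order
  patch -p1 -u < "$p"
done
# the tree should now match an unpacked duskos-v18.tar.gz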
So I'm thinking: why would we need anything other than "git am"-style
patch files at all? This format already contains a commit message as a
prelude. My targz files are already signed, so there's no need for a
separate integrity check.
The time for archival into tar is dictated by releases rather than some
time-based heuristic. Dusk has 100-200 commits per release, so no
filesystem node-count problems there.
People who submit patches would mention which patch number the patch
is against, and that's it: I would just include the patch as-is in the
patch folder.
All that would be needed would be a collection of scripts to navigate a series
of patches (checkout any state), rebase incoming patches, manage numbering and
all that. And... that would be it?
Maybe I'm being naive :)
Regards,
Virgil
Thanks d_m,
I actually figured your solution was an option but I didn't like it.
Running `cat test/my.patch` on your solution gives:
diff --color -u -r before/a.txt after/a.txt
--- before/a.txt 2025-02-15 12:44:52.942658244 -0700
+++ after/a.txt 1970-01-01 00:00:00.000000000 -0700
@@ -1 +0,0 @@
-hello world
diff --color -u -r before/b.txt after/b.txt
--- before/b.txt 1970-01-01 00:00:00.000000000 -0700
+++ after/b.txt 2025-02-15 12:44:52.950658191 -0700
@@ -0,0 +1 @@
+hello world
Thus this method requires a delete+create for EVERY "move" -- each
"move" will cause the entire file's contents to be in the patch
file... twice!
I suppose this is a cost of using the standard tooling... But it's
unfortunate. Maybe I could require a simple "patchmv" script run on
PVC diff files after the diff is complete which performs the specified
renames (based on +++ and ---). I'll have to consider it.
What I _don't_ want is super expensive moves. Moving a directory
shouldn't cause a patch file to contain the entire directory tree's
contents.
@Virgil I can't tell whether your email is a counter-proposal or
excitement about PVC's architecture. I see PVC as effectively just, as
you say, "a collection of scripts to navigate a series of patches
(checkout any state), rebase incoming patches, manage numbering and all
that." The only added complexity is in converting 12345 into
01/23/12345.p so you _could_ scale if you needed to, which... isn't
very much complexity lol.
Some "frontends" might implement things like displaying branches or
viewing history-of-a-file/etc -- but those are for comfort, and
wouldn't be required for the core software. Heck, the storage
architecture is simple enough you could probably apply the patches
_manually_ if you really wanted to.
Yes, you could decide when you wanted to tar the archive (or never tar).
I was suggesting "every couple months" as one possibility -- but
every release probably makes sense as well.
- Rett
I think I have the basic design. Patch files will look something like the
example below, and PVC will require a "patchmeta" command to be run after the
patch is applied (or before, with a -R flag, if you are reversing the
patch). The commands look like a series of bash commands but are actually
structured (and reversible) data -- the only supported command is
currently "mv", which solves this problem. There will probably also need
to be a "swap" command to swap two files. Otherwise I can't think of
any required commands.
# This is the "commit message" (we call them "patch message")
#
# It can be any length and is always prefixed by "#". Unix's patch is already
# implemented to ignore everything until the first "---" as "garbage"
#
# these are the commands
! mv story.txt kitty.txt
--- story.txt
+++ (moved to kitty.txt)
@@ -1 +1 @@
-# Story
+# Kitty
Note: this will also require that file names never start with "(" and end
with ")", so they can't be confused with the "(moved to ...)" marker by the
patch program -- which should be fine.
On Sat, Feb 15, 2025 at 01:03:03PM -0700, Rett Berg wrote:
> @Virgil I can't tell if your email is a counter proposal or excitement
> about PVC's architecture? [...]
I don't know, I was speccing my needs out loud. The subject of how PVC
repositories were going to be shared hadn't been touched yet. The design specs
mention FTP, but special considerations must be taken for push/pull as you
don't want to spuriously include snapshots in the transfers. It can't be a
naive rsync. And then there are the patches disappearing into the tar, which
is going to mess up the syncing and leave orphan files around.
I'm rather getting excited at the possibility of publishing Dusk's development
metadata (patches) in a system-agnostic manner. The consumer of those patches
doesn't need PVC, only whatever tools they deem appropriate for their own
needs. It may be PVC, it may be something else. After all, what is published is
only a list of ordered patches.
And if they really like git, they might even do something like "cat patches/* |
git am" or something of the sort.
That being said, it's possible that I won't resist trying to write a few
wrapper scripts to explore the problem space...
Regards,
Virgil
For pushing I don't think you can get around _some_ kind of RPC, both for
security and for ensuring synchronous updates.
For pulling... just frequently package the diffs in a .tar.gz, then
anybody can consume them using wget | tar -x. For those who are using
PVC, the directory structure would be pretty easy to navigate from
FTP: simply check branches/main/tip to see if it's changed. Yes, while
updating we'd probably want some kind of signifier of "locking" for
concurrent readers. However, I think _most_ operations will simply be
atomic: after all, moving a directory or a file will be seen as atomic
by readers. I think we can architect it so that if they WERE accessing
while a move was happening they'll get an error and simply need to
throw away their ".pvc/work/" directory and try again.
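To make the pulling side concrete, something like this (a sketch with made-up
URLs; the exact .pvc layout is still up in the air):

REMOTE=https://duskos.example.org/pvc/branches/main   # hypothetical mirror
BRANCH=.pvc/branches/main

remote_tip=$(wget -qO- "$REMOTE/tip")
local_tip=$(cat "$BRANCH/tip")
if [ "$remote_tip" -gt "$local_tip" ]; then
    # fetch the packaged diffs and unpack them next to the ones we already have
    wget -qO- "$REMOTE/patches.tar.gz" | tar -xzf - -C "$BRANCH/patches"
    echo "$remote_tip" > "$BRANCH/tip"
fi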
I'll worry about concurrency in the future since I believe it's a
solvable problem, let's focus on the local implementation with emailed
patches for the time being.
> I'm rather getting excited at the possibility of publishing Dusk's development metadata (patches) in a system-agnostic manner.
I'm getting excited too. I've been working on this for a long time,
constantly trying to simplify simplify simplify. After continuously
removing supposedly necessary pieces I suddenly realized I hardly needed
anything -- I can "just do it" using relatively simple tools.
Everything just feels so right and I'm super excited to get it built
:D
- Rett
A note on pushing requiring a server: you don't need anything if you
are the only person pushing and you are SSH-ing (or whatever) onto the
server. You could even give others access to their own branches and
have some mechanism for them to signal the branch is ready for merge,
upon which you run a little script or something to do the test and
merge it.
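Something like this, say (pure sketch -- the host, server paths and the tip
file are all made up):

HOST=you@duskos.example.org                    # hypothetical
REPO=/srv/pvc/duskos/.pvc/branches/main        # hypothetical server-side layout

n=12346                                        # the new patch's number
p=01/23/46/12346.p                             # its nested path
ssh "$HOST" "mkdir -p '$REPO/patches/01/23/46'"
scp ".pvc/branches/main/patches/$p" "$HOST:$REPO/patches/$p"
ssh "$HOST" "echo $n > '$REPO/tip.tmp' && mv '$REPO/tip.tmp' '$REPO/tip'"  # rename is atomic, readers see the new tip all at once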
Scaling to more engineers would require some kind of server, though -- but I
don't think that's relevant for DuskOS or even Civboot, or really for most
projects/libraries if I'm being honest.
Best,
Rett
<unlurk>
Hi Rett,
If you’ll indulge me a little, I’d like to drop a couple of leading questions:
1. Almost always, when getting a local copy of some software either to
build or contribute to by making patches, I’ll want to start at the latest
available revision (the tip of a branch). In order to optimize for the common
case, wouldn’t it be better for both users and CPUs to always snapshot
the most recent revision, and store a tarball and/or directory of reverse
patches against that for the much less common use case of making a
working copy that matches an older revision? (There's a small sketch of this
after these two questions.)
2. Almost always, multiple contributors who build their patches from the
tip of a development branch will conflict if everyone starts from the same
monotonically increasing patch number, and they will have to resolve
patch-number conflicts when applying upstream changes to bring their
working copy up to date before preparing a submission rebased against a
common upstream. Isn’t a UUID for each patch, plus a reference to its
original parent, a better mechanism for allowing multiple contributors
to collaborate and fork, and equally for upstream to rebase onto their own
copy of the repository?
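To sketch what I mean in question 1 (made-up file names, plain unified diffs
assumed): each time a revision is published you also publish a reverse patch
that turns it back into the previous one, and anyone sitting on the tip
snapshot walks backwards as far as they need:

N=18; M=17                                     # hypothetical revision numbers
# when revision $N is published, record how to step back to the previous one
diff -urN "rev$N" "rev$((N - 1))" > "reverse/$N.patch"

# a user on the tip snapshot (revision $N) who wants revision $M:
cp -r snapshot work && cd work
for i in $(seq "$N" -1 "$((M + 1))"); do
    patch -p1 --input="../reverse/$i.patch"    # one step back per patch
done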
I recommend a read of Building Git (https://shop.jcoglan.com/building-git/)
to understand some of the underlying reasons for design decisions in the
file system “database” that backs git, even if you too find the UX of the git
command line to look like it was designed by a committee of drunken
monkeys ;-)
Cheers,
Gary
</unlurk>