I've personally never enjoyed Unix's diff format -- mainly because I
find it *diff*icult to read and even more difficult to parse. I think
it was designed mainly to help the tools of the time modify files,
but I don't find that helpful.

I wrote a patience diff algorithm and my own diff format. It's
colorized in my own terminal, but I paste it below for comparison:
$ cat > file1.txt <<EOF
this is the original text
line2
line3
line4
happy hacking !
EOF
$ cat > file2.txt <<EOF
this is the original text
line2
line3
happy hacking !
GNU is not UNIX
EOF
$ diff file1.txt file2.txt
4d3
< line4
5a5
> GNU is not UNIX
$ diff -u file1.txt file2.txt
--- file1.txt 2025-02-10 16:46:45.357396058 -0700
+++ file2.txt 2025-02-10 16:47:53.753422225 -0700
@@ -1,5 +1,5 @@
 this is the original text
 line2
 line3
-line4
 happy hacking !
+GNU is not UNIX
$ civ.lua lines.diff file1.txt file2.txt
file1.txt :: file2.txt
1       1       this is the original text
3       3       line3
4               line4
5       4       happy hacking !
        5       GNU is not UNIX
You can read about the Unix diff formats at
https://unix.stackexchange.com/questions/81998/understanding-of-diff-output.
Mine can be explained VERY briefly:
1. the file names are listed as: left.file :: right.file
2. Each line is composed of the following separated by tabs:
leftLineNum, rightLineNum, lineText
3. The example output includes some unchanged text to provide better
user readability (i.e. lines 1:1, 3:3 and 5:4): these aren't necessary
when storing the diff.
You can see that the second column is empty for line 4 -- this means
it was "removed" (it doesn't exist in right). You can also see the
first column is empty at the bottom -- that means it was "added" (i.e.
it doesn't exist in left).
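
The format described above is simple enough to parse in a few lines. The following is a minimal Python sketch, not the civ.lua implementation: the names DiffLine, parse_lines_diff and kind are made up for illustration, and it assumes every row is exactly left<TAB>right<TAB>text, with an empty field where a line number is missing.

```python
from typing import List, NamedTuple, Optional, Tuple


class DiffLine(NamedTuple):
    left: Optional[int]   # line number in the left file, None if the line was added
    right: Optional[int]  # line number in the right file, None if the line was removed
    text: str


def parse_lines_diff(diff_text: str) -> Tuple[str, str, List[DiffLine]]:
    """Parse a 'left.file :: right.file' header followed by tab-separated rows."""
    header, *rows = diff_text.split("\n")
    left_name, right_name = header.split(" :: ", 1)
    parsed = []
    for row in rows:
        if not row:
            continue  # only the trailing newline produces a fully empty row
        left, right, text = row.split("\t", 2)
        parsed.append(DiffLine(
            int(left) if left else None,
            int(right) if right else None,
            text,
        ))
    return left_name, right_name, parsed


def kind(line: DiffLine) -> str:
    """Classify a row by which line-number column is empty."""
    if line.left is None:
        return "added"
    if line.right is None:
        return "removed"
    return "unchanged"


EXAMPLE = (
    "file1.txt :: file2.txt\n"
    "1\t1\tthis is the original text\n"
    "3\t3\tline3\n"
    "4\t\tline4\n"
    "5\t4\thappy hacking !\n"
    "\t5\tGNU is not UNIX\n"
)

if __name__ == "__main__":
    _, _, example_rows = parse_lines_diff(EXAMPLE)
    for r in example_rows:
        print(kind(r), r.text)
```

Because the text field is split off last (split with a maximum of two splits), tabs inside the line content itself are preserved.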
I think this is approximately as efficient as the --rcs option in diff
but MUCH more readable. It also has the benefit of being more readable
in a text editor because it uses tabs.
Thoughts? Do I just need to spend more time with the Unix diff? If so,
does anyone have a good reference on the spec? It would be best if we
could ingest either, but I strongly prefer the before/after version
that I've outlined for both storage and default usage.
I have other opinions on how to output conflicts during cherry-picks,
but I'll save that for a future thread.
Best,
Rett
On Mon, Feb 10, 2025 at 05:09:52PM -0700, Rett Berg wrote:
> Thoughts? Do I just need to spend more time with the Unix diff? If so,
> does anyone have a good reference on the spec? It would be best if we
> could ingest either, but I strongly prefer the before/after version
> that I've outlined for both storage and default usage.
Without making any other claims I will say that I definitely prefer
the unified diff format over the one outlined.
It is documented pretty well on Wikipedia, among other places:
https://en.wikipedia.org/wiki/Diff#Unified_format
I have independently considered a barebones version control system
based on patches. If I were the implementor I would definitely use the
unified diff version, since I find it readable and it is compatible
with existing diff/patch tools.
-- d_m
I find Berg's format easier to understand. In unified format, the information
of which line is which is separated from the line itself in the @@ header. In
Berg's interpretation it's beside the lines, making it easier to follow.
The empty column is a clear enough indication of whether the line is there. I'd
say it is iconic [1]: if there is no number, the line is not in that file. I
like that.
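
To make the comparison concrete: the per-line numbers that the @@ header leaves implicit can be recovered by counting. Here is a rough Python sketch that does this; number_unified is a hypothetical helper, not part of diff, patch or civ.lua, and it assumes a unified diff for a single file whose context lines still carry their leading space.

```python
import re

# Matches e.g. "@@ -1,5 +1,5 @@"; the ",count" parts are optional in the format.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_unified(diff_lines):
    """Return (left_num, right_num, text) rows for the hunk lines of a unified diff."""
    left = right = None
    rows = []
    for line in diff_lines:
        m = HUNK_RE.match(line)
        if m:
            # Each hunk header resets both counters to the hunk's starting lines.
            left, right = int(m.group(1)), int(m.group(2))
            continue
        if left is None or line.startswith("\\"):
            continue  # "---"/"+++" file headers, or a "\ No newline" marker
        tag, text = line[:1], line[1:]
        if tag == "-":      # only in the left file
            rows.append((left, None, text))
            left += 1
        elif tag == "+":    # only in the right file
            rows.append((None, right, text))
            right += 1
        else:               # context line, present in both
            rows.append((left, right, text))
            left += 1
            right += 1
    return rows
```

Fed the diff -u output from the first message, this yields the same pairs as the proposed format (4 with no right number for line4, 5:4 for "happy hacking !", and no left number with 5 for "GNU is not UNIX"), plus the 2:2 row that the example listing leaves out.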
----
[1] Charles Peirce developed a theory of semiotics that I've not studied deeply
but I kinda like. The idea of iconicity is that "the representation looks like
the thing being represented". Unlike the word "banana", the banana emoji 🍌 looks
like a banana. In the line representation, an empty space looks like a missing
line, whereas a "minus sign" to represent that "the line is not in the original
file" is an arbitrary symbol.
Peirce also made an iconic notation for first order predicate logic called
Existential Graphs [2]. They are beautiful and weird, the kind of thing Devine
might like. The notation is *graphical*; it feels like the geometry of logic in
a way that Venn diagrams could never match.
[2] https://en.wikipedia.org/wiki/Existential_graph
Good points, I think you're right. Thanks for the link, that is indeed
what I was looking for.
Thanks Polifemo, I'll keep it as an option for VIEWING diffs but make
storage use unidiff for compatibility reasons.
- Rett
I don't have a problem with unified diffs and I think compatibility with
existing diff utilities is rather important. Therefore, I'd lean toward staying
with unified diffs.

Side note: if I understand that format proposal correctly, it implies that
viewing a diff requires a screen that is twice as large as editing the file.
Right? On Dusk, I mostly work on 80- to 100-column grids, so that would be a
problem.
Regards,
Virgil
Sounds good on unidiffs, that's what I'm going with.
> viewing a diff requires a screen that is twice as large as editing the file.
No, it requires an extra two tab widths (or both line numbers plus a space).
It may truncate some lines in your case but it's much nicer when you
have lots of column real estate.
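
To put a number on that overhead, here is a rough Python sketch of how a viewer could lay out one row within a fixed column budget. It is an illustration only: render_row is a hypothetical helper, and it pads the two line numbers with spaces instead of tabs so the gutter width is predictable (2 * num_width + 2 columns).

```python
def render_row(left, right, text, num_width, columns=80):
    """Format one diff row as 'LEFT RIGHT TEXT', truncated to `columns` characters.

    left/right are line numbers, or None when the line is missing on that side.
    num_width is the number of digits reserved for each line-number column.
    """
    left_s = "" if left is None else str(left)
    right_s = "" if right is None else str(right)
    gutter = f"{left_s:>{num_width}} {right_s:>{num_width}} "
    return (gutter + text)[:columns]
```

For files under 10,000 lines (num_width of 4) the gutter costs 10 columns, so an 80-column grid still leaves 70 for the line text before truncation.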