~sircmpwn/forgeperf-demo

This thread contains a patchset. You're looking at the original emails, but you may wish to use the patch review UI. Review patch
24 4

[PATCH 0/3] improve git-diff documentation and A...B handling

Chris Torek via GitGitGadget
Details
Message ID
<pull.804.git.git.1591661021.gitgitgadget@gmail.com>
DKIM signature
missing
Download raw message
git diff -h help is succinct, but perhaps too much so.

The symmetric-diff syntax, git diff A...B, is defined by the documentation
to compare the merge base of A and B to commit B. It does so just fine when
there is a merge base. It compares A and B directly if there is no merge
base, and it is overly forgiving of bad arguments after which it can produce
nonsensical diffs.

The first patch simply adjusts a test that will fail if the second patch is
accepted. The second patch adds special handling for the symmetric diff
syntax so that the option parsing works, plus a small test suite. The third
patch just updates the SYNOPSIS section of the documentation and makes the
help output more verbose (to match the SYNOPSIS and provide common diff
options like git-diff-files, for instance).

Chris Torek (3):
  t/t3430: avoid undocumented git diff behavior
  git diff: improve A...B merge-base handling
  Documentation: tweak git diff help slightly

 Documentation/git-diff.txt |   2 +
 builtin/diff.c             | 138 ++++++++++++++++++++++++++++++++-----
 t/t3430-rebase-merges.sh   |   2 +-
 t/t4068-diff-symmetric.sh  |  81 ++++++++++++++++++++++
 4 files changed, 206 insertions(+), 17 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh


base-commit: 20514004ddf1a3528de8933bc32f284e175e1012
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-804%2Fchris3torek%2Fcleanup-diff-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-804/chris3torek/cleanup-diff-v1
Pull-Request: https://github.com/git/git/pull/804
-- 
gitgitgadget

[PATCH 1/3] t/t3430: avoid undocumented git diff behavior

Chris Torek via GitGitGadget
Details
Message ID
<414163bbc3cbdda241bedc7bc4dfb8b493071dcb.1591661021.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.git.git.1591661021.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +1 -1
From: Chris Torek <chris.torek@gmail.com>

According to the documentation, "git diff" takes at most two commit-ish,
or an A..B style range, or an A...B style symmetric difference range.
The autosquash-and-exec test relied on "git diff HEAD^!", which works
fine for ordinary commits as the revision parse produces two commit-ish,
namely ^HEAD^ and HEAD.

For merge commits, however, this test makes use of an undocumented
feature: the resulting revision parse has all the parents as UNINTERESTING
followed by the HEAD commit.  This looks identical to a symmetric
diff parse, which lists the merge bases as UNINTERESTING, followed by
the A (UNINTERESTING) and B revs.  So the diff winds up treating it
as one, using the first oid (i.e., HEAD^) and the last (i.e., HEAD).
The documentation, however, says nothing about this usage.

Since diff actually just uses HEAD^ and HEAD, call for these directly
here.  That makes it possible to improve the diff code's handling of
symmetric difference arguments.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
t/t3430-rebase-merges.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index a1bc3e20016..b454f400ebd 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -420,7 +420,7 @@ test_expect_success 'with --autosquash and --exec' '
	git commit --fixup B B.t &&
	write_script show.sh <<-\EOF &&
	subject="$(git show -s --format=%s HEAD)"
	content="$(git diff HEAD^! | tail -n 1)"
	content="$(git diff HEAD^ HEAD | tail -n 1)"
	echo "$subject: $content"
	EOF
	test_tick &&
-- 
gitgitgadget

[PATCH 2/3] git diff: improve A...B merge-base handling

Chris Torek via GitGitGadget
Details
Message ID
<f7c8f094e02406a7d0cb0c61f880e5b01fa413c4.1591661021.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.git.git.1591661021.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +195 -15
From: Chris Torek <chris.torek@gmail.com>

When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".

To avoid all this, add a routine to catch the A...B case and verify that
there is at least one merge base, and that the arguments make sense.
As a side effect, produce a warning showing *which* merge base is being
used when there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 builtin/diff.c            | 129 +++++++++++++++++++++++++++++++++-----
 t/t4068-diff-symmetric.sh |  81 ++++++++++++++++++++++++
 2 files changed, 195 insertions(+), 15 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh

diff --git a/builtin/diff.c b/builtin/diff.c
index 8537b17bd5e..8b8b95ec97e 100644
--- a/builtin/diff.c
@@ -6,6 +6,7 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "cache.h"
#include "config.h"
#include "ewah/ewok.h"
#include "lockfile.h"
#include "color.h"
#include "commit.h"
@@ -254,6 +255,103 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
	return run_diff_files(revs, options);
}

struct symdiff {
	struct bitmap *skip;	/* bitmap of commit indices to skip, or NULL */
	int warn;		/* true if there were multiple merge bases */
	int base, left, right;	/* index of chosen merge base and left&right */
};

/*
 * Check for symmetric-difference arguments, and if present, arrange
 * everything we need to know to handle them correctly.
 *
 * For an actual symmetric diff, *symdiff is set this way:
 *
 *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
 *    indices that the caller should ignore (extra merge bases, of
 *    which there might be many, and A in A...B).  Note that the
 *    chosen merge base and right side are NOT marked.
 *  - warn is set if there are multiple merge bases.
 *  - base, left, and right hold the merge base and left and
 *    right side indices, for warnings or errors.
 *
 * If there is no symmetric diff argument, sym->skip is NULL and
 * sym->warn is cleared.  The remaining fields are not set.
 *
 * If the user provides a symmetric diff with no merge base, or
 * more than one range, we do a usage-exit.
 */
static void builtin_diff_symdiff(struct rev_info *rev, struct symdiff *sym)
{
	int i, lcount = 0, rcount = 0, basecount = 0;
	int lpos = -1, rpos = -1, basepos = -1;
	struct bitmap *map = NULL;

	/*
	 * Use the whence fields to find merge bases and left and
	 * right parts of symmetric difference, so that we do not
	 * depend on the order that revisions are parsed.  If there
	 * are any revs that aren't from these sources, we have a
	 * "git diff C A...B" or "git diff A...B C" case.  Or we
	 * could even get "git diff A...B C...E", for instance.
	 *
	 * If we don't have just one merge base, we pick one
	 * at random.
	 *
	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
	 * rev->pending.objects and rev->cmdline.rev are parallel.
	 */
	for (i = 0; i < rev->cmdline.nr; i++) {
		struct object *obj = rev->pending.objects[i].item;
		switch (rev->cmdline.rev[i].whence) {
		case REV_CMD_MERGE_BASE:
			if (basepos < 0)
				basepos = i;
			basecount++;
			break;		/* do mark all bases */
		case REV_CMD_LEFT:
			if (obj->flags & SYMMETRIC_LEFT) {
				lpos = i;
				lcount++;
				break;	/* do mark A */
			}
			continue;
		case REV_CMD_RIGHT:
			rpos = i;
			rcount++;
			continue;	/* don't mark B */
		default:
			continue;
		}
		if (map == NULL)
			map = bitmap_new();
		bitmap_set(map, i);
	}

	if (lcount == 0) {	/* not a symmetric difference */
		bitmap_free(map);
		sym->warn = 0;
		sym->skip = NULL;
		return;
	}

	if (lcount != 1)
		die(_("cannot use more than one symmetric difference"));

	if (basecount == 0) {
		const char *lname = rev->pending.objects[lpos].name;
		const char *rname = rev->pending.objects[rpos].name;
		die(_("%s...%s: no merge base"), lname, rname);
	}
	bitmap_unset(map, basepos);	/* unmark the base we want */
	sym->base = basepos;
	sym->left = lpos;
	sym->right = rpos;
	sym->warn = basecount > 1;
	sym->skip = map;
}

int cmd_diff(int argc, const char **argv, const char *prefix)
{
	int i;
@@ -263,6 +361,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
	struct object_array_entry *blob[2];
	int nongit = 0, no_index = 0;
	int result = 0;
	struct symdiff sdiff;

	/*
	 * We could get N tree-ish in the rev.pending_objects list.
@@ -382,6 +481,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		}
	}

	builtin_diff_symdiff(&rev, &sdiff);
	for (i = 0; i < rev.pending.nr; i++) {
		struct object_array_entry *entry = &rev.pending.objects[i];
		struct object *obj = entry->item;
@@ -394,8 +494,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
			die(_("invalid object '%s' given."), name);
		if (obj->type == OBJ_COMMIT)
			obj = &get_commit_tree(((struct commit *)obj))->object;

		if (obj->type == OBJ_TREE) {
			if (sdiff.skip && bitmap_get(sdiff.skip, i))
				continue;
			obj->flags |= flags;
			add_object_array(obj, name, &ent);
		} else if (obj->type == OBJ_BLOB) {
@@ -437,24 +538,22 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		usage(builtin_diff_usage);
	else if (ent.nr == 1)
		result = builtin_diff_index(&rev, argc, argv);
	else if (ent.nr == 2)
	else if (ent.nr == 2) {
		if (sdiff.warn) {
			const char *lname = rev.pending.objects[sdiff.left].name;
			const char *rname = rev.pending.objects[sdiff.right].name;
			const char *basename = rev.pending.objects[sdiff.base].name;
			warning(_("%s...%s: multiple merge bases, using %s"),
				lname, rname, basename);
		}
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0], &ent.objects[1]);
	else if (ent.objects[0].item->flags & UNINTERESTING) {
		/*
		 * diff A...B where there is at least one merge base
		 * between A and B.  We have ent.objects[0] ==
		 * merge-base, ent.objects[ents-2] == A, and
		 * ent.objects[ents-1] == B.  Show diff between the
		 * base and B.  Note that we pick one merge base at
		 * random if there are more than one.
		 */
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0],
					   &ent.objects[ent.nr-1]);
	} else
	} else {
		if (sdiff.skip)
			usage(builtin_diff_usage);
		result = builtin_diff_combined(&rev, argc, argv,
					       ent.objects, ent.nr);
	}
	result = diff_result_code(&rev.diffopt, result);
	if (1 < rev.diffopt.skip_stat_unmatch)
		refresh_index_quietly();
diff --git a/t/t4068-diff-symmetric.sh b/t/t4068-diff-symmetric.sh
new file mode 100755
index 00000000000..7b5988933da
--- /dev/null
+++ b/t/t4068-diff-symmetric.sh
@@ -0,0 +1,81 @@
#!/bin/sh

test_description='behavior of diff with symmetric-diff setups'

. ./test-lib.sh

# build these situations:
#  - normal merge with one merge base (b1...b2);
#  - criss-cross merge ie 2 merge bases (b1...master);
#  - disjoint subgraph (orphan branch, b3...master).
#
#     B---E   <-- master
#    / \ /
#   A   X
#    \ / \
#     C---D--G   <-- br1
#      \    /
#       ---F   <-- br2
#
#  H  <-- br3
#
# We put files into a few commits so that we can verify the
# output as well.

test_expect_success setup '
	git commit --allow-empty -m A &&
	echo b >b &&
	git add b &&
	git commit -m B &&
	git checkout -b br1 HEAD^ &&
	echo c >c &&
	git add c &&
	git commit -m C &&
	git tag commit-C &&
	git merge -m D master &&
	git tag commit-D &&
	git checkout master &&
	git merge -m E commit-C &&
	git checkout -b br2 commit-C &&
	echo f >f &&
	git add f &&
	git commit -m F &&
	git checkout br1 &&
	git merge -m G br2 &&
	git checkout --orphan br3 &&
	git commit -m H
'

test_expect_success 'diff with one merge base' '
	git diff commit-D...br1 >tmp &&
	tail -1 tmp >actual &&
	echo +f >expect &&
	test_cmp expect actual
'

# The output (in tmp) can have +b or +c depending
# on which merge base (commit B or C) is picked.
# It should have one of those two, which comes out
# to seven lines.
test_expect_success 'diff with two merge bases' '
	git diff br1...master >tmp 2>err &&
	test_line_count = 7 tmp &&
	test_line_count = 1 err
'

test_expect_success 'diff with no merge bases' '
	test_must_fail git diff br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: br2...br3: no merge base" err
'

test_expect_success 'diff with too many symmetric differences' '
	test_must_fail git diff br1...master br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: cannot use more than one symmetric difference" err
'

test_expect_success 'diff with symmetric difference and extraneous arg' '
	test_must_fail git diff master br1...master >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_done
-- 
gitgitgadget

[PATCH 3/3] Documentation: tweak git diff help slightly

Chris Torek via GitGitGadget
Details
Message ID
<9318365915cfe1898b2942c735d675656ce7b5e5.1591661021.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.git.git.1591661021.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +10 -1
From: Chris Torek <chris.torek@gmail.com>

Update the manual page synopsis to include the two and three
dot notation.

Make "git diff -h" print the same usage summary as the manual
page synopsis.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 Documentation/git-diff.txt | 2 ++
 builtin/diff.c             | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 37781cf1755..c6a201abd72 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -12,6 +12,8 @@ SYNOPSIS
'git diff' [<options>] [<commit>] [--] [<path>...]
'git diff' [<options>] --cached [<commit>] [--] [<path>...]
'git diff' [<options>] <commit> <commit> [--] [<path>...]
'git diff' [<options>] <commit>..<commit> [--] [<path>...]
'git diff' [<options>] <commit>...<commit> [--] [<path>...]
'git diff' [<options>] <blob> <blob>
'git diff' [<options>] --no-index [--] <path> <path>

diff --git a/builtin/diff.c b/builtin/diff.c
index 8b8b95ec97e..365f9e9a908 100644
--- a/builtin/diff.c
@@ -24,7 +24,14 @@
#define DIFF_NO_INDEX_IMPLICIT 2

static const char builtin_diff_usage[] =
"git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
"git diff [<options>] [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit> <commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit>..<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <blob> <blob>]\n"
"   or: git diff [<options>] --no-index [--] <path> <path>]\n"
COMMON_DIFF_OPTIONS_HELP;

static const char *blob_path(struct object_array_entry *entry)
{
-- 
gitgitgadget

Re: [PATCH 1/3] t/t3430: avoid undocumented git diff behavior

Junio C Hamano
Details
Message ID
<xmqqr1uoizqc.fsf@gitster.c.googlers.com>
In-Reply-To
<414163bbc3cbdda241bedc7bc4dfb8b493071dcb.1591661021.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Chris Torek <chris.torek@gmail.com>
>
> According to the documentation, "git diff" takes at most two commit-ish,
> or an A..B style range, or an A...B style symmetric difference range.
> The autosquash-and-exec test relied on "git diff HEAD^!", which works
> fine for ordinary commits as the revision parse produces two commit-ish,
> namely ^HEAD^ and HEAD.
>
> For merge commits, however, this test makes use of an undocumented
> feature:

s/undocumented feature/undefined behaviour/;

The show.sh scripts wants to compute the diff against first parent,
and it uses a range notation HEAD^! which happens to mean
HEAD^..HEAD for a single parent commit, but it forgets that the
commit it may get fed could be a merge.  What the code happens to do
when given "git diff ^HEAD^2 HEAD^..HEAD" is undefined behaviour and
does not even ...

> the resulting revision parse has all the parents as UNINTERESTING
> followed by the HEAD commit.  This looks identical to a symmetric
> diff parse, which lists the merge bases as UNINTERESTING, followed by
> the A (UNINTERESTING) and B revs.  So the diff winds up treating it
> as one, using the first oid (i.e., HEAD^) and the last (i.e., HEAD).
> The documentation, however, says nothing about this usage.

...deserve to be explained in a paragraph like this, I would think.

> Since diff actually just uses HEAD^ and HEAD, call for these directly
> here.  That makes it possible to improve the diff code's handling of
> symmetric difference arguments.

Yes, the resulting code expresses the intent much better.


>
> Signed-off-by: Chris Torek <chris.torek@gmail.com>
> ---
>  t/t3430-rebase-merges.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index a1bc3e20016..b454f400ebd 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -420,7 +420,7 @@ test_expect_success 'with --autosquash and --exec' '
>  	git commit --fixup B B.t &&
>  	write_script show.sh <<-\EOF &&
>  	subject="$(git show -s --format=%s HEAD)"
> -	content="$(git diff HEAD^! | tail -n 1)"
> +	content="$(git diff HEAD^ HEAD | tail -n 1)"
>  	echo "$subject: $content"
>  	EOF
>  	test_tick &&

Re: [PATCH 2/3] git diff: improve A...B merge-base handling

Junio C Hamano
Details
Message ID
<xmqqmu5ciyom.fsf@gitster.c.googlers.com>
In-Reply-To
<f7c8f094e02406a7d0cb0c61f880e5b01fa413c4.1591661021.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct symdiff {
> +	struct bitmap *skip;	/* bitmap of commit indices to skip, or NULL */
> +	int warn;		/* true if there were multiple merge bases */
> +	int base, left, right;	/* index of chosen merge base and left&right */
> +};
> +
> +/*
> + * Check for symmetric-difference arguments, and if present, arrange
> + * everything we need to know to handle them correctly.
> + *
> + * For an actual symmetric diff, *symdiff is set this way:
> + *
> + *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
> + *    indices that the caller should ignore (extra merge bases, of
> + *    which there might be many, and A in A...B).  Note that the
> + *    chosen merge base and right side are NOT marked.
> + *  - warn is set if there are multiple merge bases.
> + *  - base, left, and right hold the merge base and left and
> + *    right side indices, for warnings or errors.
> + *
> + * If there is no symmetric diff argument, sym->skip is NULL and
> + * sym->warn is cleared.  The remaining fields are not set.
> + *
> + * If the user provides a symmetric diff with no merge base, or
> + * more than one range, we do a usage-exit.
> + */
> +static void builtin_diff_symdiff(struct rev_info *rev, struct symdiff *sym)

The function name feels quite suboptimal.  At least I thought that
by the time a call to this function returns, we would have already
produced a symmetric diff output from its name, but apparently that
is not what is being done.  Calling it symdiff_prepare() may be a
vast improvement, perhaps.

> +{
> +	int i, lcount = 0, rcount = 0, basecount = 0;
> +	int lpos = -1, rpos = -1, basepos = -1;
> +	struct bitmap *map = NULL;
> +
> +	/*
> +	 * Use the whence fields to find merge bases and left and
> +	 * right parts of symmetric difference, so that we do not
> +	 * depend on the order that revisions are parsed.  If there
> +	 * are any revs that aren't from these sources, we have a
> +	 * "git diff C A...B" or "git diff A...B C" case.  Or we
> +	 * could even get "git diff A...B C...E", for instance.
> +	 *
> +	 * If we don't have just one merge base, we pick one
> +	 * at random.
> +	 *
> +	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> +	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
> +	 * rev->pending.objects and rev->cmdline.rev are parallel.
> +	 */
> +	for (i = 0; i < rev->cmdline.nr; i++) {
> +		struct object *obj = rev->pending.objects[i].item;
> +		switch (rev->cmdline.rev[i].whence) {
> +		case REV_CMD_MERGE_BASE:
> +			if (basepos < 0)
> +				basepos = i;
> +			basecount++;
> +			break;		/* do mark all bases */
> +		case REV_CMD_LEFT:
> +			if (obj->flags & SYMMETRIC_LEFT) {
> +				lpos = i;
> +				lcount++;
> +				break;	/* do mark A */
> +			}
> +			continue;
> +		case REV_CMD_RIGHT:
> +			rpos = i;
> +			rcount++;

Even though, unlike lcount, you allow arbitrary number of rcount,
and rpos uses "the last one wins" semantics.  Can we describe in the
comment above what use case benefits from this looseness (as opposed
to erroring out when rcount is NOT 1, like done for lcount)?

> +			continue;	/* don't mark B */
> +		default:
> +			continue;
> +		}
> +		if (map == NULL)
> +			map = bitmap_new();
> +		bitmap_set(map, i);
> +	}
> +
> +	if (lcount == 0) {	/* not a symmetric difference */
> +		bitmap_free(map);
> +		sym->warn = 0;
> +		sym->skip = NULL;
> +		return;
> +	}
> +
> +	if (lcount != 1)
> +		die(_("cannot use more than one symmetric difference"));
> +
> +	if (basecount == 0) {
> +		const char *lname = rev->pending.objects[lpos].name;
> +		const char *rname = rev->pending.objects[rpos].name;
> +		die(_("%s...%s: no merge base"), lname, rname);
> +	}
> +	bitmap_unset(map, basepos);	/* unmark the base we want */
> +	sym->base = basepos;
> +	sym->left = lpos;
> +	sym->right = rpos;
> +	sym->warn = basecount > 1;
> +	sym->skip = map;
> +}
> +
>  int cmd_diff(int argc, const char **argv, const char *prefix)
>  {
>  	int i;
> @@ -263,6 +361,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  	struct object_array_entry *blob[2];
>  	int nongit = 0, no_index = 0;
>  	int result = 0;
> +	struct symdiff sdiff;
>  
>  	/*
>  	 * We could get N tree-ish in the rev.pending_objects list.
> @@ -382,6 +481,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> +	builtin_diff_symdiff(&rev, &sdiff);
>  	for (i = 0; i < rev.pending.nr; i++) {
>  		struct object_array_entry *entry = &rev.pending.objects[i];
>  		struct object *obj = entry->item;
> @@ -394,8 +494,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  			die(_("invalid object '%s' given."), name);
>  		if (obj->type == OBJ_COMMIT)
>  			obj = &get_commit_tree(((struct commit *)obj))->object;
> -
>  		if (obj->type == OBJ_TREE) {
> +			if (sdiff.skip && bitmap_get(sdiff.skip, i))
> +				continue;
>  			obj->flags |= flags;
>  			add_object_array(obj, name, &ent);
>  		} else if (obj->type == OBJ_BLOB) {
> @@ -437,24 +538,22 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  		usage(builtin_diff_usage);
>  	else if (ent.nr == 1)
>  		result = builtin_diff_index(&rev, argc, argv);
> -	else if (ent.nr == 2)
> +	else if (ent.nr == 2) {
> +		if (sdiff.warn) {
> +			const char *lname = rev.pending.objects[sdiff.left].name;
> +			const char *rname = rev.pending.objects[sdiff.right].name;
> +			const char *basename = rev.pending.objects[sdiff.base].name;
> +			warning(_("%s...%s: multiple merge bases, using %s"),
> +				lname, rname, basename);
> +		}
>  		result = builtin_diff_tree(&rev, argc, argv,
>  					   &ent.objects[0], &ent.objects[1]);
> -	else if (ent.objects[0].item->flags & UNINTERESTING) {
> -		/*
> -		 * diff A...B where there is at least one merge base
> -		 * between A and B.  We have ent.objects[0] ==
> -		 * merge-base, ent.objects[ents-2] == A, and
> -		 * ent.objects[ents-1] == B.  Show diff between the
> -		 * base and B.  Note that we pick one merge base at
> -		 * random if there are more than one.
> -		 */
> -		result = builtin_diff_tree(&rev, argc, argv,
> -					   &ent.objects[0],
> -					   &ent.objects[ent.nr-1]);
> -	} else
> +	} else {
> +		if (sdiff.skip)
> +			usage(builtin_diff_usage);

sdiff.skip being non-NULL means symdiff_prepare() saw one A...B that
produced two ents and the fact that we have more than two ents mean
that the command line gave us other tree-ishes, e.g. "git diff A...B C"
and it is rejected here.  OK.

>  		result = builtin_diff_combined(&rev, argc, argv,
>  					       ent.objects, ent.nr);
> +	}

Re: [PATCH 3/3] Documentation: tweak git diff help slightly

Junio C Hamano
Details
Message ID
<xmqqimg0iyg9.fsf@gitster.c.googlers.com>
In-Reply-To
<9318365915cfe1898b2942c735d675656ce7b5e5.1591661021.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  'git diff' [<options>] [<commit>] [--] [<path>...]
>  'git diff' [<options>] --cached [<commit>] [--] [<path>...]
>  'git diff' [<options>] <commit> <commit> [--] [<path>...]
> +'git diff' [<options>] <commit>..<commit> [--] [<path>...]
> +'git diff' [<options>] <commit>...<commit> [--] [<path>...]
>  'git diff' [<options>] <blob> <blob>
>  'git diff' [<options>] --no-index [--] <path> <path>

We actually are trying to wean users off of saying "diff A..B" which
is a nonsense notation, so I'd rather not to see it added here.
Describing "diff A...B" is a good idea, though.

While we have our attention on this part of the documentation, would
it make sense to also add description on invoking the combined diff
as well?

[PATCH v2 0/3] improve git-diff documentation and A...B handling

Chris Torek via GitGitGadget
Details
Message ID
<pull.804.v2.git.git.1591729224.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.git.git.1591661021.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
git diff -h help is succinct, but perhaps too much so.

The symmetric-diff syntax, git diff A...B, is defined by the documentation
to compare the merge base of A and B to commit B. It does so just fine when
there is a merge base. It compares A and B directly if there is no merge
base, and it is overly forgiving of bad arguments after which it can produce
nonsensical diffs.

The first patch simply adjusts a test that will fail if the second patch is
accepted. The second patch adds special handling for the symmetric diff
syntax so that the option parsing works, plus a small test suite. The third
patch updates the documentation, including adding a section for combined
commits, and makes the help output more verbose (to match the SYNOPSIS and
provide common diff options like git-diff-files, for instance).

Changes since v1: 

 * shortened first commit's message 
 * renamed prepare function 
 * removed A..B syntax from usage (and fixed typo) 
 * added combined diff syntax to main documentation 

Note: I looked into adding special handling for rev^! syntax and
it seems a bit messy. prepare_symdiff() could do this with its
other analysis, and slide the decoded revisions around. Perhaps
better, revision.c could insert the parent refs after the child,
under control of a flag in the diff flags section of a rev_info.

Chris Torek (3):
  t/t3430: avoid undefined git diff behavior
  git diff: improve A...B merge-base handling
  Documentation: tweak git diff help slightly

 Documentation/git-diff.txt |  21 ++++--
 builtin/diff.c             | 137 ++++++++++++++++++++++++++++++++-----
 t/t3430-rebase-merges.sh   |   2 +-
 t/t4068-diff-symmetric.sh  |  81 ++++++++++++++++++++++
 4 files changed, 220 insertions(+), 21 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh


base-commit: 20514004ddf1a3528de8933bc32f284e175e1012
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-804%2Fchris3torek%2Fcleanup-diff-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-804/chris3torek/cleanup-diff-v2
Pull-Request: https://github.com/git/git/pull/804

Range-diff vs v1:

 1:  414163bbc3c ! 1:  2ccaad645ff t/t3430: avoid undocumented git diff behavior
     @@ Metadata
      Author: Chris Torek <chris.torek@gmail.com>
      
       ## Commit message ##
     -    t/t3430: avoid undocumented git diff behavior
     +    t/t3430: avoid undefined git diff behavior
      
     -    According to the documentation, "git diff" takes at most two commit-ish,
     -    or an A..B style range, or an A...B style symmetric difference range.
     -    The autosquash-and-exec test relied on "git diff HEAD^!", which works
     -    fine for ordinary commits as the revision parse produces two commit-ish,
     -    namely ^HEAD^ and HEAD.
     -
     -    For merge commits, however, this test makes use of an undocumented
     -    feature: the resulting revision parse has all the parents as UNINTERESTING
     -    followed by the HEAD commit.  This looks identical to a symmetric
     -    diff parse, which lists the merge bases as UNINTERESTING, followed by
     -    the A (UNINTERESTING) and B revs.  So the diff winds up treating it
     -    as one, using the first oid (i.e., HEAD^) and the last (i.e., HEAD).
     -    The documentation, however, says nothing about this usage.
     -
     -    Since diff actually just uses HEAD^ and HEAD, call for these directly
     -    here.  That makes it possible to improve the diff code's handling of
     -    symmetric difference arguments.
     +    The autosquash-and-exec test used "git diff HEAD^!" to mean
     +    "git diff HEAD^ HEAD".  Use these directly instead of relying
     +    on the undefined but actual-current behavior of "HEAD^!".
      
          Signed-off-by: Chris Torek <chris.torek@gmail.com>
      
 2:  f7c8f094e02 ! 2:  100fa403477 git diff: improve A...B merge-base handling
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      + * If the user provides a symmetric diff with no merge base, or
      + * more than one range, we do a usage-exit.
      + */
     -+static void builtin_diff_symdiff(struct rev_info *rev, struct symdiff *sym)
     ++static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
      +{
      +	int i, lcount = 0, rcount = 0, basecount = 0;
      +	int lpos = -1, rpos = -1, basepos = -1;
     @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
       		}
       	}
       
     -+	builtin_diff_symdiff(&rev, &sdiff);
     ++	symdiff_prepare(&rev, &sdiff);
       	for (i = 0; i < rev.pending.nr; i++) {
       		struct object_array_entry *entry = &rev.pending.objects[i];
       		struct object *obj = entry->item;
 3:  9318365915c ! 3:  b9b4c6f113d Documentation: tweak git diff help slightly
     @@ Metadata
       ## Commit message ##
          Documentation: tweak git diff help slightly
      
     -    Update the manual page synopsis to include the two and three
     -    dot notation.
     +    Update the manual page synopsis to include the three-dot notation
     +    and the combined-diff option
      
          Make "git diff -h" print the same usage summary as the manual
     -    page synopsis.
     +    page synopsis, minus the "A..B" form, which is now discouraged.
     +
     +    Document the usage for producing combined commits.
      
          Signed-off-by: Chris Torek <chris.torek@gmail.com>
      
       ## Documentation/git-diff.txt ##
      @@ Documentation/git-diff.txt: SYNOPSIS
     + [verse]
       'git diff' [<options>] [<commit>] [--] [<path>...]
       'git diff' [<options>] --cached [<commit>] [--] [<path>...]
     - 'git diff' [<options>] <commit> <commit> [--] [<path>...]
     -+'git diff' [<options>] <commit>..<commit> [--] [<path>...]
     +-'git diff' [<options>] <commit> <commit> [--] [<path>...]
     ++'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
      +'git diff' [<options>] <commit>...<commit> [--] [<path>...]
       'git diff' [<options>] <blob> <blob>
       'git diff' [<options>] --no-index [--] <path> <path>
       
     + DESCRIPTION
     + -----------
     + Show changes between the working tree and the index or a tree, changes
     +-between the index and a tree, changes between two trees, changes between
     +-two blob objects, or changes between two files on disk.
     ++between the index and a tree, changes between two trees, changes resulting
     ++from a merge, changes between two blob objects, or changes between two
     ++files on disk.
     + 
     + 'git diff' [<options>] [--] [<path>...]::
     + 
     +@@ Documentation/git-diff.txt: two blob objects, or changes between two files on disk.
     + 	one side is omitted, it will have the same effect as
     + 	using HEAD instead.
     + 
     ++'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]::
     ++
     ++	This form is to view the results of a merge commit.  The first
     ++	listed <commit> must be the merge itself; the remaining two or
     ++	more commits should be its parents.  A convenient way to produce
     ++	the desired set of revisions is to use the {caret}@ suffix, i.e.,
     ++	"git diff master master^@".  This is equivalent to running "git
     ++	show --format=" on the merge commit, e.g., "git show --format=
     ++	master".
     ++
     + 'git diff' [<options>] <commit>\...<commit> [--] [<path>...]::
     + 
     + 	This form is to view the changes on the branch containing
     +@@ Documentation/git-diff.txt: linkgit:git-difftool[1],
     + linkgit:git-log[1],
     + linkgit:gitdiffcore[7],
     + linkgit:git-format-patch[1],
     +-linkgit:git-apply[1]
     ++linkgit:git-apply[1],
     ++linkgit:git-show[1]
     + 
     + GIT
     + ---
      
       ## builtin/diff.c ##
      @@
     @@ builtin/diff.c
      -"git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
      +"git diff [<options>] [<commit>] [--] [<path>...]\n"
      +"   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
     -+"   or: git diff [<options>] <commit> <commit>] [--] [<path>...]\n"
     -+"   or: git diff [<options>] <commit>..<commit>] [--] [<path>...]\n"
     ++"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
      +"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
      +"   or: git diff [<options>] <blob> <blob>]\n"
      +"   or: git diff [<options>] --no-index [--] <path> <path>]\n"

-- 
gitgitgadget

[PATCH v2 1/3] t/t3430: avoid undefined git diff behavior

Chris Torek via GitGitGadget
Details
Message ID
<2ccaad645ff01b786e76dc63210d75da633389a6.1591729224.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v2.git.git.1591729224.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +1 -1
From: Chris Torek <chris.torek@gmail.com>

The autosquash-and-exec test used "git diff HEAD^!" to mean
"git diff HEAD^ HEAD".  Use these directly instead of relying
on the undefined but actual-current behavior of "HEAD^!".

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 t/t3430-rebase-merges.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index a1bc3e20016..b454f400ebd 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -420,7 +420,7 @@ test_expect_success 'with --autosquash and --exec' '
	git commit --fixup B B.t &&
	write_script show.sh <<-\EOF &&
	subject="$(git show -s --format=%s HEAD)"
	content="$(git diff HEAD^! | tail -n 1)"
	content="$(git diff HEAD^ HEAD | tail -n 1)"
	echo "$subject: $content"
	EOF
	test_tick &&
-- 
gitgitgadget

[PATCH v2 2/3] git diff: improve A...B merge-base handling

Chris Torek via GitGitGadget
Details
Message ID
<100fa4034771e58b65cdac2f3dfb48531c07b735.1591729224.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v2.git.git.1591729224.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +195 -15
From: Chris Torek <chris.torek@gmail.com>

When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".

To avoid all this, add a routine to catch the A...B case and verify that
there is at least one merge base, and that the arguments make sense.
As a side effect, produce a warning showing *which* merge base is being
used when there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 builtin/diff.c            | 129 +++++++++++++++++++++++++++++++++-----
 t/t4068-diff-symmetric.sh |  81 ++++++++++++++++++++++++
 2 files changed, 195 insertions(+), 15 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh

diff --git a/builtin/diff.c b/builtin/diff.c
index 8537b17bd5e..0b6e63dbd02 100644
--- a/builtin/diff.c
@@ -6,6 +6,7 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "cache.h"
#include "config.h"
#include "ewah/ewok.h"
#include "lockfile.h"
#include "color.h"
#include "commit.h"
@@ -254,6 +255,103 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
	return run_diff_files(revs, options);
}

struct symdiff {
	struct bitmap *skip;	/* bitmap of commit indices to skip, or NULL */
	int warn;		/* true if there were multiple merge bases */
	int base, left, right;	/* index of chosen merge base and left&right */
};

/*
 * Check for symmetric-difference arguments, and if present, arrange
 * everything we need to know to handle them correctly.
 *
 * For an actual symmetric diff, *symdiff is set this way:
 *
 *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
 *    indices that the caller should ignore (extra merge bases, of
 *    which there might be many, and A in A...B).  Note that the
 *    chosen merge base and right side are NOT marked.
 *  - warn is set if there are multiple merge bases.
 *  - base, left, and right hold the merge base and left and
 *    right side indices, for warnings or errors.
 *
 * If there is no symmetric diff argument, sym->skip is NULL and
 * sym->warn is cleared.  The remaining fields are not set.
 *
 * If the user provides a symmetric diff with no merge base, or
 * more than one range, we do a usage-exit.
 */
static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
{
	int i, lcount = 0, rcount = 0, basecount = 0;
	int lpos = -1, rpos = -1, basepos = -1;
	struct bitmap *map = NULL;

	/*
	 * Use the whence fields to find merge bases and left and
	 * right parts of symmetric difference, so that we do not
	 * depend on the order that revisions are parsed.  If there
	 * are any revs that aren't from these sources, we have a
	 * "git diff C A...B" or "git diff A...B C" case.  Or we
	 * could even get "git diff A...B C...E", for instance.
	 *
	 * If we don't have just one merge base, we pick one
	 * at random.
	 *
	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
	 * rev->pending.objects and rev->cmdline.rev are parallel.
	 */
	for (i = 0; i < rev->cmdline.nr; i++) {
		struct object *obj = rev->pending.objects[i].item;
		switch (rev->cmdline.rev[i].whence) {
		case REV_CMD_MERGE_BASE:
			if (basepos < 0)
				basepos = i;
			basecount++;
			break;		/* do mark all bases */
		case REV_CMD_LEFT:
			if (obj->flags & SYMMETRIC_LEFT) {
				lpos = i;
				lcount++;
				break;	/* do mark A */
			}
			continue;
		case REV_CMD_RIGHT:
			rpos = i;
			rcount++;
			continue;	/* don't mark B */
		default:
			continue;
		}
		if (map == NULL)
			map = bitmap_new();
		bitmap_set(map, i);
	}

	if (lcount == 0) {	/* not a symmetric difference */
		bitmap_free(map);
		sym->warn = 0;
		sym->skip = NULL;
		return;
	}

	if (lcount != 1)
		die(_("cannot use more than one symmetric difference"));

	if (basecount == 0) {
		const char *lname = rev->pending.objects[lpos].name;
		const char *rname = rev->pending.objects[rpos].name;
		die(_("%s...%s: no merge base"), lname, rname);
	}
	bitmap_unset(map, basepos);	/* unmark the base we want */
	sym->base = basepos;
	sym->left = lpos;
	sym->right = rpos;
	sym->warn = basecount > 1;
	sym->skip = map;
}

int cmd_diff(int argc, const char **argv, const char *prefix)
{
	int i;
@@ -263,6 +361,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
	struct object_array_entry *blob[2];
	int nongit = 0, no_index = 0;
	int result = 0;
	struct symdiff sdiff;

	/*
	 * We could get N tree-ish in the rev.pending_objects list.
@@ -382,6 +481,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		}
	}

	symdiff_prepare(&rev, &sdiff);
	for (i = 0; i < rev.pending.nr; i++) {
		struct object_array_entry *entry = &rev.pending.objects[i];
		struct object *obj = entry->item;
@@ -394,8 +494,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
			die(_("invalid object '%s' given."), name);
		if (obj->type == OBJ_COMMIT)
			obj = &get_commit_tree(((struct commit *)obj))->object;

		if (obj->type == OBJ_TREE) {
			if (sdiff.skip && bitmap_get(sdiff.skip, i))
				continue;
			obj->flags |= flags;
			add_object_array(obj, name, &ent);
		} else if (obj->type == OBJ_BLOB) {
@@ -437,24 +538,22 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		usage(builtin_diff_usage);
	else if (ent.nr == 1)
		result = builtin_diff_index(&rev, argc, argv);
	else if (ent.nr == 2)
	else if (ent.nr == 2) {
		if (sdiff.warn) {
			const char *lname = rev.pending.objects[sdiff.left].name;
			const char *rname = rev.pending.objects[sdiff.right].name;
			const char *basename = rev.pending.objects[sdiff.base].name;
			warning(_("%s...%s: multiple merge bases, using %s"),
				lname, rname, basename);
		}
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0], &ent.objects[1]);
	else if (ent.objects[0].item->flags & UNINTERESTING) {
		/*
		 * diff A...B where there is at least one merge base
		 * between A and B.  We have ent.objects[0] ==
		 * merge-base, ent.objects[ents-2] == A, and
		 * ent.objects[ents-1] == B.  Show diff between the
		 * base and B.  Note that we pick one merge base at
		 * random if there are more than one.
		 */
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0],
					   &ent.objects[ent.nr-1]);
	} else
	} else {
		if (sdiff.skip)
			usage(builtin_diff_usage);
		result = builtin_diff_combined(&rev, argc, argv,
					       ent.objects, ent.nr);
	}
	result = diff_result_code(&rev.diffopt, result);
	if (1 < rev.diffopt.skip_stat_unmatch)
		refresh_index_quietly();
diff --git a/t/t4068-diff-symmetric.sh b/t/t4068-diff-symmetric.sh
new file mode 100755
index 00000000000..7b5988933da
--- /dev/null
+++ b/t/t4068-diff-symmetric.sh
@@ -0,0 +1,81 @@
#!/bin/sh

test_description='behavior of diff with symmetric-diff setups'

. ./test-lib.sh

# build these situations:
#  - normal merge with one merge base (b1...b2);
#  - criss-cross merge ie 2 merge bases (b1...master);
#  - disjoint subgraph (orphan branch, b3...master).
#
#     B---E   <-- master
#    / \ /
#   A   X
#    \ / \
#     C---D--G   <-- br1
#      \    /
#       ---F   <-- br2
#
#  H  <-- br3
#
# We put files into a few commits so that we can verify the
# output as well.

test_expect_success setup '
	git commit --allow-empty -m A &&
	echo b >b &&
	git add b &&
	git commit -m B &&
	git checkout -b br1 HEAD^ &&
	echo c >c &&
	git add c &&
	git commit -m C &&
	git tag commit-C &&
	git merge -m D master &&
	git tag commit-D &&
	git checkout master &&
	git merge -m E commit-C &&
	git checkout -b br2 commit-C &&
	echo f >f &&
	git add f &&
	git commit -m F &&
	git checkout br1 &&
	git merge -m G br2 &&
	git checkout --orphan br3 &&
	git commit -m H
'

test_expect_success 'diff with one merge base' '
	git diff commit-D...br1 >tmp &&
	tail -1 tmp >actual &&
	echo +f >expect &&
	test_cmp expect actual
'

# The output (in tmp) can have +b or +c depending
# on which merge base (commit B or C) is picked.
# It should have one of those two, which comes out
# to seven lines.
test_expect_success 'diff with two merge bases' '
	git diff br1...master >tmp 2>err &&
	test_line_count = 7 tmp &&
	test_line_count = 1 err
'

test_expect_success 'diff with no merge bases' '
	test_must_fail git diff br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: br2...br3: no merge base" err
'

test_expect_success 'diff with too many symmetric differences' '
	test_must_fail git diff br1...master br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: cannot use more than one symmetric difference" err
'

test_expect_success 'diff with symmetric difference and extraneous arg' '
	test_must_fail git diff master br1...master >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_done
-- 
gitgitgadget

[PATCH v2 3/3] Documentation: tweak git diff help slightly

Chris Torek via GitGitGadget
Details
Message ID
<b9b4c6f113dfb03268b391cc62abfd38bd7632e4.1591729224.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v2.git.git.1591729224.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +24 -5
From: Chris Torek <chris.torek@gmail.com>

Update the manual page synopsis to include the three-dot notation
and the combined-diff option

Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.

Document the usage for producing combined commits.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 Documentation/git-diff.txt | 21 +++++++++++++++++----
 builtin/diff.c             |  8 +++++++-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 37781cf1755..0bce278652a 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -11,15 +11,17 @@ SYNOPSIS
[verse]
'git diff' [<options>] [<commit>] [--] [<path>...]
'git diff' [<options>] --cached [<commit>] [--] [<path>...]
'git diff' [<options>] <commit> <commit> [--] [<path>...]
'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
'git diff' [<options>] <commit>...<commit> [--] [<path>...]
'git diff' [<options>] <blob> <blob>
'git diff' [<options>] --no-index [--] <path> <path>

DESCRIPTION
-----------
Show changes between the working tree and the index or a tree, changes
between the index and a tree, changes between two trees, changes between
two blob objects, or changes between two files on disk.
between the index and a tree, changes between two trees, changes resulting
from a merge, changes between two blob objects, or changes between two
files on disk.

'git diff' [<options>] [--] [<path>...]::

@@ -67,6 +69,16 @@ two blob objects, or changes between two files on disk.
	one side is omitted, it will have the same effect as
	using HEAD instead.

'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]::

	This form is to view the results of a merge commit.  The first
	listed <commit> must be the merge itself; the remaining two or
	more commits should be its parents.  A convenient way to produce
	the desired set of revisions is to use the {caret}@ suffix, i.e.,
	"git diff master master^@".  This is equivalent to running "git
	show --format=" on the merge commit, e.g., "git show --format=
	master".

'git diff' [<options>] <commit>\...<commit> [--] [<path>...]::

	This form is to view the changes on the branch containing
@@ -196,7 +208,8 @@ linkgit:git-difftool[1],
linkgit:git-log[1],
linkgit:gitdiffcore[7],
linkgit:git-format-patch[1],
linkgit:git-apply[1]
linkgit:git-apply[1],
linkgit:git-show[1]

GIT
---
diff --git a/builtin/diff.c b/builtin/diff.c
index 0b6e63dbd02..b333640082b 100644
--- a/builtin/diff.c
@@ -24,7 +24,13 @@
#define DIFF_NO_INDEX_IMPLICIT 2

static const char builtin_diff_usage[] =
"git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
"git diff [<options>] [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <blob> <blob>]\n"
"   or: git diff [<options>] --no-index [--] <path> <path>]\n"
COMMON_DIFF_OPTIONS_HELP;

static const char *blob_path(struct object_array_entry *entry)
{
-- 
gitgitgadget

Re: [PATCH v2 2/3] git diff: improve A...B merge-base handling

Junio C Hamano
Details
Message ID
<xmqqftb3hmx6.fsf@gitster.c.googlers.com>
In-Reply-To
<100fa4034771e58b65cdac2f3dfb48531c07b735.1591729224.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> +{
> +	int i, lcount = 0, rcount = 0, basecount = 0;
> +	int lpos = -1, rpos = -1, basepos = -1;
> +	struct bitmap *map = NULL;

The logic around rcount and rpos in this function still smells
fishy.  For example, rcount is counted up from 0 but its value is
never consulted, so we should be able to get rid of it.

For that matter, lcount and lpos look somewhat redundant.  lpos
begins with -1 to signal "have not seen any left end of symmetric
range yet", and we won't allow more than two symmetric ranges
anyway, so we should be able to get rid of lcount and base the error
condition purely on lpos by

 * when we see SYMMETRIC_LEFT, check if lpos is already non-negative,
   and die otherwise right there.  We don't need "if lcount != 1, die"
   after the loop.

 * after the loop, if lpos is still -1, we know we didn't see
   symmetric difference.

> +	/*
> +	 * Use the whence fields to find merge bases and left and
> +	 * right parts of symmetric difference, so that we do not
> +	 * depend on the order that revisions are parsed.  If there
> +	 * are any revs that aren't from these sources, we have a
> +	 * "git diff C A...B" or "git diff A...B C" case.  Or we
> +	 * could even get "git diff A...B C...E", for instance.
> +	 *
> +	 * If we don't have just one merge base, we pick one
> +	 * at random.
> +	 *
> +	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> +	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
> +	 * rev->pending.objects and rev->cmdline.rev are parallel.
> +	 */
> +	for (i = 0; i < rev->cmdline.nr; i++) {
> +		struct object *obj = rev->pending.objects[i].item;
> +		switch (rev->cmdline.rev[i].whence) {
> +		case REV_CMD_MERGE_BASE:
> +			if (basepos < 0)
> +				basepos = i;
> +			basecount++;
> +			break;		/* do mark all bases */
> +		case REV_CMD_LEFT:
> +			if (obj->flags & SYMMETRIC_LEFT) {
> +				lpos = i;
> +				lcount++;
> +				break;	/* do mark A */
> +			}
> +			continue;
> +		case REV_CMD_RIGHT:
> +			rpos = i;
> +			rcount++;
> +			continue;	/* don't mark B */

It is unclear if we want to allow "git diff A..B C..D" (or
alternatively "git diff A...B C..D") and if so why.  

It appears that you are allowing both, but I am not sure if that is
a good idea.  Read a bit further below.

> +		default:
> +			continue;
> +		}
> +		if (map == NULL)
> +			map = bitmap_new();
> +		bitmap_set(map, i);
> +	}
> +
> +	if (lcount == 0) {	/* not a symmetric difference */
> +		bitmap_free(map);
> +		sym->warn = 0;
> +		sym->skip = NULL;
> +		return;
> +	}
> +
> +	if (lcount != 1)
> +		die(_("cannot use more than one symmetric difference"));
> +
> +	if (basecount == 0) {
> +		const char *lname = rev->pending.objects[lpos].name;
> +		const char *rname = rev->pending.objects[rpos].name;
> +		die(_("%s...%s: no merge base"), lname, rname);

When "git diff A...B C..D" is given, what do we want to do?  A is
the only element marked with REV_CMD_LEFT, which is pointed at by
lpos and lcount gets incremented to become one.  When we see B, we
make rpos point at it, but then later, when we see D, wouldn't rpos
get updated to point at it?  What error message would we give when
we see no merge base then?  We would want to say the symdiff between
A and B has no merge bases, but wouldn't we end up mentioning D
instead of B?

> +	}
> +	bitmap_unset(map, basepos);	/* unmark the base we want */
> +	sym->base = basepos;
> +	sym->left = lpos;
> +	sym->right = rpos;
> +	sym->warn = basecount > 1;
> +	sym->skip = map;
> +}

> @@ -394,8 +494,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  			die(_("invalid object '%s' given."), name);
>  		if (obj->type == OBJ_COMMIT)
>  			obj = &get_commit_tree(((struct commit *)obj))->object;
> -

Do not lose this blank line.  It does not make a difference to the
compiler, but it is semantically significant to human readers.

>  		if (obj->type == OBJ_TREE) {
> +			if (sdiff.skip && bitmap_get(sdiff.skip, i))
> +				continue;
>  			obj->flags |= flags;
>  			add_object_array(obj, name, &ent);
>  		} else if (obj->type == OBJ_BLOB) {

> diff --git a/t/t4068-diff-symmetric.sh b/t/t4068-diff-symmetric.sh
> new file mode 100755
> index 00000000000..7b5988933da
> --- /dev/null
> +++ b/t/t4068-diff-symmetric.sh
> @@ -0,0 +1,81 @@
> +#!/bin/sh
> +...
> +test_expect_success setup '
> +	git commit --allow-empty -m A &&
> +	echo b >b &&
> +	git add b &&
> +	git commit -m B &&
> +	git checkout -b br1 HEAD^ &&
> +	echo c >c &&
> +	git add c &&
> +	git commit -m C &&
> +	git tag commit-C &&
> +	git merge -m D master &&
> +	git tag commit-D &&
> +	git checkout master &&
> +	git merge -m E commit-C &&
> +	git checkout -b br2 commit-C &&
> +	echo f >f &&
> +	git add f &&
> +	git commit -m F &&
> +	git checkout br1 &&
> +	git merge -m G br2 &&
> +	git checkout --orphan br3 &&
> +	git commit -m H
> +'
> +
> +test_expect_success 'diff with one merge base' '
> +	git diff commit-D...br1 >tmp &&
> +	tail -1 tmp >actual &&

Let's make sure we spell "tail -n 1" (same for "head -1" -> "head -n 1").
I know there are a handful of existing offenders, but that does not mean
it is OK to make things worse.

> +	echo +f >expect &&
> +	test_cmp expect actual

Re: [PATCH v2 3/3] Documentation: tweak git diff help slightly

Junio C Hamano
Details
Message ID
<xmqqbllrhmie.fsf@gitster.c.googlers.com>
In-Reply-To
<b9b4c6f113dfb03268b391cc62abfd38bd7632e4.1591729224.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Chris Torek <chris.torek@gmail.com>
>
> Update the manual page synopsis to include the three-dot notation
> and the combined-diff option

Surely.  That is "tweak ... slightly".  Full-stop is missing here,
by the way.

> Make "git diff -h" print the same usage summary as the manual
> page synopsis, minus the "A..B" form, which is now discouraged.

Good.

> Document the usage for producing combined commits.

Yup, that is "while we are at it".  The new text reads well, but it
appears that it is the more significant part of the change in this
patch now ;-)

> diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
> index 37781cf1755..0bce278652a 100644
> --- a/Documentation/git-diff.txt
> +++ b/Documentation/git-diff.txt
> @@ -11,15 +11,17 @@ SYNOPSIS
>  [verse]
>  'git diff' [<options>] [<commit>] [--] [<path>...]
>  'git diff' [<options>] --cached [<commit>] [--] [<path>...]
> -'git diff' [<options>] <commit> <commit> [--] [<path>...]
> +'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
> +'git diff' [<options>] <commit>...<commit> [--] [<path>...]
>  'git diff' [<options>] <blob> <blob>
>  'git diff' [<options>] --no-index [--] <path> <path>
>  
>  DESCRIPTION
>  -----------
>  Show changes between the working tree and the index or a tree, changes
> -between the index and a tree, changes between two trees, changes between
> -two blob objects, or changes between two files on disk.
> +between the index and a tree, changes between two trees, changes resulting
> +from a merge, changes between two blob objects, or changes between two
> +files on disk.
>  
>  'git diff' [<options>] [--] [<path>...]::
>  
> @@ -67,6 +69,16 @@ two blob objects, or changes between two files on disk.
>  	one side is omitted, it will have the same effect as
>  	using HEAD instead.
>  
> +'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]::
> +
> +	This form is to view the results of a merge commit.  The first
> +	listed <commit> must be the merge itself; the remaining two or
> +	more commits should be its parents.  A convenient way to produce
> +	the desired set of revisions is to use the {caret}@ suffix, i.e.,
> +	"git diff master master^@".  This is equivalent to running "git

Don't we usually use `git diff master master^@` to mark up literal
examples with tt, instead of "git diff..." with double-quotes?

> +	show --format=" on the merge commit, e.g., "git show --format=
> +	master".

Likewise.

But more importantly, I think giving the exact equivalent is much
less important than keeping the explanation concise, simple and
clear, and the "empty format to omit the log part" is distracting
(after all, teaching how to squelch the log message part in the
"show" command is not the topic of this manpage).

    For a merge commit `master`, this gives the same combined diff
    as `git show master` does.

perhaps?

Thanks.

[PATCH v3 0/3] improve git-diff documentation and A...B handling

Chris Torek via GitGitGadget
Details
Message ID
<pull.804.v3.git.git.1591888511.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v2.git.git.1591729224.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
git diff -h help is succinct, but perhaps too much so.

The symmetric-diff syntax, git diff A...B, is defined by the documentation
to compare the merge base of A and B to commit B. It does so just fine when
there is a merge base. It compares A and B directly if there is no merge
base, and it is overly forgiving of bad arguments after which it can produce
nonsensical diffs. It also behaves badly with other odd/incorrect usages,
such as git diff A..B C..D.

The first patch simply adjusts a test that will fail if the second patch is
accepted. The second patch adds special handling for the symmetric diff
syntax so that the option parsing works, plus a small test suite. The third
patch updates the documentation, including adding a section for combined
commits, and makes the help output more verbose (to match the SYNOPSIS and
provide common diff options like git-diff-files, for instance).

Changes since v1:

 * updated commit messages
 * rewrote prepare function
 * added tests to reject bad two-dot usage as well
 * removed A..B syntax from usage (and fixed typo)
 * fixed up three-dot synopsis text

Chris Torek (3):
  t/t3430: avoid undefined git diff behavior
  git diff: improve range handling
  Documentation: usage for diff combined commits

 Documentation/git-diff.txt |  20 ++++--
 builtin/diff.c             | 132 +++++++++++++++++++++++++++++++++----
 t/t3430-rebase-merges.sh   |   2 +-
 t/t4068-diff-symmetric.sh  |  91 +++++++++++++++++++++++++
 4 files changed, 226 insertions(+), 19 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh


base-commit: 20514004ddf1a3528de8933bc32f284e175e1012
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-804%2Fchris3torek%2Fcleanup-diff-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-804/chris3torek/cleanup-diff-v3
Pull-Request: https://github.com/git/git/pull/804

Range-diff vs v2:

 1:  2ccaad645ff = 1:  2ccaad645ff t/t3430: avoid undefined git diff behavior
 2:  100fa403477 ! 2:  60aed3f9d65 git diff: improve A...B merge-base handling
     @@ Metadata
      Author: Chris Torek <chris.torek@gmail.com>
      
       ## Commit message ##
     -    git diff: improve A...B merge-base handling
     +    git diff: improve range handling
      
          When git diff is given a symmetric difference A...B, it chooses
          some merge base from the two specified commits (as documented).
     @@ Commit message
             lost.  We will produce a combined diff instead.
      
          Similar weirdness occurs if you use, e.g., "git diff C A...B D".
     +    Likewise, using multiple two-dot ranges, or tossing extra
     +    revision specifiers into the command line with two-dot ranges,
     +    or mixing two and three dot ranges, all produce nonsense.
      
     -    To avoid all this, add a routine to catch the A...B case and verify that
     -    there is at least one merge base, and that the arguments make sense.
     -    As a side effect, produce a warning showing *which* merge base is being
     -    used when there are multiple choices; die if there is no merge base.
     +    To avoid all this, add a routine to catch the range cases and
     +    verify that that the arguments make sense.  As a side effect,
     +    produce a warning showing *which* merge base is being used when
     +    there are multiple choices; die if there is no merge base.
      
          Signed-off-by: Chris Torek <chris.torek@gmail.com>
      
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
       }
       
      +struct symdiff {
     -+	struct bitmap *skip;	/* bitmap of commit indices to skip, or NULL */
     -+	int warn;		/* true if there were multiple merge bases */
     -+	int base, left, right;	/* index of chosen merge base and left&right */
     ++	struct bitmap *skip;
     ++	int warn;
     ++	const char *base, *left, *right;
      +};
      +
      +/*
      + * Check for symmetric-difference arguments, and if present, arrange
     -+ * everything we need to know to handle them correctly.
     ++ * everything we need to know to handle them correctly.  As a bonus,
     ++ * weed out all bogus range-based revision specifications, e.g.,
     ++ * "git diff A..B C..D" or "git diff A..B C" get rejected.
      + *
      + * For an actual symmetric diff, *symdiff is set this way:
      + *
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      + *    which there might be many, and A in A...B).  Note that the
      + *    chosen merge base and right side are NOT marked.
      + *  - warn is set if there are multiple merge bases.
     -+ *  - base, left, and right hold the merge base and left and
     -+ *    right side indices, for warnings or errors.
     ++ *  - base, left, and right point to the names to use in a
     ++ *    warning about multiple merge bases.
      + *
      + * If there is no symmetric diff argument, sym->skip is NULL and
      + * sym->warn is cleared.  The remaining fields are not set.
     -+ *
     -+ * If the user provides a symmetric diff with no merge base, or
     -+ * more than one range, we do a usage-exit.
      + */
      +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
      +{
     -+	int i, lcount = 0, rcount = 0, basecount = 0;
     ++	int i, is_symdiff = 0, basecount = 0, othercount = 0;
      +	int lpos = -1, rpos = -1, basepos = -1;
      +	struct bitmap *map = NULL;
      +
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      +			basecount++;
      +			break;		/* do mark all bases */
      +		case REV_CMD_LEFT:
     ++			if (lpos > 0)
     ++				usage(builtin_diff_usage);
     ++			lpos = i;
      +			if (obj->flags & SYMMETRIC_LEFT) {
     -+				lpos = i;
     -+				lcount++;
     ++				is_symdiff = 1;
      +				break;	/* do mark A */
      +			}
      +			continue;
      +		case REV_CMD_RIGHT:
     ++			if (rpos > 0)
     ++				usage(builtin_diff_usage);
      +			rpos = i;
     -+			rcount++;
      +			continue;	/* don't mark B */
     -+		default:
     ++		case REV_CMD_PARENTS_ONLY:
     ++		case REV_CMD_REF:
     ++		case REV_CMD_REV:
     ++			othercount++;
      +			continue;
      +		}
      +		if (map == NULL)
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      +		bitmap_set(map, i);
      +	}
      +
     -+	if (lcount == 0) {	/* not a symmetric difference */
     ++	/*
     ++	 * Forbid any additional revs for both A...B and A..B.
     ++	 */
     ++	if (lpos >= 0 && othercount > 0)
     ++		usage(builtin_diff_usage);
     ++
     ++	if (!is_symdiff) {
      +		bitmap_free(map);
      +		sym->warn = 0;
      +		sym->skip = NULL;
      +		return;
      +	}
      +
     -+	if (lcount != 1)
     -+		die(_("cannot use more than one symmetric difference"));
     -+
     -+	if (basecount == 0) {
     -+		const char *lname = rev->pending.objects[lpos].name;
     -+		const char *rname = rev->pending.objects[rpos].name;
     -+		die(_("%s...%s: no merge base"), lname, rname);
     -+	}
     ++	sym->left = rev->pending.objects[lpos].name;
     ++	sym->right = rev->pending.objects[rpos].name;
     ++	sym->base = rev->pending.objects[basepos].name;
     ++	if (basecount == 0)
     ++		die(_("%s...%s: no merge base"), sym->left, sym->right);
      +	bitmap_unset(map, basepos);	/* unmark the base we want */
     -+	sym->base = basepos;
     -+	sym->left = lpos;
     -+	sym->right = rpos;
      +	sym->warn = basecount > 1;
      +	sym->skip = map;
      +}
     @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
       		struct object_array_entry *entry = &rev.pending.objects[i];
       		struct object *obj = entry->item;
      @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
     - 			die(_("invalid object '%s' given."), name);
     - 		if (obj->type == OBJ_COMMIT)
       			obj = &get_commit_tree(((struct commit *)obj))->object;
     --
     + 
       		if (obj->type == OBJ_TREE) {
      +			if (sdiff.skip && bitmap_get(sdiff.skip, i))
      +				continue;
     @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
       		result = builtin_diff_index(&rev, argc, argv);
      -	else if (ent.nr == 2)
      +	else if (ent.nr == 2) {
     -+		if (sdiff.warn) {
     -+			const char *lname = rev.pending.objects[sdiff.left].name;
     -+			const char *rname = rev.pending.objects[sdiff.right].name;
     -+			const char *basename = rev.pending.objects[sdiff.base].name;
     ++		if (sdiff.warn)
      +			warning(_("%s...%s: multiple merge bases, using %s"),
     -+				lname, rname, basename);
     -+		}
     ++				sdiff.left, sdiff.right, sdiff.base);
       		result = builtin_diff_tree(&rev, argc, argv,
       					   &ent.objects[0], &ent.objects[1]);
      -	else if (ent.objects[0].item->flags & UNINTERESTING) {
     @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
      -		result = builtin_diff_tree(&rev, argc, argv,
      -					   &ent.objects[0],
      -					   &ent.objects[ent.nr-1]);
     --	} else
     -+	} else {
     -+		if (sdiff.skip)
     -+			usage(builtin_diff_usage);
     + 	} else
       		result = builtin_diff_combined(&rev, argc, argv,
       					       ent.objects, ent.nr);
     -+	}
     - 	result = diff_result_code(&rev.diffopt, result);
     - 	if (1 < rev.diffopt.skip_stat_unmatch)
     - 		refresh_index_quietly();
      
       ## t/t4068-diff-symmetric.sh (new) ##
      @@
     @@ t/t4068-diff-symmetric.sh (new)
      +
      +test_expect_success 'diff with one merge base' '
      +	git diff commit-D...br1 >tmp &&
     -+	tail -1 tmp >actual &&
     ++	tail -n 1 tmp >actual &&
      +	echo +f >expect &&
      +	test_cmp expect actual
      +'
     @@ t/t4068-diff-symmetric.sh (new)
      +
      +test_expect_success 'diff with too many symmetric differences' '
      +	test_must_fail git diff br1...master br2...br3 >tmp 2>err &&
     -+	test_i18ngrep "fatal: cannot use more than one symmetric difference" err
     ++	test_i18ngrep "usage" err
      +'
      +
      +test_expect_success 'diff with symmetric difference and extraneous arg' '
     @@ t/t4068-diff-symmetric.sh (new)
      +	test_i18ngrep "usage" err
      +'
      +
     ++test_expect_success 'diff with two ranges' '
     ++	test_must_fail git diff master br1..master br2..br3 >tmp 2>err &&
     ++	test_i18ngrep "usage" err
     ++'
     ++
     ++test_expect_success 'diff with ranges and extra arg' '
     ++	test_must_fail git diff master br1..master commit-D >tmp 2>err &&
     ++	test_i18ngrep "usage" err
     ++'
     ++
      +test_done
 3:  b9b4c6f113d ! 3:  a7da92cd635 Documentation: tweak git diff help slightly
     @@ Metadata
      Author: Chris Torek <chris.torek@gmail.com>
      
       ## Commit message ##
     -    Documentation: tweak git diff help slightly
     +    Documentation: usage for diff combined commits
      
     -    Update the manual page synopsis to include the three-dot notation
     -    and the combined-diff option
     +    Document the usage for producing combined commits with "git diff".
     +    This includes updating the synopsis section.
     +
     +    While here, add the three-dot notation to the synopsis.
      
          Make "git diff -h" print the same usage summary as the manual
          page synopsis, minus the "A..B" form, which is now discouraged.
      
     -    Document the usage for producing combined commits.
     -
          Signed-off-by: Chris Torek <chris.torek@gmail.com>
      
       ## Documentation/git-diff.txt ##
     @@ Documentation/git-diff.txt: two blob objects, or changes between two files on di
      +	This form is to view the results of a merge commit.  The first
      +	listed <commit> must be the merge itself; the remaining two or
      +	more commits should be its parents.  A convenient way to produce
     -+	the desired set of revisions is to use the {caret}@ suffix, i.e.,
     -+	"git diff master master^@".  This is equivalent to running "git
     -+	show --format=" on the merge commit, e.g., "git show --format=
     -+	master".
     ++	the desired set of revisions is to use the {caret}@ suffix.
     ++	For instance, if `master` names a merge commit, `git diff master
     ++	master^@` gives the same combined diff as `git show master`.
      +
       'git diff' [<options>] <commit>\...<commit> [--] [<path>...]::
       

-- 
gitgitgadget

[PATCH v3 1/3] t/t3430: avoid undefined git diff behavior

Chris Torek via GitGitGadget
Details
Message ID
<2ccaad645ff01b786e76dc63210d75da633389a6.1591888511.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v3.git.git.1591888511.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +1 -1
From: Chris Torek <chris.torek@gmail.com>

The autosquash-and-exec test used "git diff HEAD^!" to mean
"git diff HEAD^ HEAD".  Use these directly instead of relying
on the undefined but actual-current behavior of "HEAD^!".

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 t/t3430-rebase-merges.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index a1bc3e20016..b454f400ebd 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -420,7 +420,7 @@ test_expect_success 'with --autosquash and --exec' '
	git commit --fixup B B.t &&
	write_script show.sh <<-\EOF &&
	subject="$(git show -s --format=%s HEAD)"
	content="$(git diff HEAD^! | tail -n 1)"
	content="$(git diff HEAD^ HEAD | tail -n 1)"
	echo "$subject: $content"
	EOF
	test_tick &&
-- 
gitgitgadget

[PATCH v3 2/3] git diff: improve range handling

Chris Torek via GitGitGadget
Details
Message ID
<60aed3f9d6543a5f66ff75a50c86cf626ec04ed4.1591888511.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v3.git.git.1591888511.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +202 -13
From: Chris Torek <chris.torek@gmail.com>

When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.

To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense.  As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 builtin/diff.c            | 124 ++++++++++++++++++++++++++++++++++----
 t/t4068-diff-symmetric.sh |  91 ++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+), 13 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh

diff --git a/builtin/diff.c b/builtin/diff.c
index 8537b17bd5e..3d33a9231ae 100644
--- a/builtin/diff.c
@@ -6,6 +6,7 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "cache.h"
#include "config.h"
#include "ewah/ewok.h"
#include "lockfile.h"
#include "color.h"
#include "commit.h"
@@ -254,6 +255,108 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
	return run_diff_files(revs, options);
}

struct symdiff {
	struct bitmap *skip;
	int warn;
	const char *base, *left, *right;
};

/*
 * Check for symmetric-difference arguments, and if present, arrange
 * everything we need to know to handle them correctly.  As a bonus,
 * weed out all bogus range-based revision specifications, e.g.,
 * "git diff A..B C..D" or "git diff A..B C" get rejected.
 *
 * For an actual symmetric diff, *symdiff is set this way:
 *
 *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
 *    indices that the caller should ignore (extra merge bases, of
 *    which there might be many, and A in A...B).  Note that the
 *    chosen merge base and right side are NOT marked.
 *  - warn is set if there are multiple merge bases.
 *  - base, left, and right point to the names to use in a
 *    warning about multiple merge bases.
 *
 * If there is no symmetric diff argument, sym->skip is NULL and
 * sym->warn is cleared.  The remaining fields are not set.
 */
static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
{
	int i, is_symdiff = 0, basecount = 0, othercount = 0;
	int lpos = -1, rpos = -1, basepos = -1;
	struct bitmap *map = NULL;

	/*
	 * Use the whence fields to find merge bases and left and
	 * right parts of symmetric difference, so that we do not
	 * depend on the order that revisions are parsed.  If there
	 * are any revs that aren't from these sources, we have a
	 * "git diff C A...B" or "git diff A...B C" case.  Or we
	 * could even get "git diff A...B C...E", for instance.
	 *
	 * If we don't have just one merge base, we pick one
	 * at random.
	 *
	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
	 * rev->pending.objects and rev->cmdline.rev are parallel.
	 */
	for (i = 0; i < rev->cmdline.nr; i++) {
		struct object *obj = rev->pending.objects[i].item;
		switch (rev->cmdline.rev[i].whence) {
		case REV_CMD_MERGE_BASE:
			if (basepos < 0)
				basepos = i;
			basecount++;
			break;		/* do mark all bases */
		case REV_CMD_LEFT:
			if (lpos > 0)
				usage(builtin_diff_usage);
			lpos = i;
			if (obj->flags & SYMMETRIC_LEFT) {
				is_symdiff = 1;
				break;	/* do mark A */
			}
			continue;
		case REV_CMD_RIGHT:
			if (rpos > 0)
				usage(builtin_diff_usage);
			rpos = i;
			continue;	/* don't mark B */
		case REV_CMD_PARENTS_ONLY:
		case REV_CMD_REF:
		case REV_CMD_REV:
			othercount++;
			continue;
		}
		if (map == NULL)
			map = bitmap_new();
		bitmap_set(map, i);
	}

	/*
	 * Forbid any additional revs for both A...B and A..B.
	 */
	if (lpos >= 0 && othercount > 0)
		usage(builtin_diff_usage);

	if (!is_symdiff) {
		bitmap_free(map);
		sym->warn = 0;
		sym->skip = NULL;
		return;
	}

	sym->left = rev->pending.objects[lpos].name;
	sym->right = rev->pending.objects[rpos].name;
	sym->base = rev->pending.objects[basepos].name;
	if (basecount == 0)
		die(_("%s...%s: no merge base"), sym->left, sym->right);
	bitmap_unset(map, basepos);	/* unmark the base we want */
	sym->warn = basecount > 1;
	sym->skip = map;
}

int cmd_diff(int argc, const char **argv, const char *prefix)
{
	int i;
@@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
	struct object_array_entry *blob[2];
	int nongit = 0, no_index = 0;
	int result = 0;
	struct symdiff sdiff;

	/*
	 * We could get N tree-ish in the rev.pending_objects list.
@@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		}
	}

	symdiff_prepare(&rev, &sdiff);
	for (i = 0; i < rev.pending.nr; i++) {
		struct object_array_entry *entry = &rev.pending.objects[i];
		struct object *obj = entry->item;
@@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
			obj = &get_commit_tree(((struct commit *)obj))->object;

		if (obj->type == OBJ_TREE) {
			if (sdiff.skip && bitmap_get(sdiff.skip, i))
				continue;
			obj->flags |= flags;
			add_object_array(obj, name, &ent);
		} else if (obj->type == OBJ_BLOB) {
@@ -437,21 +544,12 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		usage(builtin_diff_usage);
	else if (ent.nr == 1)
		result = builtin_diff_index(&rev, argc, argv);
	else if (ent.nr == 2)
	else if (ent.nr == 2) {
		if (sdiff.warn)
			warning(_("%s...%s: multiple merge bases, using %s"),
				sdiff.left, sdiff.right, sdiff.base);
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0], &ent.objects[1]);
	else if (ent.objects[0].item->flags & UNINTERESTING) {
		/*
		 * diff A...B where there is at least one merge base
		 * between A and B.  We have ent.objects[0] ==
		 * merge-base, ent.objects[ents-2] == A, and
		 * ent.objects[ents-1] == B.  Show diff between the
		 * base and B.  Note that we pick one merge base at
		 * random if there are more than one.
		 */
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0],
					   &ent.objects[ent.nr-1]);
	} else
		result = builtin_diff_combined(&rev, argc, argv,
					       ent.objects, ent.nr);
diff --git a/t/t4068-diff-symmetric.sh b/t/t4068-diff-symmetric.sh
new file mode 100755
index 00000000000..dd60da0294e
--- /dev/null
+++ b/t/t4068-diff-symmetric.sh
@@ -0,0 +1,91 @@
#!/bin/sh

test_description='behavior of diff with symmetric-diff setups'

. ./test-lib.sh

# build these situations:
#  - normal merge with one merge base (b1...b2);
#  - criss-cross merge ie 2 merge bases (b1...master);
#  - disjoint subgraph (orphan branch, b3...master).
#
#     B---E   <-- master
#    / \ /
#   A   X
#    \ / \
#     C---D--G   <-- br1
#      \    /
#       ---F   <-- br2
#
#  H  <-- br3
#
# We put files into a few commits so that we can verify the
# output as well.

test_expect_success setup '
	git commit --allow-empty -m A &&
	echo b >b &&
	git add b &&
	git commit -m B &&
	git checkout -b br1 HEAD^ &&
	echo c >c &&
	git add c &&
	git commit -m C &&
	git tag commit-C &&
	git merge -m D master &&
	git tag commit-D &&
	git checkout master &&
	git merge -m E commit-C &&
	git checkout -b br2 commit-C &&
	echo f >f &&
	git add f &&
	git commit -m F &&
	git checkout br1 &&
	git merge -m G br2 &&
	git checkout --orphan br3 &&
	git commit -m H
'

test_expect_success 'diff with one merge base' '
	git diff commit-D...br1 >tmp &&
	tail -n 1 tmp >actual &&
	echo +f >expect &&
	test_cmp expect actual
'

# The output (in tmp) can have +b or +c depending
# on which merge base (commit B or C) is picked.
# It should have one of those two, which comes out
# to seven lines.
test_expect_success 'diff with two merge bases' '
	git diff br1...master >tmp 2>err &&
	test_line_count = 7 tmp &&
	test_line_count = 1 err
'

test_expect_success 'diff with no merge bases' '
	test_must_fail git diff br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: br2...br3: no merge base" err
'

test_expect_success 'diff with too many symmetric differences' '
	test_must_fail git diff br1...master br2...br3 >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with symmetric difference and extraneous arg' '
	test_must_fail git diff master br1...master >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with two ranges' '
	test_must_fail git diff master br1..master br2..br3 >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with ranges and extra arg' '
	test_must_fail git diff master br1..master commit-D >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_done
-- 
gitgitgadget

[PATCH v3 3/3] Documentation: usage for diff combined commits

Chris Torek via GitGitGadget
Details
Message ID
<a7da92cd63582ec9405a529ee4ecbe245f18a4e1.1591888511.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v3.git.git.1591888511.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +23 -5
From: Chris Torek <chris.torek@gmail.com>

Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.

While here, add the three-dot notation to the synopsis.

Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 Documentation/git-diff.txt | 20 ++++++++++++++++----
 builtin/diff.c             |  8 +++++++-
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 37781cf1755..1018110ddc2 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -11,15 +11,17 @@ SYNOPSIS
[verse]
'git diff' [<options>] [<commit>] [--] [<path>...]
'git diff' [<options>] --cached [<commit>] [--] [<path>...]
'git diff' [<options>] <commit> <commit> [--] [<path>...]
'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
'git diff' [<options>] <commit>...<commit> [--] [<path>...]
'git diff' [<options>] <blob> <blob>
'git diff' [<options>] --no-index [--] <path> <path>

DESCRIPTION
-----------
Show changes between the working tree and the index or a tree, changes
between the index and a tree, changes between two trees, changes between
two blob objects, or changes between two files on disk.
between the index and a tree, changes between two trees, changes resulting
from a merge, changes between two blob objects, or changes between two
files on disk.

'git diff' [<options>] [--] [<path>...]::

@@ -67,6 +69,15 @@ two blob objects, or changes between two files on disk.
	one side is omitted, it will have the same effect as
	using HEAD instead.

'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]::

	This form is to view the results of a merge commit.  The first
	listed <commit> must be the merge itself; the remaining two or
	more commits should be its parents.  A convenient way to produce
	the desired set of revisions is to use the {caret}@ suffix.
	For instance, if `master` names a merge commit, `git diff master
	master^@` gives the same combined diff as `git show master`.

'git diff' [<options>] <commit>\...<commit> [--] [<path>...]::

	This form is to view the changes on the branch containing
@@ -196,7 +207,8 @@ linkgit:git-difftool[1],
linkgit:git-log[1],
linkgit:gitdiffcore[7],
linkgit:git-format-patch[1],
linkgit:git-apply[1]
linkgit:git-apply[1],
linkgit:git-show[1]

GIT
---
diff --git a/builtin/diff.c b/builtin/diff.c
index 3d33a9231ae..c8a999f8089 100644
--- a/builtin/diff.c
@@ -24,7 +24,13 @@
#define DIFF_NO_INDEX_IMPLICIT 2

static const char builtin_diff_usage[] =
"git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
"git diff [<options>] [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <blob> <blob>]\n"
"   or: git diff [<options>] --no-index [--] <path> <path>]\n"
COMMON_DIFF_OPTIONS_HELP;

static const char *blob_path(struct object_array_entry *entry)
{
-- 
gitgitgadget

Re: [PATCH v3 2/3] git diff: improve range handling

Chris Torek
Details
Message ID
<CAPx1GvcvDCoOHrSOgybDpKawMTPSHs2FUq6-sWVOmwAS_GRKzA@mail.gmail.com>
In-Reply-To
<60aed3f9d6543a5f66ff75a50c86cf626ec04ed4.1591888511.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
On Thu, Jun 11, 2020 at 8:15 AM Chris Torek via GitGitGadget
<gitgitgadget@gmail.com> wrote:
 > +                       if (lpos > 0)

Ugh, this and the rpos test are supposed to be >= not >.  Will fix these for v4.
Otherwise seems review-able.

Also, I forgot to update the GitGitGadget "changes since",
It's changes since v2, not since v1, of course.

Chris

Re: [PATCH 2/3] git diff: improve A...B merge-base handling

Philip Oakley
Details
Message ID
<6eadaa89-fde7-4224-dcb9-ceef315942f2@iee.email>
In-Reply-To
<f7c8f094e02406a7d0cb0c61f880e5b01fa413c4.1591661021.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
On 09/06/2020 01:03, Chris Torek via GitGitGadget wrote:
[snip]
> +test_description='behavior of diff with symmetric-diff setups'
> +
> +. ./test-lib.sh
> +
> +# build these situations:
> +#  - normal merge with one merge base (b1...b2);
> +#  - criss-cross merge ie 2 merge bases (b1...master);
> +#  - disjoint subgraph (orphan branch, b3...master).

nit:
Use of b1, b2, b3 here, but br1, br2, br3 below
> +#
> +#     B---E   <-- master
> +#    / \ /
> +#   A   X
> +#    \ / \
> +#     C---D--G   <-- br1
> +#      \    /
> +#       ---F   <-- br2
> +#
> +#  H  <-- br3
> +#
Philip

[PATCH v4 0/3] improve git-diff documentation and A...B handling

Chris Torek via GitGitGadget
Details
Message ID
<pull.804.v4.git.git.1591978801.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v3.git.git.1591888511.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
git diff -h help is succinct, but perhaps too much so.

The symmetric-diff syntax, git diff A...B, is defined by the documentation
to compare the merge base of A and B to commit B. It does so just fine when
there is a merge base. It compares A and B directly if there is no merge
base, and it is overly forgiving of bad arguments after which it can produce
nonsensical diffs. It also behaves badly with other odd/incorrect usages,
such as git diff A...B C..D.

The first patch simply adjusts a test that will fail if the second patch is
accepted. The second patch adds special handling for the symmetric and range
diff syntax so that the option parsing works, plus a small test suite. The
third patch updates the documentation, including adding a section for
combined commits, and makes the help output more verbose (to match the
SYNOPSIS and provide common diff options like git-diff-files, for instance).

Changes since v3:

 * correct > / >= goof
 * fix test nit per Philip Oakley

Chris Torek (3):
  t/t3430: avoid undefined git diff behavior
  git diff: improve range handling
  Documentation: usage for diff combined commits

 Documentation/git-diff.txt |  20 ++++--
 builtin/diff.c             | 132 +++++++++++++++++++++++++++++++++----
 t/t3430-rebase-merges.sh   |   2 +-
 t/t4068-diff-symmetric.sh  |  91 +++++++++++++++++++++++++
 4 files changed, 226 insertions(+), 19 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh


base-commit: 20514004ddf1a3528de8933bc32f284e175e1012
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-804%2Fchris3torek%2Fcleanup-diff-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-804/chris3torek/cleanup-diff-v4
Pull-Request: https://github.com/git/git/pull/804

Range-diff vs v3:

 1:  2ccaad645ff = 1:  2ccaad645ff t/t3430: avoid undefined git diff behavior
 2:  60aed3f9d65 ! 2:  4fa6fba33b3 git diff: improve range handling
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      +			basecount++;
      +			break;		/* do mark all bases */
      +		case REV_CMD_LEFT:
     -+			if (lpos > 0)
     ++			if (lpos >= 0)
      +				usage(builtin_diff_usage);
      +			lpos = i;
      +			if (obj->flags & SYMMETRIC_LEFT) {
     @@ builtin/diff.c: static int builtin_diff_files(struct rev_info *revs, int argc, c
      +			}
      +			continue;
      +		case REV_CMD_RIGHT:
     -+			if (rpos > 0)
     ++			if (rpos >= 0)
      +				usage(builtin_diff_usage);
      +			rpos = i;
      +			continue;	/* don't mark B */
     @@ t/t4068-diff-symmetric.sh (new)
      +. ./test-lib.sh
      +
      +# build these situations:
     -+#  - normal merge with one merge base (b1...b2);
     -+#  - criss-cross merge ie 2 merge bases (b1...master);
     -+#  - disjoint subgraph (orphan branch, b3...master).
     ++#  - normal merge with one merge base (br1...b2r);
     ++#  - criss-cross merge ie 2 merge bases (br1...master);
     ++#  - disjoint subgraph (orphan branch, br3...master).
      +#
      +#     B---E   <-- master
      +#    / \ /
 3:  a7da92cd635 = 3:  d7bc9aca44b Documentation: usage for diff combined commits

-- 
gitgitgadget

[PATCH v4 2/3] git diff: improve range handling

Chris Torek via GitGitGadget
Details
Message ID
<4fa6fba33b329ce85f68470fb2545adf1bb06900.1591978801.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v4.git.git.1591978801.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +202 -13
From: Chris Torek <chris.torek@gmail.com>

When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.

To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense.  As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 builtin/diff.c            | 124 ++++++++++++++++++++++++++++++++++----
 t/t4068-diff-symmetric.sh |  91 ++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+), 13 deletions(-)
 create mode 100755 t/t4068-diff-symmetric.sh

diff --git a/builtin/diff.c b/builtin/diff.c
index 8537b17bd5e..0f48b0d3e71 100644
--- a/builtin/diff.c
@@ -6,6 +6,7 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
#include "cache.h"
#include "config.h"
#include "ewah/ewok.h"
#include "lockfile.h"
#include "color.h"
#include "commit.h"
@@ -254,6 +255,108 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
	return run_diff_files(revs, options);
}

struct symdiff {
	struct bitmap *skip;
	int warn;
	const char *base, *left, *right;
};

/*
 * Check for symmetric-difference arguments, and if present, arrange
 * everything we need to know to handle them correctly.  As a bonus,
 * weed out all bogus range-based revision specifications, e.g.,
 * "git diff A..B C..D" or "git diff A..B C" get rejected.
 *
 * For an actual symmetric diff, *symdiff is set this way:
 *
 *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
 *    indices that the caller should ignore (extra merge bases, of
 *    which there might be many, and A in A...B).  Note that the
 *    chosen merge base and right side are NOT marked.
 *  - warn is set if there are multiple merge bases.
 *  - base, left, and right point to the names to use in a
 *    warning about multiple merge bases.
 *
 * If there is no symmetric diff argument, sym->skip is NULL and
 * sym->warn is cleared.  The remaining fields are not set.
 */
static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
{
	int i, is_symdiff = 0, basecount = 0, othercount = 0;
	int lpos = -1, rpos = -1, basepos = -1;
	struct bitmap *map = NULL;

	/*
	 * Use the whence fields to find merge bases and left and
	 * right parts of symmetric difference, so that we do not
	 * depend on the order that revisions are parsed.  If there
	 * are any revs that aren't from these sources, we have a
	 * "git diff C A...B" or "git diff A...B C" case.  Or we
	 * could even get "git diff A...B C...E", for instance.
	 *
	 * If we don't have just one merge base, we pick one
	 * at random.
	 *
	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
	 * rev->pending.objects and rev->cmdline.rev are parallel.
	 */
	for (i = 0; i < rev->cmdline.nr; i++) {
		struct object *obj = rev->pending.objects[i].item;
		switch (rev->cmdline.rev[i].whence) {
		case REV_CMD_MERGE_BASE:
			if (basepos < 0)
				basepos = i;
			basecount++;
			break;		/* do mark all bases */
		case REV_CMD_LEFT:
			if (lpos >= 0)
				usage(builtin_diff_usage);
			lpos = i;
			if (obj->flags & SYMMETRIC_LEFT) {
				is_symdiff = 1;
				break;	/* do mark A */
			}
			continue;
		case REV_CMD_RIGHT:
			if (rpos >= 0)
				usage(builtin_diff_usage);
			rpos = i;
			continue;	/* don't mark B */
		case REV_CMD_PARENTS_ONLY:
		case REV_CMD_REF:
		case REV_CMD_REV:
			othercount++;
			continue;
		}
		if (map == NULL)
			map = bitmap_new();
		bitmap_set(map, i);
	}

	/*
	 * Forbid any additional revs for both A...B and A..B.
	 */
	if (lpos >= 0 && othercount > 0)
		usage(builtin_diff_usage);

	if (!is_symdiff) {
		bitmap_free(map);
		sym->warn = 0;
		sym->skip = NULL;
		return;
	}

	sym->left = rev->pending.objects[lpos].name;
	sym->right = rev->pending.objects[rpos].name;
	sym->base = rev->pending.objects[basepos].name;
	if (basecount == 0)
		die(_("%s...%s: no merge base"), sym->left, sym->right);
	bitmap_unset(map, basepos);	/* unmark the base we want */
	sym->warn = basecount > 1;
	sym->skip = map;
}

int cmd_diff(int argc, const char **argv, const char *prefix)
{
	int i;
@@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
	struct object_array_entry *blob[2];
	int nongit = 0, no_index = 0;
	int result = 0;
	struct symdiff sdiff;

	/*
	 * We could get N tree-ish in the rev.pending_objects list.
@@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		}
	}

	symdiff_prepare(&rev, &sdiff);
	for (i = 0; i < rev.pending.nr; i++) {
		struct object_array_entry *entry = &rev.pending.objects[i];
		struct object *obj = entry->item;
@@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
			obj = &get_commit_tree(((struct commit *)obj))->object;

		if (obj->type == OBJ_TREE) {
			if (sdiff.skip && bitmap_get(sdiff.skip, i))
				continue;
			obj->flags |= flags;
			add_object_array(obj, name, &ent);
		} else if (obj->type == OBJ_BLOB) {
@@ -437,21 +544,12 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
		usage(builtin_diff_usage);
	else if (ent.nr == 1)
		result = builtin_diff_index(&rev, argc, argv);
	else if (ent.nr == 2)
	else if (ent.nr == 2) {
		if (sdiff.warn)
			warning(_("%s...%s: multiple merge bases, using %s"),
				sdiff.left, sdiff.right, sdiff.base);
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0], &ent.objects[1]);
	else if (ent.objects[0].item->flags & UNINTERESTING) {
		/*
		 * diff A...B where there is at least one merge base
		 * between A and B.  We have ent.objects[0] ==
		 * merge-base, ent.objects[ents-2] == A, and
		 * ent.objects[ents-1] == B.  Show diff between the
		 * base and B.  Note that we pick one merge base at
		 * random if there are more than one.
		 */
		result = builtin_diff_tree(&rev, argc, argv,
					   &ent.objects[0],
					   &ent.objects[ent.nr-1]);
	} else
		result = builtin_diff_combined(&rev, argc, argv,
					       ent.objects, ent.nr);
diff --git a/t/t4068-diff-symmetric.sh b/t/t4068-diff-symmetric.sh
new file mode 100755
index 00000000000..31d17a5af02
--- /dev/null
+++ b/t/t4068-diff-symmetric.sh
@@ -0,0 +1,91 @@
#!/bin/sh

test_description='behavior of diff with symmetric-diff setups'

. ./test-lib.sh

# build these situations:
#  - normal merge with one merge base (br1...b2r);
#  - criss-cross merge ie 2 merge bases (br1...master);
#  - disjoint subgraph (orphan branch, br3...master).
#
#     B---E   <-- master
#    / \ /
#   A   X
#    \ / \
#     C---D--G   <-- br1
#      \    /
#       ---F   <-- br2
#
#  H  <-- br3
#
# We put files into a few commits so that we can verify the
# output as well.

test_expect_success setup '
	git commit --allow-empty -m A &&
	echo b >b &&
	git add b &&
	git commit -m B &&
	git checkout -b br1 HEAD^ &&
	echo c >c &&
	git add c &&
	git commit -m C &&
	git tag commit-C &&
	git merge -m D master &&
	git tag commit-D &&
	git checkout master &&
	git merge -m E commit-C &&
	git checkout -b br2 commit-C &&
	echo f >f &&
	git add f &&
	git commit -m F &&
	git checkout br1 &&
	git merge -m G br2 &&
	git checkout --orphan br3 &&
	git commit -m H
'

test_expect_success 'diff with one merge base' '
	git diff commit-D...br1 >tmp &&
	tail -n 1 tmp >actual &&
	echo +f >expect &&
	test_cmp expect actual
'

# The output (in tmp) can have +b or +c depending
# on which merge base (commit B or C) is picked.
# It should have one of those two, which comes out
# to seven lines.
test_expect_success 'diff with two merge bases' '
	git diff br1...master >tmp 2>err &&
	test_line_count = 7 tmp &&
	test_line_count = 1 err
'

test_expect_success 'diff with no merge bases' '
	test_must_fail git diff br2...br3 >tmp 2>err &&
	test_i18ngrep "fatal: br2...br3: no merge base" err
'

test_expect_success 'diff with too many symmetric differences' '
	test_must_fail git diff br1...master br2...br3 >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with symmetric difference and extraneous arg' '
	test_must_fail git diff master br1...master >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with two ranges' '
	test_must_fail git diff master br1..master br2..br3 >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_expect_success 'diff with ranges and extra arg' '
	test_must_fail git diff master br1..master commit-D >tmp 2>err &&
	test_i18ngrep "usage" err
'

test_done
-- 
gitgitgadget

[PATCH v4 3/3] Documentation: usage for diff combined commits

Chris Torek via GitGitGadget
Details
Message ID
<d7bc9aca44bdff18ec2cee4f45d34c0965dc2002.1591978801.git.gitgitgadget@gmail.com>
In-Reply-To
<pull.804.v4.git.git.1591978801.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Patch: +23 -5
From: Chris Torek <chris.torek@gmail.com>

Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.

While here, add the three-dot notation to the synopsis.

Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
---
 Documentation/git-diff.txt | 20 ++++++++++++++++----
 builtin/diff.c             |  8 +++++++-
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 37781cf1755..1018110ddc2 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -11,15 +11,17 @@ SYNOPSIS
[verse]
'git diff' [<options>] [<commit>] [--] [<path>...]
'git diff' [<options>] --cached [<commit>] [--] [<path>...]
'git diff' [<options>] <commit> <commit> [--] [<path>...]
'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
'git diff' [<options>] <commit>...<commit> [--] [<path>...]
'git diff' [<options>] <blob> <blob>
'git diff' [<options>] --no-index [--] <path> <path>

DESCRIPTION
-----------
Show changes between the working tree and the index or a tree, changes
between the index and a tree, changes between two trees, changes between
two blob objects, or changes between two files on disk.
between the index and a tree, changes between two trees, changes resulting
from a merge, changes between two blob objects, or changes between two
files on disk.

'git diff' [<options>] [--] [<path>...]::

@@ -67,6 +69,15 @@ two blob objects, or changes between two files on disk.
	one side is omitted, it will have the same effect as
	using HEAD instead.

'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]::

	This form is to view the results of a merge commit.  The first
	listed <commit> must be the merge itself; the remaining two or
	more commits should be its parents.  A convenient way to produce
	the desired set of revisions is to use the {caret}@ suffix.
	For instance, if `master` names a merge commit, `git diff master
	master^@` gives the same combined diff as `git show master`.

'git diff' [<options>] <commit>\...<commit> [--] [<path>...]::

	This form is to view the changes on the branch containing
@@ -196,7 +207,8 @@ linkgit:git-difftool[1],
linkgit:git-log[1],
linkgit:gitdiffcore[7],
linkgit:git-format-patch[1],
linkgit:git-apply[1]
linkgit:git-apply[1],
linkgit:git-show[1]

GIT
---
diff --git a/builtin/diff.c b/builtin/diff.c
index 0f48b0d3e71..b3d17340ee3 100644
--- a/builtin/diff.c
@@ -24,7 +24,13 @@
#define DIFF_NO_INDEX_IMPLICIT 2

static const char builtin_diff_usage[] =
"git diff [<options>] [<commit> [<commit>]] [--] [<path>...]";
"git diff [<options>] [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
"   or: git diff [<options>] <blob> <blob>]\n"
"   or: git diff [<options>] --no-index [--] <path> <path>]\n"
COMMON_DIFF_OPTIONS_HELP;

static const char *blob_path(struct object_array_entry *entry)
{
-- 
gitgitgadget

Re: [PATCH 2/3] git diff: improve A...B merge-base handling

Junio C Hamano
Details
Message ID
<xmqq8sgs2ozk.fsf@gitster.c.googlers.com>
In-Reply-To
<6eadaa89-fde7-4224-dcb9-ceef315942f2@iee.email> (view parent)
DKIM signature
missing
Download raw message
Philip Oakley <philipoakley@iee.email> writes:

> On 09/06/2020 01:03, Chris Torek via GitGitGadget wrote:
> [snip]
>> +test_description='behavior of diff with symmetric-diff setups'
>> +
>> +. ./test-lib.sh
>> +
>> +# build these situations:
>> +#  - normal merge with one merge base (b1...b2);
>> +#  - criss-cross merge ie 2 merge bases (b1...master);
>> +#  - disjoint subgraph (orphan branch, b3...master).
>
> nit:
> Use of b1, b2, b3 here, but br1, br2, br3 below
>> +#
>> +#     B---E   <-- master
>> +#    / \ /
>> +#   A   X
>> +#    \ / \
>> +#     C---D--G   <-- br1
>> +#      \    /
>> +#       ---F   <-- br2
>> +#
>> +#  H  <-- br3
>> +#


True.  Which one is to be recommended?  The shorter and sweeter b1,
b2 and b3?

In any case, that must match the topology used by the tests.

Thanks.

Re: [PATCH v4 2/3] git diff: improve range handling

Junio C Hamano
Details
Message ID
<xmqq36702l1d.fsf@gitster.c.googlers.com>
In-Reply-To
<4fa6fba33b329ce85f68470fb2545adf1bb06900.1591978801.git.gitgitgadget@gmail.com> (view parent)
DKIM signature
missing
Download raw message
"Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct symdiff {
> +	struct bitmap *skip;
> +	int warn;
> +	const char *base, *left, *right;
> +};
> +
> +/*
> + * Check for symmetric-difference arguments, and if present, arrange
> + * everything we need to know to handle them correctly.  As a bonus,
> + * weed out all bogus range-based revision specifications, e.g.,
> + * "git diff A..B C..D" or "git diff A..B C" get rejected.
> + *
> + * For an actual symmetric diff, *symdiff is set this way:
> + *
> + *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
> + *    indices that the caller should ignore (extra merge bases, of
> + *    which there might be many, and A in A...B).  Note that the
> + *    chosen merge base and right side are NOT marked.
> + *  - warn is set if there are multiple merge bases.
> + *  - base, left, and right point to the names to use in a
> + *    warning about multiple merge bases.
> + *
> + * If there is no symmetric diff argument, sym->skip is NULL and
> + * sym->warn is cleared.  The remaining fields are not set.
> + */

OK.

> +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> +{
> +	int i, is_symdiff = 0, basecount = 0, othercount = 0;
> +	int lpos = -1, rpos = -1, basepos = -1;
> +	struct bitmap *map = NULL;
> +
> +	/*
> +	 * Use the whence fields to find merge bases and left and
> +	 * right parts of symmetric difference, so that we do not
> +	 * depend on the order that revisions are parsed.  If there
> +	 * are any revs that aren't from these sources, we have a
> +	 * "git diff C A...B" or "git diff A...B C" case.  Or we
> +	 * could even get "git diff A...B C...E", for instance.
> +	 *
> +	 * If we don't have just one merge base, we pick one
> +	 * at random.
> +	 *
> +	 * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> +	 * so we must check for SYMMETRIC_LEFT too.  The two arrays
> +	 * rev->pending.objects and rev->cmdline.rev are parallel.
> +	 */
> +	for (i = 0; i < rev->cmdline.nr; i++) {
> +		struct object *obj = rev->pending.objects[i].item;
> +		switch (rev->cmdline.rev[i].whence) {
> +		case REV_CMD_MERGE_BASE:
> +			if (basepos < 0)
> +				basepos = i;
> +			basecount++;
> +			break;		/* do mark all bases */

We find and use the first found merge base (i.e. "pick at random" as
promised in the comment before the function), but for warning, keep
track of how many merge bases there are.

> +		case REV_CMD_LEFT:
> +			if (lpos >= 0)
> +				usage(builtin_diff_usage);

A range (either A..B or A...B) has already been seen, and we have
another, which is now rejected.

> +			lpos = i;
> +			if (obj->flags & SYMMETRIC_LEFT) {
> +				is_symdiff = 1;
> +				break;	/* do mark A */

Inside this switch statement, "continue" is a sign that the caller
should use the rev, and "break" is a sign that the rev is to be
ignored.  We obviously do not ignore "A" in ...

> +			}
> +			continue;

... "A..B" notation, so we "continue" here.

> +		case REV_CMD_RIGHT:
> +			if (rpos >= 0)
> +				usage(builtin_diff_usage);

Here is the same "we reject having two or more ranges".  

I actually suspect that this usage() would become dead code---we
would already have died when we saw the matching left end of the
second range (so this could become BUG(), even though usage() does
not hurt).

> +			rpos = i;
> +			continue;	/* don't mark B */

And of course, whether "A..B" or "A...B", B will be used as the
"result" side of the diff, so won't be marked for skipping.

> +		case REV_CMD_PARENTS_ONLY:
> +		case REV_CMD_REF:
> +		case REV_CMD_REV:
> +			othercount++;
> +			continue;

I wonder if we want to use "default" instead of these three
individual cases.  Pros and cons?

 - If we forgot to list a whence REV_CMD_* here, it will be silently
   marked to be skipped with this code.  With "default", it will be
   counted to be diffed (which may trigger "giving too many revs to
   be diffed" error from the diff machinery, which is good).

 - With "default", when we add new type of whence to REV_CMD_* and
   forget to adjust this code, it will be counted to be diffed.
   With the current code, it will be skipped.

We probably could get the best of the both words by keeping the
above three for "counted in othercount and kept), and then add a
default arm to the switch() that just says 

		default:
			BUG("forgot to handle %d",
			    rev->cmdline.rev[i].whence);

That way, every time we add a new type of whence, we would be forced
to think what should be done to them.

> +		}
> +		if (map == NULL)
> +			map = bitmap_new();
> +		bitmap_set(map, i);
> +	}
> +
> +	/*
> +	 * Forbid any additional revs for both A...B and A..B.
> +	 */
> +	if (lpos >= 0 && othercount > 0)
> +		usage(builtin_diff_usage);

Meaning "git diff A..B C" is bad.  Reasonable.

> +	if (!is_symdiff) {
> +		bitmap_free(map);

It is not wrong per-se to free it unconditionally, but wouldn't it
be a bug if (map != NULL) at this point in the flow?

The merge bases would only be stuffed in the revs when A...B is
given, and we are not skipping anything involved in A..B or
non-range revs.

> +		sym->warn = 0;
> +		sym->skip = NULL;

Clearing these two fields are as promised to the callers in the
comment above, which is good.

> +		return;
> +	}
> +
> +	sym->left = rev->pending.objects[lpos].name;
> +	sym->right = rev->pending.objects[rpos].name;
> +	sym->base = rev->pending.objects[basepos].name;
> +	if (basecount == 0)
> +		die(_("%s...%s: no merge base"), sym->left, sym->right);

Good.

> +	bitmap_unset(map, basepos);	/* unmark the base we want */

Hmph.  You could

	case REV_CMD_MERGE_BASE:
		basecount++;
		if (basepos < 0) {
			basepos = i;
			continue; /* keep this one */
		}
		break; /* skip all others */

and lose this unset().  I do not think it makes too much of a
difference, but it probably is easier to follow if we avoided this
"do something and then come back to correct" pattern.

> +	sym->warn = basecount > 1;
> +	sym->skip = map;
> +}
> +
>  int cmd_diff(int argc, const char **argv, const char *prefix)
>  {
>  	int i;
> @@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  	struct object_array_entry *blob[2];
>  	int nongit = 0, no_index = 0;
>  	int result = 0;
> +	struct symdiff sdiff;
>  
>  	/*
>  	 * We could get N tree-ish in the rev.pending_objects list.
> @@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> +	symdiff_prepare(&rev, &sdiff);
>  	for (i = 0; i < rev.pending.nr; i++) {
>  		struct object_array_entry *entry = &rev.pending.objects[i];
>  		struct object *obj = entry->item;
> @@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  			obj = &get_commit_tree(((struct commit *)obj))->object;
>  
>  		if (obj->type == OBJ_TREE) {
> +			if (sdiff.skip && bitmap_get(sdiff.skip, i))
> +				continue;

By the way, I cannot shake this feeling that, given that
rev.pending/cmdline.nr will not be an unreasonably large number, if
it is overkill to use the bitmap here.  If I were writing this code,
I would have made symdiff_prepare() to fill a separate object array
by copying the elements to be used in the final "diff" out of the
rev.pending array and updated this loop to iterate over that array.

I am not saying that such an approach is better than the use of
bitmap code here.  It just was a bit unexpected to see the bitmap
code used for set of objects that is typically less than a dozen.

Thanks.

Re: [PATCH v4 2/3] git diff: improve range handling

Chris Torek
Details
Message ID
<CAPx1Gvd8Ggt3OwxAcQY3NsxY=ypOk4S=8TdhjzsohKbbyWNGyQ@mail.gmail.com>
In-Reply-To
<xmqq36702l1d.fsf@gitster.c.googlers.com> (view parent)
DKIM signature
missing
Download raw message
Ugh, forgot to tweak gmail reply to go to the mailing list.  Also
I typed in a wrong word ("commit" should be "comment").

Corrected reply:

On Fri, Jun 12, 2020 at 11:31 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Chris Torek via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +struct symdiff {
> > +     struct bitmap *skip;
> > +     int warn;
> > +     const char *base, *left, *right;
> > +};
> > +
> > +/*
> > + * Check for symmetric-difference arguments, and if present, arrange
> > + * everything we need to know to handle them correctly.  As a bonus,
> > + * weed out all bogus range-based revision specifications, e.g.,
> > + * "git diff A..B C..D" or "git diff A..B C" get rejected.
> > + *
> > + * For an actual symmetric diff, *symdiff is set this way:
> > + *
> > + *  - its skip is non-NULL and marks *all* rev->pending.objects[i]
> > + *    indices that the caller should ignore (extra merge bases, of
> > + *    which there might be many, and A in A...B).  Note that the
> > + *    chosen merge base and right side are NOT marked.
> > + *  - warn is set if there are multiple merge bases.
> > + *  - base, left, and right point to the names to use in a
> > + *    warning about multiple merge bases.
> > + *
> > + * If there is no symmetric diff argument, sym->skip is NULL and
> > + * sym->warn is cleared.  The remaining fields are not set.
> > + */
>
> OK.
>
> > +static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
> > +{
> > +     int i, is_symdiff = 0, basecount = 0, othercount = 0;
> > +     int lpos = -1, rpos = -1, basepos = -1;
> > +     struct bitmap *map = NULL;
> > +
> > +     /*
> > +      * Use the whence fields to find merge bases and left and
> > +      * right parts of symmetric difference, so that we do not
> > +      * depend on the order that revisions are parsed.  If there
> > +      * are any revs that aren't from these sources, we have a
> > +      * "git diff C A...B" or "git diff A...B C" case.  Or we
> > +      * could even get "git diff A...B C...E", for instance.
> > +      *
> > +      * If we don't have just one merge base, we pick one
> > +      * at random.
> > +      *
> > +      * NB: REV_CMD_LEFT, REV_CMD_RIGHT are also used for A..B,
> > +      * so we must check for SYMMETRIC_LEFT too.  The two arrays
> > +      * rev->pending.objects and rev->cmdline.rev are parallel.
> > +      */
> > +     for (i = 0; i < rev->cmdline.nr; i++) {
> > +             struct object *obj = rev->pending.objects[i].item;
> > +             switch (rev->cmdline.rev[i].whence) {
> > +             case REV_CMD_MERGE_BASE:
> > +                     if (basepos < 0)
> > +                             basepos = i;
> > +                     basecount++;
> > +                     break;          /* do mark all bases */
>
> We find and use the first found merge base (i.e. "pick at random" as
> promised in the comment before the function), but for warning, keep
> track of how many merge bases there are.
>
> > +             case REV_CMD_LEFT:
> > +                     if (lpos >= 0)
> > +                             usage(builtin_diff_usage);
>
> A range (either A..B or A...B) has already been seen, and we have
> another, which is now rejected.
>
> > +                     lpos = i;
> > +                     if (obj->flags & SYMMETRIC_LEFT) {
> > +                             is_symdiff = 1;
> > +                             break;  /* do mark A */
>
> Inside this switch statement, "continue" is a sign that the caller
> should use the rev, and "break" is a sign that the rev is to be
> ignored.  We obviously do not ignore "A" in ...
>
> > +                     }
> > +                     continue;
>
> ... "A..B" notation, so we "continue" here.
>
> > +             case REV_CMD_RIGHT:
> > +                     if (rpos >= 0)
> > +                             usage(builtin_diff_usage);
>
> Here is the same "we reject having two or more ranges".
>
> I actually suspect that this usage() would become dead code---we
> would already have died when we saw the matching left end of the
> second range (so this could become BUG(), even though usage() does
> not hurt).

Right - I considered it both ways and figured a second usage() was
simpler and straightforward, but I'm fine with either method.

> > +                     rpos = i;
> > +                     continue;       /* don't mark B */
>
> And of course, whether "A..B" or "A...B", B will be used as the
> "result" side of the diff, so won't be marked for skipping.
>
> > +             case REV_CMD_PARENTS_ONLY:
> > +             case REV_CMD_REF:
> > +             case REV_CMD_REV:
> > +                     othercount++;
> > +                     continue;
>
> I wonder if we want to use "default" instead of these three
> individual cases.  Pros and cons?
>
>  - If we forgot to list a whence REV_CMD_* here, it will be silently
>    marked to be skipped with this code.  With "default", it will be
>    counted to be diffed (which may trigger "giving too many revs to
>    be diffed" error from the diff machinery, which is good).
>
>  - With "default", when we add new type of whence to REV_CMD_* and
>    forget to adjust this code, it will be counted to be diffed.
>    With the current code, it will be skipped.
>
> We probably could get the best of the both words by keeping the
> above three for "counted in othercount and kept), and then add a
> default arm to the switch() that just says
>
>                 default:
>                         BUG("forgot to handle %d",
>                             rev->cmdline.rev[i].whence);

I'm fine with this as well.  I did it the way I did because at least some
compilers give a warning or error if you forgot an enum.  Using default
(or adding one) defeats this, but the BUG method is reasonable.

> That way, every time we add a new type of whence, we would be forced
> to think what should be done to them.
>
> > +             }
> > +             if (map == NULL)
> > +                     map = bitmap_new();
> > +             bitmap_set(map, i);
> > +     }
> > +
> > +     /*
> > +      * Forbid any additional revs for both A...B and A..B.
> > +      */
> > +     if (lpos >= 0 && othercount > 0)
> > +             usage(builtin_diff_usage);
>
> Meaning "git diff A..B C" is bad.  Reasonable.
>
> > +     if (!is_symdiff) {
> > +             bitmap_free(map);
>
> It is not wrong per-se to free it unconditionally, but wouldn't it
> be a bug if (map != NULL) at this point in the flow?

Yes, because the A in A...B will have been marked.  It didn't
seem worth a BUG() call though.

> The merge bases would only be stuffed in the revs when A...B is
> given, and we are not skipping anything involved in A..B or
> non-range revs.
>
> > +             sym->warn = 0;
> > +             sym->skip = NULL;
>
> Clearing these two fields are as promised to the callers in the
> comment above, which is good.
>
> > +             return;
> > +     }
> > +
> > +     sym->left = rev->pending.objects[lpos].name;
> > +     sym->right = rev->pending.objects[rpos].name;
> > +     sym->base = rev->pending.objects[basepos].name;
> > +     if (basecount == 0)
> > +             die(_("%s...%s: no merge base"), sym->left, sym->right);
>
> Good.
>
> > +     bitmap_unset(map, basepos);     /* unmark the base we want */
>
> Hmph.  You could
>
>         case REV_CMD_MERGE_BASE:
>                 basecount++;
>                 if (basepos < 0) {
>                         basepos = i;
>                         continue; /* keep this one */
>                 }
>                 break; /* skip all others */
>
> and lose this unset().  I do not think it makes too much of a
> difference, but it probably is easier to follow if we avoided this
> "do something and then come back to correct" pattern.

OK, I'll change it to do this in the re-roll.

> > +     sym->warn = basecount > 1;
> > +     sym->skip = map;
> > +}
> > +
> >  int cmd_diff(int argc, const char **argv, const char *prefix)
> >  {
> >       int i;
> > @@ -263,6 +366,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >       struct object_array_entry *blob[2];
> >       int nongit = 0, no_index = 0;
> >       int result = 0;
> > +     struct symdiff sdiff;
> >
> >       /*
> >        * We could get N tree-ish in the rev.pending_objects list.
> > @@ -382,6 +486,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >               }
> >       }
> >
> > +     symdiff_prepare(&rev, &sdiff);
> >       for (i = 0; i < rev.pending.nr; i++) {
> >               struct object_array_entry *entry = &rev.pending.objects[i];
> >               struct object *obj = entry->item;
> > @@ -396,6 +501,8 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
> >                       obj = &get_commit_tree(((struct commit *)obj))->object;
> >
> >               if (obj->type == OBJ_TREE) {
> > +                     if (sdiff.skip && bitmap_get(sdiff.skip, i))
> > +                             continue;
>
> By the way, I cannot shake this feeling that, given that
> rev.pending/cmdline.nr will not be an unreasonably large number, if
> it is overkill to use the bitmap here.  If I were writing this code,
> I would have made symdiff_prepare() to fill a separate object array
> by copying the elements to be used in the final "diff" out of the
> rev.pending array and updated this loop to iterate over that array.
>
> I am not saying that such an approach is better than the use of
> bitmap code here.  It just was a bit unexpected to see the bitmap
> code used for set of objects that is typically less than a dozen.

I am sure it is overkill -- but at the same time, it's already coded
and cheap enough to use that rolling our own separate array felt
worse.  I can add a comment (NOT COMMIT) if you like.

Thanks,

Chris
Reply to thread Export thread (mbox)