Re: [PATCH] has_uncommitted_changes(): fall back to empty tree

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 11 Jul 2018 16:41:02 +0200

On Wed, Jul 11 2018, Jeff King wrote:

> On Wed, Jul 11, 2018 at 09:34:02AM -0400, Jeff King wrote:
>
>> I do worry that other callers of run_diff_index() might have similar
>> problems, though. Grepping around, the other callers seem to fall into
>> one of three categories:
>>
>>  - they resolve the object themselves and put it in the pending list
>>    (and often fallback to the empty tree, which is more or less what the
>>    patch above is doing)
>>
>>  - they resolve the object themselves and avoid calling run_diff_index()
>>    if it's not valid
>>
>>  - they use setup_revisions(), which will barf on the broken object
>>
>> So I think this may be sufficient. We probably should also add an
>> assertion to run_diff_index(), since that's better than segfaulting.
>
> Here's a patch to do that. I tweaked it slightly from what I showed
> earlier to use the empty tree, which matches what other code (e.g.,
> git-diff) would do.
>
> -- >8 --
> Subject: has_uncommitted_changes(): fall back to empty tree
>
> If has_uncommitted_changes() can't resolve HEAD (e.g.,
> because it's unborn or corrupt), then we end up calling
> run_diff_index() with an empty revs.pending array. This
> causes a segfault, as run_diff_index() blindly looks at the
> first pending item.
>
> Fixing this raises a question of fault: should
> run_diff_index() handle this case, or is the caller wrong to
> pass an empty pending list?
>
> Looking at the other callers of run_diff_index(), they
> handle this in one of three ways:
>
>  - they resolve the object themselves, and avoid doing the
>    diff if it's not valid
>
>  - they resolve the object themselves, and fall back to the
>    empty tree
>
>  - they use setup_revisions(), which will die() if the
>    object isn't valid
>
> Since this is the only broken caller, that argues that the
> fix should go there. Falling back to the empty tree makes
> sense here, as we'd claim uncommitted changes if and only if
> the index is non-empty. This may be a little funny in the
> case of corruption (the corrupt HEAD probably _isn't_
> empty), but:
>
>   - we don't actually know the reason here that HEAD didn't
>     resolve (the much more likely case is that we have an
>     unborn HEAD, in which case the empty tree comparison is
>     the right thing)
>
>   - this matches how other code, like "git diff", behaves
>
> While we're thinking about it, let's add an assertion to
> run_diff_index(). It should always be passed a single
> object, and as this bug shows, it's easy to get it wrong
> (and an assertion is easier to hunt down than a segfault, or
> a quietly ignored extra tree).
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  diff-lib.c      |  3 +++
>  t/t5520-pull.sh | 12 ++++++++++++
>  wt-status.c     | 10 ++++++++++
>  3 files changed, 25 insertions(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index a9f38eb5a3..732f684a49 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -520,6 +520,9 @@ int run_diff_index(struct rev_info *revs, int cached)
>  	struct object_array_entry *ent;
>  	uint64_t start = getnanotime();
>
> +	if (revs->pending.nr != 1)
> +		BUG("run_diff_index must be passed exactly one tree");
> +
>  	ent = revs->pending.objects;
>  	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
>  		exit(128);
> diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
> index 59c4b778d3..68aa5f0340 100755
> --- a/t/t5520-pull.sh
> +++ b/t/t5520-pull.sh
> @@ -618,6 +618,18 @@ test_expect_success 'pull --rebase fails on unborn branch with staged changes' '
>  	)
>  '
>
> +test_expect_success 'pull --rebase fails on corrupt HEAD' '
> +	test_when_finished "rm -rf corrupt" &&
> +	git init corrupt &&
> +	(
> +		cd corrupt &&
> +		test_commit one &&
> +		obj=$(git rev-parse --verify HEAD | sed "s#^..#&/#") &&
> +		rm -f .git/objects/$obj &&
> +		test_must_fail git pull --rebase
> +	)
> +'
> +
>  test_expect_success 'setup for detecting upstreamed changes' '
>  	mkdir src &&
>  	(cd src &&
> diff --git a/wt-status.c b/wt-status.c
> index d1c05145a4..d89c41ba10 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -2340,7 +2340,17 @@ int has_uncommitted_changes(int ignore_submodules)
>  	if (ignore_submodules)
>  		rev_info.diffopt.flags.ignore_submodules = 1;
>  	rev_info.diffopt.flags.quick = 1;
> +
>  	add_head_to_pending(&rev_info);
> +	if (!rev_info.pending.nr) {
> +		/*
> +		 * We have no head (or it's corrupt); use the empty tree,
> +		 * which will complain if the index is non-empty.
> +		 */
> +		struct tree *tree = lookup_tree(the_hash_algo->empty_tree);
> +		add_pending_object(&rev_info, &tree->object, "");
> +	}
> +
>  	diff_setup_done(&rev_info.diffopt);
>  	result = run_diff_index(&rev_info, 1);
>  	return diff_result_code(&rev_info.diffopt, result);

Thanks a lot. I've tested this and it looks good to me and fixes the
bug.

> [From upthread]: It took me a minute to reproduce this. It needs "pull
> --rebase" if you don't have that setup in your config.

Yeah, sorry about that. I tested without --rebase while simplifying the
testcase, but forgot that I have pull.rebase=true in my ~/.gitconfig, so
of course that didn't imply --no-rebase.