Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 13 Sep 2016 15:34:53 -0700

Ben Peart <peartben@xxxxxxxxx> writes:

> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> +	const struct branch_info *old,
> +	const struct branch_info *new)
> +{
> +...
> +}

I do not think I need to repeat the same remarks on the conditions
in this helper, which hasn't changed since v2.  Many "comments" in
the code do not explain why skipping is justified, or what they
claim to check looks to me just plain wrong.

For example, there is

       /*
        * If we're not creating a new branch, by definition we're changing
        * the existing one so need to do the merge
        */
       if (!opts->new_branch)
               return 1;
	
but "git checkout" (no other argument) hits this condition.  It
disables the most trivial optimization opportunity, because we are
not "creating".

"By definition, we're changing"?  Really?  Not quite.

If you disable this bogus check, "git checkout" (no other argument)
would be allowed to skip the merge_working_tree(), and that in turn
reveals another case that the helper is not checking when
unpack_trees() MUST be called.

    Note: namely, when sparse checkout is in effect, switching from
    HEAD to HEAD can nuke existing working tree files outside the
    sparse pattern -- YUCK!  See penultimate test in t1011 for
    an example.

This yuckiness is not your fault, but needs_working_tree_merge()
logic you added needs to refrain from skipping unpack_trees() call
when sparse thing is in effect.  I'd expect "git checkout -b foo"
instead of "git checkout" (no other argument) would fail to honor
the sparse thing and reveal this bug, because the above bogus
"!opts->new_branch" check will not protect you for that case.

In other words, these random series of "if (...) return 1" are bugs
hiding other real bugs and we need to reason about which ones are
bugs that are hiding what other bugs that are not covered by this
function.  As Peff said earlier for v1, this is still an unreadable
mess.  We need to figure out a way to make sure we are skipping on
the right condition and not accidentally hiding a bug of failing to
check the right condition.  I offhand do not have a good suggestion
on this; sorry.

>  static int merge_working_tree(const struct checkout_opts *opts,
>  			      struct branch_info *old,
>  			      struct branch_info *new,
>  			      int *writeout_error)
>  {
> +	/*
> +	 * Optimize the performance of "git checkout -b foo" by avoiding
> +	 * the expensive merge, index and working directory updates if they
> +	 * are not needed.
> +	 */
> +	if (!needs_working_tree_merge(opts, old, new))
> +		return 0;
> +
>  	int ret;
>  	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));

With the change you made at the beginning of this function, it no
longer compiles with -Wdecl-after-stmt, but that is the smallest of
the problems.

It is a small step in the right direction to move the call to the
helper from the caller to this function, but it is a bit too small.

Notice that the lines after the above context look like this:

	hold_locked_index(lock_file, 1);
	if (read_cache_preload(NULL) < 0)
		return error(_("index file corrupt"));

	resolve_undo_clear();
	if (opts->force) {
		ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
		if (ret)
			return ret;
	} else {
		struct tree_desc trees[2];
		...

I would have expected that the check goes inside the "else" thing
that actually does a two-tree merge, and the helper loses the check
with opts->force, at least.  That would still be a change smaller
than desired, but at least a meaningful improvement compared to the
previous one.  As I have already pointed out, in the "else" clause
there is a check "is the index free of conflicted entries? if so
error out", and that must be honored in !opt->force case, no matter
what your needs_working_tree_merge() says.  I also was hoping that
you would notice, when you were told about the unmerged check, by
reading the remainder of the merge_working_tree(), that we need to
call show_local_changes() when we are not doing force and when we
are not quiet---returning early like the above patch will never be
able to call that one downstream in the function.

Regardless of what the actual checks end up to be, the right place
to do this "optimization" would look more like:

 builtin/checkout.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2b50a49..a6b9e17 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			topts.dir->flags |= DIR_SHOW_IGNORED;
 			setup_standard_excludes(topts.dir);
 		}
+
+		if ( we know we can skip the unpack ) {
+			ret = 0;
+		} else {
 			tree = parse_tree_indirect(old->commit ?
 						   old->commit->object.oid.hash :
 						   EMPTY_TREE_SHA1_BIN);
 			init_tree_desc(&trees[0], tree->buffer, tree->size);
 			tree = parse_tree_indirect(new->commit->object.oid.hash);
 			init_tree_desc(&trees[1], tree->buffer, tree->size);
-
 			ret = unpack_trees(2, trees, &topts);
+		}
+
 		if (ret == -1) {
 			/*
 			 * Unpack couldn't do a trivial merge; either

I'd think.  Note that the determination of "we can skip" would
involve knowing the object names of the two trees involved, so for
performance reasons, some of the parse-tree calls may have to come
before the call to "do we know we can skip?", but that does not
fundamentally change the basic code structure.

Thanks.