Re: [PATCH v3 1/7] remote: add reflog check for "force-if-includes"

Srinidhi Kaushik <shrinidhi.kaushik@xxxxxxxxx> · Sat, 19 Sep 2020 22:31:14 +0530

Hi Johannes,

On 09/16/2020 14:35, Johannes Schindelin wrote:

> > [...]
> > +		struct commit *loc = lookup_commit_reference(the_repository,
> > +							     n_oid);
> > +		struct commit *rem = lookup_commit_reference(the_repository,
> > +							     r_oid);
> > +		ret = (loc && rem) ? in_merge_bases(rem, loc) : 0;
> > +	}
> 
> This chooses the strategy of iterating over the reflog just once, and at
> every step first testing whether the respective reflog entry is identical
> to the remote-tracking branch tip. Only when they are different do we test
> whether the remote-tracking branch tip is at least reachable from the
> reflog entry.
> 
> Let's assume that our local branch has 20 reflog entries, and the 4th one
> is identical to the current tip of the remote-tracking branch. Then we
> tested reachability 3 times. But that test is rather expensive.
> 
> Therefore, I would have preferred to have a call to
> `for_each_reflog_ent_reverse()` with a callback function that only returns
> the `oideq()` result, and only if the return value of that call is 0, I
> would have wanted to see another call to `for_each_reflog_ent_reverse()`
> to go through, this time looking for reachability.

OK, you're right about this. Will update check to use the two-step
approach.

> > [...] 
> > +	int ret = 0;
> > +	struct commit *r_commit, *l_commit;
> > +
> > +	l_commit = lookup_commit_reference(the_repository, l_oid);
> > +	r_commit = lookup_commit_reference(the_repository, r_oid);
> 
> At this point, we already LOOked up `r_commit`. But we don't pass that to
> `ref_reachable()` at any point (instead passing only `r_oid`), so we have
> to perform the lookup again.
> 
> That's wasteful. Shouldn't we pass `r_commit` directly?

Hmm, I suppose we can. Not sure why I did that in the first place. 

> With the two-pass strategy I outlined above, the first pass would use
> `r_oid`, and only when the second pass is necessary would we resort to
> calling the expensive reachability check.
> 
> > +
> > +	/*
> > +	 * If the remote-tracking ref is an ancestor of the local
> > +	 * ref (a merge, for instance) there is no need to iterate
> > +	 * through the reflog entries to ensure reachability; it
> > +	 * can be skipped to return early instead.
> > +	 */
> > +	ret = (r_commit && l_commit) ? in_merge_bases(r_commit, l_commit) : 0;
> 
> Correct me if I am wrong, but isn't the first reflog entry
> (`<remote-branch>@{0}`) identical to `r_commit`? In that case, the first
> iteration of the second pass over the reflog would trivially perform this
> check, and we do not need to duplicate the logic here.

That's a correct assumption; the second pass will check for this.

> > +	if (!ret)
> > +		ret = for_each_reflog_ent_reverse(local_ref_name, ref_reachable,
> > +						  (struct object_id *)r_oid);
> > +
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Check for reachability of a remote-tracking
> > + * ref in the reflog entries of its local ref.
> > + */
> > +void check_reflog_for_ref(struct ref *r_ref)
> > +{
> > +	struct object_id r_oid;
> > +	struct ref *l_ref = get_local_ref(r_ref->name);
> > +
> > +	if (r_ref->if_includes && l_ref && !read_ref(l_ref->name, &r_oid))
> 
> If `r_ref->if_includes` is 0, we do not even have to get the local ref,
> correct? It would make `check_reflog_for_ref()` much easier to read for me
> if it was only called when that flag was already verified to be 1, and
> then followed this structure:
> 
> 	if (!l_ref)
> 		return;
> 	if (read_ref(...))
> 		warning(_("ignoring stale remote branch information ..."));
> 	else
> 		r_ref->unreachable = ...
> 
> Also, it might make a lot more sense to rename `check_reflog_for_ref()` to
> `check_if_includes_upstream()`, and to rename `r_ref` to `local` and
> `l_ref` to `remote_tracking` or something like that: nothing is inherently
> "left" or "right" about those refs.

Now that I think about it, we don't need te call to "read_ref()" at all
because the reflog will have the OID we need for checking. As to
"{l,r}_ref", they were meant to be [l]ocal and [r]emote. :)

> > +		r_ref->unreachable = !ref_reachable_from_reflog(&r_ref->old_oid,
> > +								&r_oid,
> > +								l_ref->name);
> > +}
> > +
> >  static void apply_cas(struct push_cas_option *cas,
> >  		      struct remote *remote,
> >  		      struct ref *ref)
> >  {
> > -	int i;
> > +	int i, is_tracking = 0;
> >
> >  	/* Find an explicit --<option>=<name>[:<value>] entry */
> >  	for (i = 0; i < cas->nr; i++) {
> > @@ -2288,16 +2381,26 @@ static void apply_cas(struct push_cas_option *cas,
> >  			oidcpy(&ref->old_oid_expect, &entry->expect);
> >  		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
> >  			oidclr(&ref->old_oid_expect);
> > -		return;
> > +		else
> > +			is_tracking = 1;
> 
> As part of `remote_tracking()`, we already looked up the branch name.
> Since we need it in the `is_tracking` case, maybe this should not be a
> Boolean anymore but store a copy of the remote-tracking branch name
> instead?
> 
> Oh, following the code path all the way down to
> `match_name_with_pattern()`, it seems that `remote_tracking()`'s `dst`
> variable _already_ contains a copy (which means that that memory is
> leaked, right?).

I'm not sure I understand what you mean. The call to "remote_tracking()"
gets the OID of the remote ref and writes it to "old_oid_expect".
We don't need "dst" from there because "old_oid" is sufficient for
checking the reflog entries. However, calling "get_local_ref()" is
necessary to get the local branch name. Also, why does "is_tracking"
have to be a Boolean? We already have "ref->name", right?

Anyway, I think we can get rid of "is_tracking" altogether and just use
"if_includes".

> > [...] 
> > +/*
> > + * Runs when "--force-if-includes" is specified.
> > + * Checks if the remote-tracking ref was updated (since checkout)
> > + * implicitly in the background and verify that changes from the
> > + * updated tip have been integrated locally, before pushing.
> > + */
> > +void apply_push_force_if_includes(struct ref*, int);
> 
> This function is not even hooked up in this patch, right? I don't think
> that it makes sense to introduce it without a caller, in particular since
> it makes it harder to guess what those parameters might be used for.
> 
> In general, it appears to me as if the code worked way too hard to
> accomplish something that should be a lot simpler: when
> `--force-if-includes` is passed, it should piggy-back on top of the
> `--force-with-lease` code path, and just add yet another check on top.

Well, it isn't included in this commit, because it was called from
"transport.c" and "send-pack.c", I decided to put that in another
commit. I will fix that in the next patch series.

> With that in mind, I would have expected something more in line with this:
> 
> -- snip --
>  struct push_cas_option {
>  	unsigned use_tracking_for_rest:1;
>  	struct push_cas {
>  		struct object_id expect;
> -		unsigned use_tracking:1;
> +		enum {
> +			PUSH_NO_CAS = 0,
> +			PUSH_CAS_USE_TRACKING,
> +			PUSH_CAS_IF_INCLUDED
> +		} mode;
>  		char *refname;
>  	} *entry;
>  	int nr;
>  	int alloc;
>  };
> -- snap --
> 
> and then adjusting the respective code paths accordingly.

OK, I will clean it up and get rid of the redundant/wasteful
calls and data being passed around. Regarding the new "enum",
I feel that just having a new bit-field "use_force_if_includes"
might be sufficient for "push_cas_option". The idea is to set it
to "1" when "--force-if-includes" is specified along with
"--force-with-lease", and "apply_push_cas()" will run the check
and set "ref->unreachable" depending on the check status.
To clarify, you would have to specify something like this to
enable the check:

   git push --force-if-includes --force-with-lease=master [...]

Then, having a configuration setting "push.useForceIfIncludes" would
be the same as running "git-push" like mentioned above. This new option
will be a "no-op" if specified otherwise. Would that be acceptable?

Thanks again, for a thorough review. 
-- 
Srinidhi Kaushik