Re: [PATCH v3 1/7] remote: add reflog check for "force-if-includes"

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 16 Sep 2020 14:35:04 +0200 (CEST)

Hi Srinidhi,

On Sun, 13 Sep 2020, Srinidhi Kaushik wrote:

> diff --git a/remote.c b/remote.c
> index 420150837b..e4b2d85a6f 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -1484,6 +1484,36 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>  				force_ref_update = 1;
>  		}
>
> +		/*
> +		 * If the tip of the remote-tracking ref is unreachable
> +		 * from any reflog entry of its local ref indicating a
> +		 * possible update since checkout; reject the push.
> +		 *
> +		 * There is no need to check for reachability, if the
> +		 * ref is marked for deletion.
> +		 */
> +		if (ref->if_includes && !ref->deletion) {
> +			/*
> +			 * If `force_ref_update' was previously set by
> +			 * "compare-and-swap", and we have to run this
> +			 * check, reset it back to the original value
> +			 * and update it depending on the status of this
> +			 * check.
> +			 */
> +			force_ref_update = ref->force || force_update;
> +
> +			if (ref->unreachable)
> +				reject_reason =
> +					REF_STATUS_REJECT_REMOTE_UPDATED;
> +			else
> +				/*
> +				 * If updates from the remote-tracking ref
> +				 * have been integrated locally; force the
> +				 * update.
> +				 */
> +				force_ref_update = 1;
> +		}
> +
>  		/*
>  		 * If the update isn't already rejected then check
>  		 * the usual "must fast-forward" rules.
> @@ -2272,11 +2302,74 @@ static int remote_tracking(struct remote *remote, const char *refname,
>  	return 0;
>  }
>
> +static int ref_reachable(struct object_id *o_oid, struct object_id *n_oid,
> +			 const char *ident, timestamp_t timestamp, int tz,
> +			 const char *message, void *cb_data)
> +{
> +	int ret = 0;
> +	struct object_id *r_oid = cb_data;
> +
> +	ret = oideq(n_oid, r_oid);
> +	if (!ret) {

Rather than having the largest part of the actual code statements in this
function indented, it would make more sense to write

	if (oideq(n_oid, r_oid))
		return 1;

	if (!(local = lookup...) ||
	    !(remote = lookup...))
		return 0;

	return in_merge_bases(remote, local);

> +		struct commit *loc = lookup_commit_reference(the_repository,
> +							     n_oid);
> +		struct commit *rem = lookup_commit_reference(the_repository,
> +							     r_oid);
> +		ret = (loc && rem) ? in_merge_bases(rem, loc) : 0;
> +	}

This chooses the strategy of iterating over the reflog just once, and at
every step first testing whether the respective reflog entry is identical
to the remote-tracking branch tip. Only when they are different do we test
whether the remote-tracking branch tip is at least reachable from the
reflog entry.

Let's assume that our local branch has 20 reflog entries, and the 4th one
is identical to the current tip of the remote-tracking branch. Then we
tested reachability 3 times. But that test is rather expensive.

Therefore, I would have preferred to have a call to
`for_each_reflog_ent_reverse()` with a callback function that only returns
the `oideq()` result, and only if the return value of that call is 0, I
would have wanted to see another call to `for_each_reflog_ent_reverse()`
to go through, this time looking for reachability.

> +
> +	return ret;
> +}
> +
> +/*
> + * Iterate through the reflog of a local branch and check
> + * if the tip of the remote-tracking branch is reachable
> + * from one of the entries.
> + */
> +static int ref_reachable_from_reflog(const struct object_id *r_oid,
> +				     const struct object_id *l_oid,
> +				     const char *local_ref_name)
> +{
> +	int ret = 0;
> +	struct commit *r_commit, *l_commit;
> +
> +	l_commit = lookup_commit_reference(the_repository, l_oid);
> +	r_commit = lookup_commit_reference(the_repository, r_oid);

At this point, we already LOOked up `r_commit`. But we don't pass that to
`ref_reachable()` at any point (instead passing only `r_oid`), so we have
to perform the lookup again.

That's wasteful. Shouldn't we pass `r_commit` directly?

With the two-pass strategy I outlined above, the first pass would use
`r_oid`, and only when the second pass is necessary would we resort to
calling the expensive reachability check.

> +
> +	/*
> +	 * If the remote-tracking ref is an ancestor of the local
> +	 * ref (a merge, for instance) there is no need to iterate
> +	 * through the reflog entries to ensure reachability; it
> +	 * can be skipped to return early instead.
> +	 */
> +	ret = (r_commit && l_commit) ? in_merge_bases(r_commit, l_commit) : 0;

Correct me if I am wrong, but isn't the first reflog entry
(`<remote-branch>@{0}`) identical to `r_commit`? In that case, the first
iteration of the second pass over the reflog would trivially perform this
check, and we do not need to duplicate the logic here.

> +	if (!ret)
> +		ret = for_each_reflog_ent_reverse(local_ref_name, ref_reachable,
> +						  (struct object_id *)r_oid);
> +
> +	return ret;
> +}
> +
> +/*
> + * Check for reachability of a remote-tracking
> + * ref in the reflog entries of its local ref.
> + */
> +void check_reflog_for_ref(struct ref *r_ref)
> +{
> +	struct object_id r_oid;
> +	struct ref *l_ref = get_local_ref(r_ref->name);
> +
> +	if (r_ref->if_includes && l_ref && !read_ref(l_ref->name, &r_oid))

If `r_ref->if_includes` is 0, we do not even have to get the local ref,
correct? It would make `check_reflog_for_ref()` much easier to read for me
if it was only called when that flag was already verified to be 1, and
then followed this structure:

	if (!l_ref)
		return;
	if (read_ref(...))
		warning(_("ignoring stale remote branch information ..."));
	else
		r_ref->unreachable = ...

Also, it might make a lot more sense to rename `check_reflog_for_ref()` to
`check_if_includes_upstream()`, and to rename `r_ref` to `local` and
`l_ref` to `remote_tracking` or something like that: nothing is inherently
"left" or "right" about those refs.

> +		r_ref->unreachable = !ref_reachable_from_reflog(&r_ref->old_oid,
> +								&r_oid,
> +								l_ref->name);
> +}
> +
>  static void apply_cas(struct push_cas_option *cas,
>  		      struct remote *remote,
>  		      struct ref *ref)
>  {
> -	int i;
> +	int i, is_tracking = 0;
>
>  	/* Find an explicit --<option>=<name>[:<value>] entry */
>  	for (i = 0; i < cas->nr; i++) {
> @@ -2288,16 +2381,26 @@ static void apply_cas(struct push_cas_option *cas,
>  			oidcpy(&ref->old_oid_expect, &entry->expect);
>  		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
>  			oidclr(&ref->old_oid_expect);
> -		return;
> +		else
> +			is_tracking = 1;

As part of `remote_tracking()`, we already looked up the branch name.
Since we need it in the `is_tracking` case, maybe this should not be a
Boolean anymore but store a copy of the remote-tracking branch name
instead?

Oh, following the code path all the way down to
`match_name_with_pattern()`, it seems that `remote_tracking()`'s `dst`
variable _already_ contains a copy (which means that that memory is
leaked, right?).

> +		break;
>  	}
>
>  	/* Are we using "--<option>" to cover all? */
> -	if (!cas->use_tracking_for_rest)
> -		return;
> +	if (cas->use_tracking_for_rest) {
> +		ref->expect_old_sha1 = 1;
> +		if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
> +			oidclr(&ref->old_oid_expect);
> +		else
> +			is_tracking = 1;
> +	}
> +
> +	/*
> +	 * Mark this ref to be checked if "--force-if-includes" is
> +	 * specified as an argument along with "compare-and-swap".
> +	 */
> +	ref->is_tracking = is_tracking;
>
> -	ref->expect_old_sha1 = 1;
> -	if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
> -		oidclr(&ref->old_oid_expect);
>  }
>
>  void apply_push_cas(struct push_cas_option *cas,
> @@ -2308,3 +2411,21 @@ void apply_push_cas(struct push_cas_option *cas,
>  	for (ref = remote_refs; ref; ref = ref->next)
>  		apply_cas(cas, remote, ref);
>  }
> +
> +void apply_push_force_if_includes(struct ref *remote_refs, int used_with_cas)
> +{
> +	struct ref *ref;
> +	for (ref = remote_refs; ref; ref = ref->next) {
> +		/*
> +		 * If "compare-and-swap" is used along with option, run the
> +		 * check on refs that have been marked to do so. Otherwise,
> +		 * all refs will be checked.
> +		 */
> +		if (used_with_cas)
> +			ref->if_includes = ref->is_tracking;
> +		else
> +			ref->if_includes = 1;
> +
> +		check_reflog_for_ref(ref);
> +	}
> +}
> diff --git a/remote.h b/remote.h
> index 5e3ea5a26d..1618ba892b 100644
> --- a/remote.h
> +++ b/remote.h
> @@ -104,7 +104,10 @@ struct ref {
>  		forced_update:1,
>  		expect_old_sha1:1,
>  		exact_oid:1,
> -		deletion:1;
> +		deletion:1,
> +		if_includes:1, /* If "--force-with-includes" was specified.  */
> +		is_tracking:1, /* If "use_tracking[_for_rest]" is set (CAS). */
> +		unreachable:1; /* For "if_includes"; unreachable in reflog.  */
>
>  	enum {
>  		REF_NOT_MATCHED = 0, /* initial value */
> @@ -134,6 +137,7 @@ struct ref {
>  		REF_STATUS_REJECT_NEEDS_FORCE,
>  		REF_STATUS_REJECT_STALE,
>  		REF_STATUS_REJECT_SHALLOW,
> +		REF_STATUS_REJECT_REMOTE_UPDATED,
>  		REF_STATUS_UPTODATE,
>  		REF_STATUS_REMOTE_REJECT,
>  		REF_STATUS_EXPECTING_REPORT,
> @@ -346,4 +350,12 @@ int parseopt_push_cas_option(const struct option *, const char *arg, int unset);
>  int is_empty_cas(const struct push_cas_option *);
>  void apply_push_cas(struct push_cas_option *, struct remote *, struct ref *);
>
> +/*
> + * Runs when "--force-if-includes" is specified.
> + * Checks if the remote-tracking ref was updated (since checkout)
> + * implicitly in the background and verify that changes from the
> + * updated tip have been integrated locally, before pushing.
> + */
> +void apply_push_force_if_includes(struct ref*, int);

This function is not even hooked up in this patch, right? I don't think
that it makes sense to introduce it without a caller, in particular since
it makes it harder to guess what those parameters might be used for.

In general, it appears to me as if the code worked way too hard to
accomplish something that should be a lot simpler: when
`--force-if-includes` is passed, it should piggy-back on top of the
`--force-with-lease` code path, and just add yet another check on top.

With that in mind, I would have expected something more in line with this:

-- snip --
 struct push_cas_option {
 	unsigned use_tracking_for_rest:1;
 	struct push_cas {
 		struct object_id expect;
-		unsigned use_tracking:1;
+		enum {
+			PUSH_NO_CAS = 0,
+			PUSH_CAS_USE_TRACKING,
+			PUSH_CAS_IF_INCLUDED
+		} mode;
 		char *refname;
 	} *entry;
 	int nr;
 	int alloc;
 };
-- snap --

and then adjusting the respective code paths accordingly.

Ciao,
Dscho

> +
>  #endif
> --
> 2.28.0
>
>