Re: [PATCH v4 3/7] update-ref: add support for 'symref-verify' command

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 26 Apr 2024 15:51:06 -0700

Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> From: Karthik Nayak <karthik.188@xxxxxxxxx>
>
> In the previous commits, we added the required base for adding symref
> commands to the '--stdin' mode provided by 'git-update-ref(1)'. Using
> them, add a new 'symref-verify' command to verify symrefs.
>
> The 'symref-verify' command allows users to verify if a provided <ref>
> contains the provided <old-target> without changing the <ref>. If
> <old-target> is not provided, the command will verify that the <ref>
> doesn't exist. Since we're checking for symbolic refs, this command will
> only work with the 'no-deref' mode. This is because any dereferenced
> symbolic ref will point to an object and not a ref and the regular
> 'verify' command can be used in such situations.

All makes sense, but a naïve reader may find it helpful if you
explained why having "verify" command is a good idea in the first
place ("I can just do 'git symoblic-ref' to read the current value,
and see if it is what I expect").  Presumably the value of "verify"
is that you can have it in a transaction and fail other operations
in the same transaction if the symref moved from what you expected
it to point at?

> Add and use `ref_update_is_null_new_value`, a helper function which is
> used to check if there is a new_value in a reference update. The new
> value could either be a symref target `new_target` or a OID `new_oid`.
> We also add tests to test the command in both the regular stdin mode and
> also with the '-z' flag.

This looks out of place, primarily because the helper function is
*NOT* used in this step.  Without any actual user, and with the name
that says only what it checks without hinting why a caller may want
to check the condition it checks, it is hard to guess if it is a
good idea to have such a helper.

"If a ref_update object specifies no new-oid and no new-target, it
is not about updating but just validating" is how the callers are
expected to use it, then instead of is_null_new_value that says
what it checks, something like is_verify_only that says what the
caller may want to use it for would be a more friendly name for
readers and future developers.

> @@ -297,11 +320,47 @@ static void parse_cmd_verify(struct ref_transaction *transaction,
>  		die("verify %s: extra input: %s", refname, next);
>  
>  	if (ref_transaction_verify(transaction, refname, &old_oid,
> -				   update_flags, &err))
> +				   NULL, update_flags, &err))
>  
>  	update_flags = default_flags;
>  	free(refname);
>  	strbuf_release(&err);
>  }

The only damage by this patch to parse_cmd_verify() is that
ref_transaction_verify() gained another parameter NULL, but with the
default "--diff-algorithm=myers" algorithm, it is very hard to see.

The "--patience" algorithm does a much beter job on this hunk.

And the following function is entirely new.

> +static void parse_cmd_symref_verify(struct ref_transaction *transaction,
> +				    const char *next, const char *end)
> +{
> +	struct strbuf err = STRBUF_INIT;
> +	struct object_id old_oid;
> +	char *refname, *old_target;
> +
> +	if (!(update_flags & REF_NO_DEREF))
> +		die("symref-verify: cannot operate with deref mode");
> +
> +	refname = parse_refname(&next);
> +	if (!refname)
> +		die("symref-verify: missing <ref>");
> +
> +	/*
> +	 * old_ref is optional, but we want to differentiate between
> +	 * a NULL and zero value.
> +	 */
> +	old_target = parse_next_refname(&next);
> +	if (!old_target)
> +		old_oid = *null_oid();

In many existing code paths, we do not do structure assignment like
this. Instead we do

		oidcpy(&old_oid, null_oid());

We can see an existing example in a common context in a hunk for
refs.c in this patch.

> +	if (*next != line_termination)
> +		die("symref-verify %s: extra input: %s", refname, next);
> +
> +	if (ref_transaction_verify(transaction, refname,
> +				   old_target ? NULL : &old_oid,
> +				   old_target, update_flags, &err))
> +		die("%s", err.buf);

Are static analyzers smart enough to notice that we will not be
using old_oid uninitialized here?  Just wondering.

Anyway.  This ensures ref_transaction_verify() gets either
old_target or old_oid, but never both at the same time.  The caller
to ref_transaction_verify() in the previous function passed NULL for
old_target but it always had a non-NULL old_oid so that is perfectly
fine.

> +	update_flags = default_flags;
> +	free(refname);
> +	free(old_target);
> +	strbuf_release(&err);
> +}

> diff --git a/refs.c b/refs.c
> index 060a31616d..0e1013b5ab 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1217,6 +1217,8 @@ void ref_transaction_free(struct ref_transaction *transaction)
>  
>  	for (i = 0; i < transaction->nr; i++) {
>  		free(transaction->updates[i]->msg);
> +		free((void *)transaction->updates[i]->old_target);
> +		free((void *)transaction->updates[i]->new_target);
>  		free(transaction->updates[i]);
>  	}
>  	free(transaction->updates);
> @@ -1247,9 +1249,13 @@ struct ref_update *ref_transaction_add_update(
>  
>  	update->flags = flags;
>  
> -	if (flags & REF_HAVE_NEW)
> +	if (new_target)
> +		update->new_target = xstrdup(new_target);
> +	if (old_target)
> +		update->old_target = xstrdup(old_target);

Presumably "update" structure, when freshly initialized, has NULL in
both of these _target members?  Otherwise ref_transaction_free()
would get in trouble, so double checking.

> +	if (new_oid && flags & REF_HAVE_NEW)
>  		oidcpy(&update->new_oid, new_oid);
> -	if (flags & REF_HAVE_OLD)
> +	if (old_oid && flags & REF_HAVE_OLD)
>  		oidcpy(&update->old_oid, old_oid);

Since we can ask to work on a symbolic ref, new_oid / old_oid can be
NULL when REF_HAVE_NEW / REF_HAVE_OLD bit is on for _target members.

Makes me wonder if the code becomes easier to follow if the flag
bits are split into four (_NEW -> _NEW_OID + _NEW_TARGET), but let's
not worry about that for now.

> @@ -1286,6 +1292,7 @@ int ref_transaction_update(struct ref_transaction *transaction,
>  	flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
>  
>  	flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
> +	flags |= (new_target ? REF_HAVE_NEW : 0) | (old_target ? REF_HAVE_OLD : 0);

> @@ -1325,14 +1332,17 @@ int ref_transaction_delete(struct ref_transaction *transaction,
>  int ref_transaction_verify(struct ref_transaction *transaction,
>  			   const char *refname,
>  			   const struct object_id *old_oid,
> +			   const char *old_target,
>  			   unsigned int flags,
>  			   struct strbuf *err)
>  {
> -	if (!old_oid)
> -		BUG("verify called with old_oid set to NULL");
> +	if (!old_target && !old_oid)
> +		BUG("verify called with old_oid and old_target set to NULL");

Is it normal if you get _both_ set, or is it equally a BUG()?
The parse_*_verify() codepaths we saw earlier both made sure
only one of the two is non-NULL, and it is unclear what should
happen if both are non-NULL.

> +	if (old_target && !(flags & REF_NO_DEREF))
> +		BUG("verify cannot operate on symrefs with deref mode");
>  	return ref_transaction_update(transaction, refname,
>  				      NULL, old_oid,
> -				      NULL, NULL,
> +				      NULL, old_target,
>  				      flags, NULL, err);
>  }

So this queues an ref_update object whose .new_oid and .new_target
are NULL, and .old_oid and .old_target are what the caller gave us
to check.  The NULLs in .new* members hopefully do not mean "delete
this thing" ;-)

> @@ -2349,6 +2359,12 @@ static int run_transaction_hook(struct ref_transaction *transaction,
>  	for (i = 0; i < transaction->nr; i++) {
>  		struct ref_update *update = transaction->updates[i];
>  
> +		/*
> +		 * Skip reference transaction for symbolic refs.
> +		 */
> +		if (update->new_target || update->old_target)
> +			continue;

Is that a final design, or will the hooks have a chance to interfere?

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 2420dac2aa..53197fa3af 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2425,6 +2425,37 @@ static const char *original_update_refname(struct ref_update *update)
>  	return update->refname;
>  }
>  
> +/*
> + * Check whether the REF_HAVE_OLD and old_target values stored in
> + * update are consistent with ref, which is the symbolic reference's
> + * current value. If everything is OK, return 0; otherwise, write an
> + * error message to err and return -1.
> + */
> +static int check_old_target(struct ref_update *update, char *ref,
> +			    struct strbuf *err)
> +{
> +	if (!(update->flags & REF_HAVE_OLD) ||
> +	    !strcmp(update->old_target, ref))
> +		return 0;

Earlier on the assignment side for "update" structure we saw above,
the guard was (old_target && flags & REF_HAVE_OLD), but here we
assume old_target is valid, which feels a bit asymmetric.

Yes, I can see that the caller does not call us when !old_target,
but still...  Perhaps

	if ((update->flags & REF_HAVE_OLD) && !update->old_target)
		BUG(...);

or something?  Or alternatively, perhaps !!update->old_target should
be the only thing we should check and ignore REF_HAVE_OLD bit?  I am
not sure, but it smells like that the non-NULL-ness of old_target is
the only thing that matters (if it is not NULL, very early in the
control flow somebody would have set REF_HAVE_OLD bit to flags, no?).

It brings me back to my earlier question.  Does REF_HAVE_OLD bit
serve a useful purpose in this code?

> +	if (!strcmp(update->old_target, ""))
> +		strbuf_addf(err, "cannot lock ref '%s': "
> +			    "reference already exists",
> +			    original_update_refname(update));
> +	else if (!strcmp(ref, ""))
> +		strbuf_addf(err, "cannot lock ref '%s': "
> +			    "reference is missing but expected %s",
> +			    original_update_refname(update),
> +			    update->old_target);

So... for old_target and ref, an empty string is a special value?
How?  Shouldn't that be documented in the comment before the
function?

> +	else
> +		strbuf_addf(err, "cannot lock ref '%s': "
> +			    "is at %s but expected %s",
> +			    original_update_refname(update),
> +			    ref, update->old_target);
> +
> +	return -1;
> +}
> +
>  /*
>   * Check whether the REF_HAVE_OLD and old_oid values stored in update
>   * are consistent with oid, which is the reference's current value. If
> @@ -2528,6 +2559,18 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  					ret = TRANSACTION_GENERIC_ERROR;
>  					goto out;
>  				}
> +			}
> +
> +			/*
> +			 * For symref verification, we need to check the reference value
> +			 * rather than the oid. If we're dealing with regular refs or we're
> +			 * verifying a dereferenced symref, we then check the oid.
> +			 */
> +			if (update->old_target) {
> +				if (check_old_target(update, referent.buf, err)) {
> +					ret = TRANSACTION_GENERIC_ERROR;
> +					goto out;
> +				}

We come here only when update->type has REF_ISSYMREF bit on (we
learned that value by calling lock_raw_ref()), and know referent.buf
has the current "target" value.  That is consumed as "ref" parameter
to check_old_target() we just saw.  OK.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 6104471199..a2474245aa 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -938,7 +938,26 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  		 * individual refs. But the error messages match what the files
>  		 * backend returns, which keeps our tests happy.
>  		 */
> -		if (u->flags & REF_HAVE_OLD && !oideq(&current_oid, &u->old_oid)) {
> +		if ((u->flags & REF_HAVE_OLD) && u->old_target) {
> +			if (strcmp(referent.buf, u->old_target)) {
> +				if (!strcmp(u->old_target, ""))
> +					strbuf_addf(err, "verifying symref target: '%s': "
> +						    "provided target is empty",
> +						    original_update_refname(u));
> +				else if (!strcmp(referent.buf, ""))
> +					strbuf_addf(err, "verifying symref target: '%s': "
> +						    "reference is missing but expected %s",
> +						    original_update_refname(u),
> +						    u->old_target);
> +				else
> +					strbuf_addf(err, "verifying symref target: '%s': "
> +						    "is at %s but expected %s",
> +						    original_update_refname(u),
> +						    referent.buf, u->old_target);
> +				ret = -1;
> +				goto done;
> +			}

Again, the puzzling "empty string"s are handled here.