Re: [PATCH v3] set-head: no update without change and better output for --auto

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 16 Sep 2024 11:36:39 -0700

The message I am responding to is sent as the first message to start
a brand-new thread, but for future reference, it makes it easier for
everybody involved if you sent a new iteration of a patch (or a
patch series) as a reply/follow-up to the last round.

If it is too cumbersome to do for any reason, as an alternative, you
can give a reference to previous rounds as URLs to the list archive,
i.e.

 (v1) https://lore.kernel.org/git/20240910203129.2251090-1-bence@xxxxxxxxxxxxxx/
 (v2) https://lore.kernel.org/git/20240910203835.2288291-1-bence@xxxxxxxxxxxxxx/

> Currently, even if there is no actual change to remote/HEAD calling
> remote set-head will overwrite the appropriate file and if set to --auto
> will also print a message saying "remote/HEAD set to branch", which
> implies something was changed.
>
> Change the behaviour of remote set-head so that the reference is only
> updated if it actually needs to change.

I sense a patch with a racy TOCTOU change coming ;-)

> Change the output of --auto, so the output actually reflects what was
> done: a) set a previously unset HEAD, b) change HEAD because remote
> changed or c) no updates. Make the output easily parsable, by using
> a slightly clunky wording that allows all three outputs to have the same
> structure and number of words.

This would be useful.

>         This patch was originally sent along when I thought set-head was
>         going to be invoked by fetch, but the discussion on the RFC
>         concluded that it should be not. This opened the possibility to make
>         it more explicit.

I am not quite sure what the discussion concluded "should be not",
though.  In a message from you

  https://lore.kernel.org/git/D43G2CGX2N7L.ZRETD4HLIH0E@xxxxxxxxxxxxxx/

what we agreed was that "git fetch" does not need a new option, but
we also agreed that it would be a good idea for "git fetch" to
create, refs/remotes/$remote/HEAD when it does not exist without
being told.

I take it that you still want to see such a change to "git fetch"
eventually happen, but decided to tackle the other half of the
original two-patch series first with this patch?

>  static int set_head(int argc, const char **argv, const char *prefix)
>  {
> -	int i, opt_a = 0, opt_d = 0, result = 0;
> -	struct strbuf buf = STRBUF_INIT, buf2 = STRBUF_INIT;
> +	int i, opt_a = 0, opt_d = 0, is_ref_changed = 0, result = 0;

Shall we simply call it ref_changed instead, so that I do not have
to wonder which is better between has_ref_changed and is_ref_changed? ;-)

> +	struct strbuf buf = STRBUF_INIT, buf2 = STRBUF_INIT, buf3 = STRBUF_INIT;
>  	char *head_name = NULL;
>  
>  	struct option options[] = {
> @@ -1440,13 +1440,19 @@ static int set_head(int argc, const char **argv, const char *prefix)
>  
>  	if (head_name) {
>  		strbuf_addf(&buf2, "refs/remotes/%s/%s", argv[0], head_name);

> +		refs_read_symbolic_ref(get_main_ref_store(the_repository),buf.buf,&buf3);
> +		is_ref_changed = strcmp(buf2.buf,buf3.buf);
>  		/* make sure it's valid */
>  		if (!refs_ref_exists(get_main_ref_store(the_repository), buf2.buf))
>  			result |= error(_("Not a valid ref: %s"), buf2.buf);
> -		else if (refs_update_symref(get_main_ref_store(the_repository), buf.buf, buf2.buf, "remote set-head"))
> +		else if (is_ref_changed && refs_update_symref(get_main_ref_store(the_repository), buf.buf, buf2.buf, "remote set-head"))
>  			result |= error(_("Could not setup %s"), buf.buf);

Two and a half small issues:

 - Do not omit " " after "," and avoid overlong lines by folding
   lines at an appropriate place (usually after an operator at
   higher point in parse tree, i.e. after "is_ref_changed &&").

 - This is inherently racy, isn't it?  We read the _current_ value.
   After we do so, but before we write _our_ value, another process
   may update it, so we'd end up overwriting the value they wrote.

 - To compute is_ref_changed, we look at buf3.buf but what happens
   if such a ref does not exist, exists but is not a symbolic ref,
   or is a hierarchy under which other refs hang (e.g., a funny ref
   "refs/remotes/origin/HEAD/main" exists)?  Does the strcmp()
   compare a valid buf2.buf and an uninitialized junk in buf3.buf
   and give a random result?  Shouldn't the code checking the return
   value from the refs_read_symbolic_ref() call, and if we did so,
   would we learn enough to tell among the cases where (1) it is a
   symref, (2) it is a regular ref, (3) no such ref exists, and (4)
   the refs layer hit an error to prevent us from giving one of the
   previous 3 answers?

If we really wanted to resolve the raciness, I think we need to
enhance the set of values that create_symref() can optionally return
to its callers so that the caller can tell what the old value was.

I am not sure offhand how involved such an update would be, though.

> +		else if (opt_a && !strcmp(buf3.buf,""))
> +			printf("%s/HEAD was unset: set to %s\n", argv[0], head_name);

I suspect that this does not account for a case where the
"read_symbolic_ref()" we did earlier failed for a reason other than
"the ref did not exist".

> +		else if (opt_a && is_ref_changed)
> +			printf("%s/HEAD was changed: set to %s\n", argv[0], head_name);
>  		else if (opt_a)
> -			printf("%s/HEAD set to %s\n", argv[0], head_name);
> +			printf("%s/HEAD was unchanged: set to %s\n", argv[0], head_name);
>  		free(head_name);
>  	}

Quoting the values we obtained from the environment or the user may
be nicer, e.g.

    'refs/remotes/origin/HEAD' is now set to 'main'.
    'refs/remotes/origin/HEAD' is unchanged and points at 'main'.

This is a tangent outside the immediate topic of this patch, but I
wonder if we need "--quiet" mode.

Thanks.