Re: [PATCH v3] set-head: no update without change and better output for --auto

"Bence Ferdinandy" <bence@xxxxxxxxxxxxxx> · Mon, 16 Sep 2024 21:44:47 +0200

On Mon Sep 16, 2024 at 20:36, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> The message I am responding to is sent as the first message to start
> a brand-new thread, but for future reference, it makes it easier for
> everybody involved if you sent a new iteration of a patch (or a
> patch series) as a reply/follow-up to the last round.

Sorry about that, I've only ever used git-send-email on sourcehut before, where
they ask you _not_ to use --in-reply-to because of their UI and got used it...
I'll do that next time!

>
> If it is too cumbersome to do for any reason, as an alternative, you
> can give a reference to previous rounds as URLs to the list archive,
> i.e.
>
>  (v1) https://lore.kernel.org/git/20240910203129.2251090-1-bence@xxxxxxxxxxxxxx/
>  (v2) https://lore.kernel.org/git/20240910203835.2288291-1-bence@xxxxxxxxxxxxxx/

Tbf, I did put this in the commit notes :)

>
> > Currently, even if there is no actual change to remote/HEAD calling
> > remote set-head will overwrite the appropriate file and if set to --auto
> > will also print a message saying "remote/HEAD set to branch", which
> > implies something was changed.
> >
> > Change the behaviour of remote set-head so that the reference is only
> > updated if it actually needs to change.
>
> I sense a patch with a racy TOCTOU change coming ;-)

It seems that indeed :(

[snip]

> I am not quite sure what the discussion concluded "should be not",
> though.  In a message from you
>
>   https://lore.kernel.org/git/D43G2CGX2N7L.ZRETD4HLIH0E@xxxxxxxxxxxxxx/
>
> what we agreed was that "git fetch" does not need a new option, but
> we also agreed that it would be a good idea for "git fetch" to
> create, refs/remotes/$remote/HEAD when it does not exist without
> being told.
>
> I take it that you still want to see such a change to "git fetch"
> eventually happen, but decided to tackle the other half of the
> original two-patch series first with this patch?

Yes, but it did also turn up, that calling `git remote set-head --auto`
directly from `git fetch` as I did in the original RFC requires the user to
authenticate twice, which is not the best UX. So now I'm hammering away at
adding a "set_head" directly to fetch (I've sort-of figured it out already, but
it is definitely more involved than this one, EDIT: since reading below about
racyness, maybe this one is actually more complicated :D)). It's just that now
I can handle the two patches independently.

>
> >  static int set_head(int argc, const char **argv, const char *prefix)
> >  {
> > -	int i, opt_a = 0, opt_d = 0, result = 0;
> > -	struct strbuf buf = STRBUF_INIT, buf2 = STRBUF_INIT;
> > +	int i, opt_a = 0, opt_d = 0, is_ref_changed = 0, result = 0;
>
> Shall we simply call it ref_changed instead, so that I do not have
> to wonder which is better between has_ref_changed and is_ref_changed? ;-)

Makes sense :)

>
> > +	struct strbuf buf = STRBUF_INIT, buf2 = STRBUF_INIT, buf3 = STRBUF_INIT;
> >  	char *head_name = NULL;
> >  
> >  	struct option options[] = {
> > @@ -1440,13 +1440,19 @@ static int set_head(int argc, const char **argv, const char *prefix)
> >  
> >  	if (head_name) {
> >  		strbuf_addf(&buf2, "refs/remotes/%s/%s", argv[0], head_name);
>
> > +		refs_read_symbolic_ref(get_main_ref_store(the_repository),buf.buf,&buf3);
> > +		is_ref_changed = strcmp(buf2.buf,buf3.buf);
> >  		/* make sure it's valid */
> >  		if (!refs_ref_exists(get_main_ref_store(the_repository), buf2.buf))
> >  			result |= error(_("Not a valid ref: %s"), buf2.buf);
> > -		else if (refs_update_symref(get_main_ref_store(the_repository), buf.buf, buf2.buf, "remote set-head"))
> > +		else if (is_ref_changed && refs_update_symref(get_main_ref_store(the_repository), buf.buf, buf2.buf, "remote set-head"))
> >  			result |= error(_("Could not setup %s"), buf.buf);
>
> Two and a half small issues:
>
>  - Do not omit " " after "," and avoid overlong lines by folding
>    lines at an appropriate place (usually after an operator at
>    higher point in parse tree, i.e. after "is_ref_changed &&").
>
>  - This is inherently racy, isn't it?  We read the _current_ value.
>    After we do so, but before we write _our_ value, another process
>    may update it, so we'd end up overwriting the value they wrote.

Honestly, I've just sort of assumed git in general protects against parallel
writes, but I guess I'm wrong :D We'd need to block writes to the reference
while we're doing this, but I'm not quite sure how to achieve that.

>
>  - To compute is_ref_changed, we look at buf3.buf but what happens
>    if such a ref does not exist, exists but is not a symbolic ref,
>    or is a hierarchy under which other refs hang (e.g., a funny ref
>    "refs/remotes/origin/HEAD/main" exists)?  Does the strcmp()
>    compare a valid buf2.buf and an uninitialized junk in buf3.buf
>    and give a random result?  Shouldn't the code checking the return
>    value from the refs_read_symbolic_ref() call, and if we did so,
>    would we learn enough to tell among the cases where (1) it is a
>    symref, (2) it is a regular ref, (3) no such ref exists, and (4)
>    the refs layer hit an error to prevent us from giving one of the
>    previous 3 answers?
>
> If we really wanted to resolve the raciness, I think we need to
> enhance the set of values that create_symref() can optionally return
> to its callers so that the caller can tell what the old value was.
>
> I am not sure offhand how involved such an update would be, though.

Hmm, so there's a lot to unpack for me here. Is create_symref already atomic so
if it could give back the previous value we could decide based on that what to
print (and forget about not writing the ref when there's no need)? Anyhow, I'll
explore this part of the code and see if I can understand what's going on.

[snip]

> Quoting the values we obtained from the environment or the user may
> be nicer, e.g.
>
>     'refs/remotes/origin/HEAD' is now set to 'main'.
>     'refs/remotes/origin/HEAD' is unchanged and points at 'main'.

Ack. I also gather from this that we should be a bit more verbose rather than
trying to make it easy to parse.

I also forgot to free up buf3, which I'll do for the next round.

>
> This is a tangent outside the immediate topic of this patch, but I
> wonder if we need "--quiet" mode.

Even with the planned changed to fetch, if one want to be completely
up-to-date, they'd be running fetch and remote set-head and since the former
has a quiet I think it would make sense. At least: adding it would be more
straight forward than solving raciness :D

Best,
Bence

-- 
bence.ferdinandy.com