Re: [PATCH 1/2] Add fetch.updateHead option

Felipe Contreras <felipe.contreras@xxxxxxxxx> · Thu, 06 Apr 2023 20:41:24 -0600

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 05 2023, Felipe Contreras wrote:
> > On Wed, Apr 5, 2023 at 4:28 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> >> On Tue, Apr 04 2023, Felipe Contreras wrote:

> >> > +
> >> > +     if (!ref || !target) {
> >> > +             warning(_("could not update remote head"));
> >> > +             return;
> >> > +     }
> >> > +
> >> > +     r = resolve_ref_unsafe(ref, 0, NULL, &flags);
> >> > +
> >> > +     if (r) {
> >> > +             if (config == FETCH_UPDATE_HEAD_MISSING) {
> >> > +                     if (flags & REF_ISSYMREF)
> >> > +                             /* already present */
> >> > +                             return;
> >> > +             } else if (config == FETCH_UPDATE_HEAD_ALWAYS) {
> >> > +                     if (!strcmp(r, target))
> >> > +                             /* already up-to-date */
> >> > +                             return;
> >>
> >> I think you should name the "enum" you're adding below, the one that
> >> contains the new "FETCH_UPDATE_HEAD_DEFAULT".
> >>
> >> Then this could be a "switch", and the compiler could check the
> >> arguments, i.e. you could pass an enum type instead of an "int".
> >
> > Sure, it can be an `enum fetch_update_mode` instead of `int`, but I
> > don't see what value it provides, other than more verbosity. The enum
> > right above is also unnamed, and 'remote->origin' is an int. And it's
> > not the only enum of that kind in the source code.
> >
> > Using a switch is better, but that doesn't require an enum type. The
> > multiple ifs are just a remnant of a previous version of the code.
> 
> More on this below, but it's for self-documentation (makes the code
> easier to follow),

I guess it *can* make the code easier to follow for some people, but only very
marginally, and certainly not for me.

> and the compiler can notice missing "case" arms,
> which isn't the case with an "int".

Yes, but that has never been useful in my experience.

> > But I seem to recall previous discussions (perhaps in LKML) where
> > people accepted that lines 120-characters long are OK. We don't live
> > in the 80's anymore, terminals have more than 80 columns.
> 
> I don't know what the kernel does, but we try to conform to our
> CodingGuidelines, which sets a limit of 80.

There's a difference between claiming we try to conform to X, and actually
trying to conform to X. I think the evidence above shows it's not the latter.

That is: the guideline says something which isn't actually true. Or at least:
if we are trying, we are not trying very hard.

> But whatever else we do, we don't generally say that a newly added
> function to a given file should be exempted from the preferred coding
> style because the file isn't consistently using it.

A guideline is not a law.

If the guideline says "try to do X", and a patch doesn't do X, that's not a
valid reason to reject it. It is not a prescriptive command, it has no "shall"
or "must". It's merely a suggestion.

And as I showed above, a suggestion that is clearly not followed to the tee in
all the code base.

> >> > diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> >> > index dc44da9c79..dbeb2928ae 100755
> >> > --- a/t/t5510-fetch.sh
> >> > +++ b/t/t5510-fetch.sh
> >> > @@ -814,6 +814,37 @@ test_expect_success 'fetch from multiple configured URLs in single remote' '
> >> >       git fetch multipleurls
> >> >  '
> >> >
> >> > +test_cmp_symbolic_ref () {
> >> > +     git symbolic-ref "$1" >actual &&
> >> > +     echo "$2" >expected &&
> >> > +     test_cmp expected actual
> >> > +}
> >>
> >> Sort of an aside, but this seems to be the Nth use of this pattern in
> >> the test suite, e.g. t1401-symbolic-ref.sh repeatedly hardcodes the
> >> same.
> >>
> >> I wonder if a prep commit to stick this in test-lib-functions.sh would
> >> be in order, or maybe a "--symbolic" argument to "test_cmp_rev"?
> >
> > Sure. If I had incline that such a patch would be merged (or this one)
> > I would do it, but I have a plethora of cleanup patches just gathering
> > dust, so I'd rather not.
> 
> Fair enough, thanks.
> 
> Re the "more below" above, I tried hacking some of what I suggested
> upthread on top of your patches, here's the result of
> that. Changes/commentary:
> 
>  * Switched the "int" to "enum"

For the record: I still don't see any value in doing that.

But I also don't see any harm, so I'm OK with that change.

>  * You've prepared the parse_update_head() to accept a NULL "r", but as
>    this & your other code shows, we never pass it NULL. I don't get why
>    we'd have it handle that case, as surely all plausible users are
>    "populate this config variable for me", no?

Yeah, I don't see any value in checking that, it probably was already there
from the original function I copied the code.

>  * I think better than a BUG() call in the new update_head() we should
>    just drop "need_update_head" entirely. It ends up just being a
>    variable that states "is missing or always", so for update_head() we
>    can just pass a boolean "missing?".

Actually, `need_update_head` doesn't equal "is missing or always": it mostly
tracks the fact that we sent "HEAD" as part of the refspecs sent to the remote.

For example if you do `git fetch`, that sets `need_update_head`, but if you do
`git fetch master` it does not (AFAIK).

So your change is not equivalent, and it would call update_head() unnecessarily
in many instances where there is no "HEAD" coming back from the remote, so
`struct ref *head` is NULL. That's not a big issue, since the function will
simply return in those cases.

But...

According to Jeff King, there are some instances where "HEAD" is coming back
from the server, even if we didn't request it, in those cases we would want the
local "remote/foo/HEAD" to be updated as well (if configured).

So your change is not functionally equivalent: it's actually better. The reason
I didn't implement the logic Jeff King suggested is that I didn't see a way to
do it without complicating the code, but your suggestion is the way.

>  * I renamed "update_head" to "fetch_update_head" just to have the
>    compiler catch cases where we were using the old "int", but if you
>    find some of this useful we could keep the old name.

I do find a lot of this useful (bar the switch to an enum), but I think the old
name is better, since it's a configuration that affects commands other than
`git fetch`, for example `git remote update`.

> Hope some of that helps.
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 6bf147b0123..6492e88d779 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -59,7 +59,7 @@ static int fetch_prune_tags_config = -1; /* unspecified */
>  static int prune_tags = -1; /* unspecified */
>  #define PRUNE_TAGS_BY_DEFAULT 0 /* do we prune tags by default? */
>  
> -static int fetch_update_head = FETCH_UPDATE_HEAD_DEFAULT;
> +static enum fetch_update_head fetch_update_head = FETCH_UPDATE_HEAD_DEFAULT;
>  
>  static int all, append, dry_run, force, keep, multiple, update_head_ok;
>  static int write_fetch_head = 1;
> @@ -1584,7 +1584,8 @@ static int backfill_tags(struct transport *transport,
>  	return retcode;
>  }
>  
> -static void update_head(int config, const struct ref *head, const struct remote *remote)
> +static void update_head(int fetch_missing, const struct ref *head,
> +			struct remote *remote)

This is good, it simplifies the logic below.

>  {
>  	char *ref, *target;
>  	const char *r;
> @@ -1594,7 +1595,7 @@ static void update_head(int config, const struct ref *head, const struct remote
>  		return;
>  
>  	if (!remote->mirror) {
> -		ref = apply_refspecs((struct refspec *)&remote->fetch, "refs/heads/HEAD");
> +		ref = apply_refspecs(&remote->fetch, "refs/heads/HEAD");
>  		target = apply_refspecs((struct refspec *)&remote->fetch, head->symref);

Small nit: you didn't drop this casting.

>  
>  		if (!ref || !target) {
> @@ -1609,17 +1610,14 @@ static void update_head(int config, const struct ref *head, const struct remote
>  	r = resolve_ref_unsafe(ref, 0, NULL, &flags);
>  
>  	if (r) {
> -		if (config == FETCH_UPDATE_HEAD_MISSING) {
> +		if (fetch_missing) {
>  			if (flags & REF_ISSYMREF)
>  				/* already present */
>  				return;
> -		} else if (config == FETCH_UPDATE_HEAD_ALWAYS) {
> -			if (!strcmp(r, target))
> -				/* already up-to-date */
> -				return;
> -		} else
> -			/* should never happen */
> +		} else if (!strcmp(r, target)) {
> +			/* already up-to-date */
>  			return;
> +		}

I prefer to have the main logic (always) on top, but otherwise good.

>  	}
>  
>  	if (!create_symref(ref, target, "remote update head")) {
> @@ -1643,7 +1641,7 @@ static int do_fetch(struct transport *transport,
>  	int must_list_refs = 1;
>  	struct fetch_head fetch_head = { 0 };
>  	struct strbuf err = STRBUF_INIT;
> -	int need_update_head = 0, update_head_config = 0;
> +	enum fetch_update_head update_head_config = FETCH_UPDATE_HEAD_DEFAULT;
>  
>  	if (tags == TAGS_DEFAULT) {
>  		if (transport->remote->fetch_tags == 2)
> @@ -1680,15 +1678,19 @@ static int do_fetch(struct transport *transport,
>  
>  		if (transport->remote->fetch.nr) {
>  
> -			if (transport->remote->update_head)
> -				update_head_config = transport->remote->update_head;
> +			if (transport->remote->fetch_update_head != FETCH_UPDATE_HEAD_DEFAULT)

In Git codestyle implicit tends to be preferred over explicit (as is in the Linux codesyle)

 * `if (p)` over `if (p != NULL)`
 * `if (i)` over `if (i != 0)`
 * `if (!strcmp(...)` over `if (strcmp(...) == 0`

And so on, so I think this strongly suggests this is preferred:

  if (transport->remote->update_head)

Over

  if (transport->remote->fetch_update_head != FETCH_UPDATE_HEAD_DEFAULT)

> +				update_head_config = transport->remote->fetch_update_head;
>  			else
>  				update_head_config = fetch_update_head;
>  
> -			need_update_head = update_head_config && update_head_config != FETCH_UPDATE_HEAD_NEVER;
> -
> -			if (need_update_head)
> +			switch (update_head_config) {
> +			case FETCH_UPDATE_HEAD_MISSING:
> +			case FETCH_UPDATE_HEAD_ALWAYS:
>  				strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
> +			case FETCH_UPDATE_HEAD_DEFAULT:
> +			case FETCH_UPDATE_HEAD_NEVER:

I would rather have "default:" here to catch all the rest.

I suppose this could be clearer to some people (although IMO overly verbose),
but this has nothing to do with the enum change, as it can be done with `int
update_head_config`.

> +				break;
> +			}
>  			refspec_ref_prefixes(&transport->remote->fetch,
>  					     &transport_ls_refs_options.ref_prefixes);
>  		}
> @@ -1801,8 +1803,16 @@ static int do_fetch(struct transport *transport,
>  
>  	commit_fetch_head(&fetch_head);
>  
> -	if (need_update_head)
> -		update_head(update_head_config, find_ref_by_name(remote_refs, "HEAD"), transport->remote);
> +	switch (update_head_config) {
> +	case FETCH_UPDATE_HEAD_MISSING:
> +	case FETCH_UPDATE_HEAD_ALWAYS:
> +		update_head(update_head_config == FETCH_UPDATE_HEAD_MISSING,
> +			    find_ref_by_name(remote_refs, "HEAD"),
> +			    transport->remote);
> +	case FETCH_UPDATE_HEAD_DEFAULT:
> +	case FETCH_UPDATE_HEAD_NEVER:
> +		break;
> +	}

Ditto. Although this isn't functionally equivalent, it's actually better.

---

I've integrated the important parts of these changes into v2 and sent that.

I still don't see an incline of this patch ever being merged, so it's probably
just an exercise, but for the record there it is.

Cheers.

-- 
Felipe Contreras