Re: [PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag

Jeff King <peff@xxxxxxxx> · Tue, 30 Apr 2024 06:01:45 -0400

On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote:

> On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote:
> > Before operating on a refname we get from a user, we usually check that
> > it's syntactically valid. As a general rule, refs should be in the
> > "refs/" namespace, the exception being HEAD and pseudorefs like
> > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
> > dash. But the syntactic rules are not enforced by check_refname_format().
> 
> s/dash/underscore, right?

Yep, thanks.

> > Making things even more complicated, refname_is_safe() does enforce
> > these syntax restrictions! When that function was added in 2014, we
> > would have refused to work with such refs entirely. But we stopped being
> > so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
> > refs, 2015-04-16). That rationale there is that check_refname_format()
> > is supposed to contain a superset of the checks of refname_is_safe().
> > The idea being that we usually would rely on the more-strict
> > check_refname_format(), but for certain operations (e.g., deleting a
> > ref) we want to allow invalid names as long as they are not unsafe
> > (e.g., not escaping the on-disk "refs/" hierarchy).
> 
> I still think we should eventually merge these functions. It's not
> exactly obvious why one would use one or the other. So if we had a
> function with strict default behaviour, where the caller can ask for
> some loosening of the behaviour via flags, then I think it would become
> a ton easier to do the right thing.
> 
> In any case, that doesn't need to be part of this patch series.

Yeah, I think it would be fine to merge them with a flag. With the one
exception that started this topic (and which I think could go away after
this series), every caller of refname_is_safe() only does so in
conjunction with check_refname_format() to check "well, is it at least
safe?".

You will have to tweak the return value somehow to indicate "ok" versus
"bad but safe" versus "unsafe" (e.g., resolving sets REF_BAD_NAME but
continues for the middle one). I think I'd prefer to leave that out of
this series (and after this, I think it becomes easier as a pure
refactoring since the behavior remains the same).

> > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
> > unfortunately has made it into the hands of users via "git
> > check-ref-format --allow-onelevel". So I think it is there to stay.
> > Possibly we should expose this new feature as --fully-qualified or
> > similar.
> 
> Hm, that's really too bad. I wonder whether we should eventually start
> to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We
> would continue to accept the flag, but remove it from our documentation
> such that scripts start to move over. Then some day, we may replace
> `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs
> in the root directory while honoring `is_pseudoref_syntax()`.

I don't know if we could ever get rid of --allow-onelevel. If you want
to check a branch name, say, the replacement for it is to ask about
"refs/heads/$name". But sometimes you don't actually know how the short
name is going to be used, but you want to make sure it's syntactically
valid. E.g., validating a refspec may involve a name like "main" on its
own. I suspect it would be OK in practice to just give it an arbitrary
"refs/foo/$main", but that feels kind of hacky.

-Peff

> > @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  {
> >  	int component_len, component_count = 0;
> >  
> > +	if ((flags & REFNAME_FULLY_QUALIFIED)) {
> > +		const char *bare_ref;
> > +
> > +		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
> > +		if (!starts_with(bare_ref, "refs/") &&
> > +		    !is_pseudoref_syntax(bare_ref))
> > +			return -1;
> > +	}
> > +
> >  	if (!strcmp(refname, "@")) {
> >  		/* Refname is a single character '@'. */
> >  		if (sanitized)
> > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
> >  		else
> >  			return -1;
> >  	}
> > -	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
> > +
> > +	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
> > +	    component_count < 2)
> >  		return -1; /* Refname has only one component. */
> > +
> 
> I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED
> here because the above should already handle it. But we can of course
> have a single component, only, when the ref is "refs/".

I hadn't really considered that case. The reason we have to handle
FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that
matter "HEAD") is forbidden as having only a single component. The
earlier hunk only rejects bad things, so we still end up in this code.

I think that "refs/" is forbidden both before and after my patch because
it's invalid to have a zero-length component (so "foo//bar" is
forbidden, but so is "foo/" because of the empty component on the end).a

-Peff