On Tue, Apr 30, 2024 at 06:54:04AM +0200, Patrick Steinhardt wrote: > On Mon, Apr 29, 2024 at 04:33:25AM -0400, Jeff King wrote: > > Before operating on a refname we get from a user, we usually check that > > it's syntactically valid. As a general rule, refs should be in the > > "refs/" namespace, the exception being HEAD and pseudorefs like > > FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and > > dash. But the syntactic rules are not enforced by check_refname_format(). > > s/dash/underscore, right? Yep, thanks. > > Making things even more complicated, refname_is_safe() does enforce > > these syntax restrictions! When that function was added in 2014, we > > would have refused to work with such refs entirely. But we stopped being > > so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane > > refs, 2015-04-16). That rationale there is that check_refname_format() > > is supposed to contain a superset of the checks of refname_is_safe(). > > The idea being that we usually would rely on the more-strict > > check_refname_format(), but for certain operations (e.g., deleting a > > ref) we want to allow invalid names as long as they are not unsafe > > (e.g., not escaping the on-disk "refs/" hierarchy). > > I still think we should eventually merge these functions. It's not > exactly obvious why one would use one or the other. So if we had a > function with strict default behaviour, where the caller can ask for > some loosening of the behaviour via flags, then I think it would become > a ton easier to do the right thing. > > In any case, that doesn't need to be part of this patch series. Yeah, I think it would be fine to merge them with a flag. With the one exception that started this topic (and which I think could go away after this series), every caller of refname_is_safe() only does so in conjunction with check_refname_format() to check "well, is it at least safe?". You will have to tweak the return value somehow to indicate "ok" versus "bad but safe" versus "unsafe" (e.g., resolving sets REF_BAD_NAME but continues for the middle one). I think I'd prefer to leave that out of this series (and after this, I think it becomes easier as a pure refactoring since the behavior remains the same). > > The whole ALLOW_ONELEVEL thing is a long-standing confusion, and > > unfortunately has made it into the hands of users via "git > > check-ref-format --allow-onelevel". So I think it is there to stay. > > Possibly we should expose this new feature as --fully-qualified or > > similar. > > Hm, that's really too bad. I wonder whether we should eventually start > to deprecate `--allow-onelevel` in favor of `--fully-qualified`. We > would continue to accept the flag, but remove it from our documentation > such that scripts start to move over. Then some day, we may replace > `ALLOW_ONELEVEL` with something like `ALLOW_ROOT_REF` that allows refs > in the root directory while honoring `is_pseudoref_syntax()`. I don't know if we could ever get rid of --allow-onelevel. If you want to check a branch name, say, the replacement for it is to ask about "refs/heads/$name". But sometimes you don't actually know how the short name is going to be used, but you want to make sure it's syntactically valid. E.g., validating a refspec may involve a name like "main" on its own. I suspect it would be OK in practice to just give it an arbitrary "refs/foo/$main", but that feels kind of hacky. -Peff > > @@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags, > > { > > int component_len, component_count = 0; > > > > + if ((flags & REFNAME_FULLY_QUALIFIED)) { > > + const char *bare_ref; > > + > > + parse_worktree_ref(refname, NULL, NULL, &bare_ref); > > + if (!starts_with(bare_ref, "refs/") && > > + !is_pseudoref_syntax(bare_ref)) > > + return -1; > > + } > > + > > if (!strcmp(refname, "@")) { > > /* Refname is a single character '@'. */ > > if (sanitized) > > @@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags, > > else > > return -1; > > } > > - if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2) > > + > > + if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) && > > + component_count < 2) > > return -1; /* Refname has only one component. */ > > + > > I first thought that we don't have to handle REFNAME_FULLY_QUALIFIED > here because the above should already handle it. But we can of course > have a single component, only, when the ref is "refs/". I hadn't really considered that case. The reason we have to handle FULLY_QUALIFIED here is that without it, "FETCH_HEAD" (or for that matter "HEAD") is forbidden as having only a single component. The earlier hunk only rejects bad things, so we still end up in this code. I think that "refs/" is forbidden both before and after my patch because it's invalid to have a zero-length component (so "foo//bar" is forbidden, but so is "foo/" because of the empty component on the end).a -Peff