[PATCH 6/8] check_refname_format(): add FULLY_QUALIFIED flag

Jeff King <peff@xxxxxxxx> · Mon, 29 Apr 2024 04:33:25 -0400

Before operating on a refname we get from a user, we usually check that
it's syntactically valid. As a general rule, refs should be in the
"refs/" namespace, the exception being HEAD and pseudorefs like
FETCH_HEAD, etc. Those pseudorefs should consist only of all-caps and
dash. But the syntactic rules are not enforced by check_refname_format().
So for example we will happily operate on a ref "foo/bar" that will use
the file ".git/foo/bar" under the hood (when using the files backend, of
course).

Making things even more complicated, refname_is_safe() does enforce
these syntax restrictions! When that function was added in 2014, we
would have refused to work with such refs entirely. But we stopped being
so picky in 03afcbee9b (read_packed_refs: avoid double-checking sane
refs, 2015-04-16). That rationale there is that check_refname_format()
is supposed to contain a superset of the checks of refname_is_safe().
The idea being that we usually would rely on the more-strict
check_refname_format(), but for certain operations (e.g., deleting a
ref) we want to allow invalid names as long as they are not unsafe
(e.g., not escaping the on-disk "refs/" hierarchy).

But the pseudoref handling flips this logic; check_refname_format() is
more lenient than refname_is_safe(). So you can create "foo/bar" and
read it, but you cannot delete it:

  $ git update-ref foo/bar HEAD
  $ git rev-parse foo/bar
  747a29934757b7e695781e13e2511c43b951da2
  $ git update-ref -d foo/bar
  error: refusing to update ref with bad name 'foo/bar'

So we probably want check_ref_format() to learn the same syntactic
restrictions that refname_is_safe() uses (namely insisting that anything
outside of "refs/" matches the pseudoref syntax). The most obvious way
to do that is simply to call refname_is_safe(). But the point of
03afcbee9b is that doing so is expensive. Without the syntactic
restrictions of check_refname_format(), refname_is_safe() has to
actually normalize the pathname to make sure it does not escape "refs/".
That's redundant for us in check_refname_format(); we just need to make
sure it either starts with "refs/" or is a pseudoref.

But wait, it gets more complicated! We also allow "worktrees/foo/$X"
and "main-worktree/$X". In that case we should only be checking "$X"
(which should be either a pseudoref or start with "refs/"). We can
use parse_worktree_ref(), which fairly efficiently gives us the "bare"
refname (even for a non-worktree ref, where it returns the original
name).

And now when should this new logic kick in? Unfortunately we can't just
do it all the time, because many callers pass in partial ref components.
E.g., if they are thinking about making "refs/heads/foo", they'll pass
us "foo". This is usually accompanied by the ALLOW_ONELEVEL flag. But we
likewise can't take the absence of ALLOW_ONELEVEL as a hint that the
name is fully qualified, because that flag is also used to indicate that
pseudorefs should be allowed!

We need a new flag to tell these two situations apart. So let's add a
FULLY_QUALIFIED flag that callers can use to ask us to enforce these
syntactic rules. There are no callers yet, but we should be able to
examine users of ALLOW_ONELEVEL, figure out which semantics they
wanted, and convert as needed.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
The whole ALLOW_ONELEVEL thing is a long-standing confusion, and
unfortunately has made it into the hands of users via "git
check-ref-format --allow-onelevel". So I think it is there to stay.
Possibly we should expose this new feature as --fully-qualified or
similar.

 refs.c | 14 +++++++++++++-
 refs.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 8cac7e7e59..44b4419050 100644
--- a/refs.c
+++ b/refs.c
@@ -288,6 +288,15 @@ static int check_or_sanitize_refname(const char *refname, int flags,
 {
 	int component_len, component_count = 0;
 
+	if ((flags & REFNAME_FULLY_QUALIFIED)) {
+		const char *bare_ref;
+
+		parse_worktree_ref(refname, NULL, NULL, &bare_ref);
+		if (!starts_with(bare_ref, "refs/") &&
+		    !is_pseudoref_syntax(bare_ref))
+			return -1;
+	}
+
 	if (!strcmp(refname, "@")) {
 		/* Refname is a single character '@'. */
 		if (sanitized)
@@ -322,8 +331,11 @@ static int check_or_sanitize_refname(const char *refname, int flags,
 		else
 			return -1;
 	}
-	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
+
+	if (!(flags & (REFNAME_ALLOW_ONELEVEL | REFNAME_FULLY_QUALIFIED)) &&
+	    component_count < 2)
 		return -1; /* Refname has only one component. */
+
 	return 0;
 }
 
diff --git a/refs.h b/refs.h
index d278775e08..cdd859c8b7 100644
--- a/refs.h
+++ b/refs.h
@@ -571,6 +571,7 @@ int for_each_reflog(each_reflog_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
+#define REFNAME_FULLY_QUALIFIED 4
 
 /*
  * Return 0 iff refname has the correct format for a refname according
-- 
2.45.0.rc1.416.gbe2a76c799