Re: [PATCH 2/2] ref-filter: support filtering of operational refs

Patrick Steinhardt <ps@xxxxxx> · Thu, 28 Dec 2023 11:34:22 +0100

On Thu, Dec 21, 2023 at 12:40:03PM -0800, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@xxxxxxxxx> writes:
> > With the upcoming introduction of the reftable backend, it becomes ever
> > so important to provide the necessary tooling for printing all refs
> > associated with a repository.
> 
> We have pseudoref (those all caps files outside the refs/ hierarchy)
> as an official term defined in the glossary, and Patrick's reftable
> work based on Han-Wen's work revealed the need to treat FETCH_HEAD
> and MERGE_HEAD as "even more peculiar than pseudorefs" that need
> different term (tentatively called "special refs").  Please avoid
> coming up with yet another random name "operational" without
> discussing.
> 
> With a quick look at the table in this patch, "pseudorefs" appears
> to be the closest word that people are already familiar with, I
> think.

Agreed, this thought also crossed my mind while reading through the
patches.

> A lot more reasonable thing to do may be to scan the
> $GIT_DIR for files whose name satisfy refs.c:is_pseudoref_syntax()
> and list them, instead of having a hardcoded list of these special
> refs.

Agreed as well. Besides the reasons mentioned below, such a hardcoded
list is also quite likely to grow stale. And while it certainly feels
very hacky to iterate over the files one by one and check for each of
them whether it could be a pseudoref, it is the best we can do to
dynamically detect any such reference.
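
To make the idea concrete, here is a rough, standalone sketch of such a
scan. It is not the code in refs.c: `looks_like_pseudoref()` and
`scan_pseudoref_candidates()` are hypothetical names, and the name rule
(uppercase letters, `-` and `_` only) is merely my assumption of what an
is_pseudoref_syntax()-style check accepts.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical helper: accept non-empty names made up only of
 * uppercase ASCII letters, '-' and '_'. The real check in refs.c
 * may apply different rules.
 */
static int looks_like_pseudoref(const char *name)
{
	const char *c;

	if (!*name)
		return 0;
	for (c = name; *c; c++)
		if (!isupper((unsigned char)*c) && *c != '-' && *c != '_')
			return 0;
	return 1;
}

/*
 * Hypothetical helper: call `fn` for every top-level entry of `git_dir`
 * whose name looks like a pseudoref. Returns the number of matches, or
 * -1 if the directory cannot be opened. Entries are matched by name
 * only, so directories with such names are reported as well.
 */
static int scan_pseudoref_candidates(const char *git_dir,
				     void (*fn)(const char *name, void *data),
				     void *data)
{
	DIR *dir = opendir(git_dir);
	struct dirent *de;
	int count = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (!looks_like_pseudoref(de->d_name))
			continue;
		if (fn)
			fn(de->d_name, data);
		count++;
	}
	closedir(dir);
	return count;
}
```

With this rule, names such as FETCH_HEAD or ORIG_HEAD would be picked
up, while `config`, `refs` or `objects` would not.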

One interesting question is how we should treat files that look like a
pseudoref, but which really aren't. I'm not aware of any such files
written by Git itself, but it could certainly be that a user wrote such
a file into the repository manually. But given that we're adding new
behaviour that will be opt-in (e.g. via a new switch), I'd rather err on
the side of caution and mark any such file as broken instead of silently
ignoring it.
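
As an illustration of "mark as broken" only: the sketch below classifies
the contents of a file whose name looks like a pseudoref. The function
name, the enum and the acceptance rule are all assumptions of mine (a
leading `ref: <target>`, or a 40- or 64-character hex object ID followed
by whitespace or end of input); Git's actual parsing may differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for a pseudoref-looking file. */
enum candidate_state {
	CANDIDATE_OK,
	CANDIDATE_BROKEN
};

/*
 * Hypothetical helper: inspect the start of the file contents in `buf`
 * (NUL-terminated). Anything that is neither a symbolic ref nor starts
 * with a full-length hex object ID is reported as broken rather than
 * being silently skipped.
 */
static enum candidate_state classify_candidate(const char *buf)
{
	size_t hexlen = 0;

	/* Symbolic ref: "ref: " followed by a non-empty target. */
	if (!strncmp(buf, "ref: ", 5))
		return (buf[5] && buf[5] != '\n') ? CANDIDATE_OK
						  : CANDIDATE_BROKEN;

	/* Count the leading run of hex digits. */
	while (isxdigit((unsigned char)buf[hexlen]))
		hexlen++;

	/* Assumed lengths: 40 (SHA-1) or 64 (SHA-256) hex characters. */
	if (hexlen != 40 && hexlen != 64)
		return CANDIDATE_BROKEN;

	/* The object ID must be followed by whitespace or end of input. */
	if (buf[hexlen] && !isspace((unsigned char)buf[hexlen]))
		return CANDIDATE_BROKEN;

	return CANDIDATE_OK;
}
```

A caller would then report CANDIDATE_BROKEN entries to the user instead
of dropping them from the listing.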

> In addition, when reftable and other backends that can
> natively store things outside refs/ hierarchy is in use, they ought
> to know what they have so enumerating these would not be an issue
> for them without having such a hardcoded table of names.

Yup, for the reftable backend we don't have the issue of "How do we
detect refs dynamically" at all. So I would love for there to be a way
to print all refs in the refdb, regardless of whether they start with
`refs/`, look like a pseudoref or are something else entirely. Otherwise
a user would have no way to discover, and thus delete, anything stored
in the refdb under a funny name, be it intentionally, by accident or due
to a bug.

In the reftable backend, the ref iterator's `_advance()` function has a
hardcoded `starts_with(refname, "refs/")` check. Refs that fail this
check are skipped in order to retain the same behaviour as the files
backend. So maybe what we should be doing is to introduce a new flag
`DO_FOR_EACH_ALL_REFS` and expose it via git-for-each-ref(1) or
git-show-ref(1). With that flag set:

  - For the reftable backend we'd skip the `starts_with()` check and
    simply return all refs.

  - For the files backend we'd also iterate through all files in
    $GIT_DIR and check whether they are pseudoref-like.
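
The filtering decision itself could look roughly like the standalone
sketch below. The flag name is the one proposed above, but its value and
the helper `ref_is_visible()` are made up for illustration, and
strncmp() stands in for Git's `starts_with()` to keep the example
self-contained.

```c
#include <string.h>

/* Proposed flag; the bit value here is purely illustrative. */
#define DO_FOR_EACH_ALL_REFS (1u << 0)

/*
 * Hypothetical helper: decide whether an iterator should yield
 * `refname`. Without the flag, only refs under "refs/" are shown,
 * matching today's behaviour. With the flag, nothing is filtered out.
 */
static int ref_is_visible(const char *refname, unsigned int flags)
{
	if (flags & DO_FOR_EACH_ALL_REFS)
		return 1;
	return !strncmp(refname, "refs/", 5);
}
```

So `ref_is_visible("FETCH_HEAD", 0)` would be false, whereas passing
`DO_FOR_EACH_ALL_REFS` makes the same ref visible.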

Patrick