Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns

Patrick Steinhardt <ps@xxxxxx> · Fri, 7 Mar 2025 10:35:49 +0100

On Thu, Mar 06, 2025 at 09:27:21AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
> 
> > So there is a subtle bug with '--exclude' which is that in the
> > packed-refs backend we will consider "refs/heads/bar" to be a pattern
> > match against "refs/heads/ba" when we shouldn't. Likewise, the reftable
> > backend (which in this case is bug-compatible with the packed backend)
> > exhibits the same broken behavior.
> > ...
> > There is some minor test fallout in the "overlapping excluded regions"
> > test, which happens to use 'refs/ba' as an exclude pattern, and expects
> > references under the "refs/heads/bar/*" and "refs/heads/baz/*"
> > hierarchies to be excluded from the results.
> >
> > ... test (since the range is no longer
> > overlapping under the stricter interpretation of --exclude patterns
> > presented here).
> 
> The code change, reasoning, and the tests look all good.  It just
> leaves a bit awkward aftertaste.
> 
> In general, I think our "we have a tree-like structure with patterns
> to match paths" code paths, like pathspec matching, are structured
> in such a way that the low level is expected to merely cull
> candidates early as a performance optimization measure (in other
> words, they are allowed false positives and say something matches
> when they do not, but not allowed false negatives) and leave the
> upper level to further reject the ones that do not match the
> pattern.  If packed-refs backend was too loose in its matching and
> erroneously considered that refs/heads/bar matched refs/heads/ba
> pattern, I would naïvely expect that the upper layer would catch and
> reject that refs/heads/bar as not matching.

I think you've swapped things around a bit by accident. The problem is
that the patterns were being matched too loosely by the underlying
backends, so the backends marked too many refs as excluded. As a
result, those references are never yielded to the upper layer at all,
so the upper layer doesn't even get a chance to correct the mistake: it
cannot correct what it doesn't know about.
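To make "matched too loosely" concrete, here is a minimal C sketch
contrasting the two behaviours. Both helper names are hypothetical and
are not git's functions. The strict rule is an assumption modelled on
the literal-pattern matching that git-for-each-ref(1) documents (match
completely, or from the beginning up to a slash), and glob patterns are
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper modelling the old, loose behaviour: any byte-wise
 * prefix of the refname counts as a match, so "refs/heads/ba" also
 * matches "refs/heads/bar".
 */
static bool loose_prefix_match(const char *refname, const char *pattern)
{
	assert(refname && pattern);
	return strncmp(refname, pattern, strlen(pattern)) == 0;
}

/*
 * Hypothetical helper modelling the stricter behaviour: a literal pattern
 * matches the ref itself or refs below it, i.e. the prefix must end at a
 * '/' boundary in the refname.
 */
static bool dir_prefix_match(const char *refname, const char *pattern)
{
	size_t len;

	assert(refname && pattern);
	len = strlen(pattern);
	if (strncmp(refname, pattern, len))
		return false;
	if (len && pattern[len - 1] == '/')
		return true; /* e.g. "refs/heads/" already ends at a boundary */
	return refname[len] == '\0' || refname[len] == '/';
}
```

With the pattern "refs/heads/ba", both helpers accept "refs/heads/ba"
and "refs/heads/ba/x", but only the loose one also accepts
"refs/heads/bar". In a backend that uses the match to exclude refs, that
extra hit is precisely a ref that silently goes missing.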

There isn't really a way to implement such a safety net, either (or at
least I cannot think of any): the whole point of making backends handle
the exclude patterns is that they can skip whole regions entirely and
not even try to read them.
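As an illustration of skipping a region without reading it, here is a
hedged sketch that assumes the refs sit in an array sorted with
strcmp(), roughly as in a sorted packed-refs file. Two binary searches
locate the excluded range and the iteration jumps over it. The functions
bound() and yield_unexcluded() are hypothetical; this is not the
packed-refs backend's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Binary search over a refname array sorted by strcmp(). Returns the first
 * index whose leading strlen(prefix) bytes compare >= prefix or, when
 * `after` is set, > prefix. Refs sharing the prefix therefore occupy the
 * contiguous range [bound(.., false), bound(.., true)).
 */
static size_t bound(const char *const *refs, size_t nr,
		    const char *prefix, bool after)
{
	size_t len = strlen(prefix), lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strncmp(refs[mid], prefix, len);

		if (cmp < 0 || (after && cmp == 0))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Hypothetical iterator: copies every ref outside the excluded directory
 * to `out` without iterating over the refs inside it (locating the range
 * only probes O(log n) entries). `excluded_dir` must end in '/' so that
 * the skipped range is exactly one directory.
 */
static size_t yield_unexcluded(const char *const *refs, size_t nr,
			       const char *excluded_dir, const char **out)
{
	size_t len = strlen(excluded_dir), n = 0;
	size_t begin, end;

	assert(len > 0 && excluded_dir[len - 1] == '/');
	begin = bound(refs, nr, excluded_dir, false);
	end = bound(refs, nr, excluded_dir, true);

	for (size_t i = 0; i < begin; i++)
		out[n++] = refs[i];
	for (size_t i = end; i < nr; i++) /* jump straight past the region */
		out[n++] = refs[i];
	return n;
}
```

One design note: "refs/heads/ba-x" sorts between "refs/heads/ba" and
"refs/heads/ba/a" because '-' sorts before '/', so the set of refs a
strict "refs/heads/ba" pattern matches need not be one contiguous run.
The sketch sidesteps this by only skipping directory prefixes that end
in '/'.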

> Apparently that was not happening and that is why we need this fix?
> 
> Is the excluded region optimization expected to be powerful enough
> to cover all our needs so that we do not need to post-process what
> it passes?

No, it's not, and the upper layer does post-process. But it can only
correct false negatives, not false positives:

  - A false negative is a ref that matches an exclude pattern but is
    yielded by the backend regardless. Those do get filtered out by
    the upper layer.

  - A false positive is a ref that does not match an exclude pattern but
    is treated as matching by the backend anyway. The backend thus never
    yields it, so the upper layer cannot rectify the bug.

The patch at hand fixes such false positives.
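The asymmetry between the two kinds of error can be sketched as two
layers in one loop. Everything here is hypothetical (the enum, the
matchers, the single iterate() function); it only models "the backend
may skip, the upper layer re-checks" and does not mirror git's ref
iterator API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matchers: loose byte-prefix vs. '/'-boundary prefix. */
static bool loose_match(const char *refname, const char *pattern)
{
	return strncmp(refname, pattern, strlen(pattern)) == 0;
}

static bool strict_match(const char *refname, const char *pattern)
{
	size_t len = strlen(pattern);

	if (strncmp(refname, pattern, len))
		return false;
	if (len && pattern[len - 1] == '/')
		return true;
	return refname[len] == '\0' || refname[len] == '/';
}

enum backend_mode {
	BACKEND_NO_SKIP,     /* optimization off: every ref is yielded */
	BACKEND_LOOSE_SKIP,  /* old behaviour: over-excludes (false positives) */
	BACKEND_STRICT_SKIP, /* fixed behaviour */
};

/*
 * Hypothetical two-layer iteration: the backend may drop refs it believes
 * are excluded, then the upper layer re-checks each ref it is given.
 */
static size_t iterate(const char *const *refs, size_t nr, const char *exclude,
		      enum backend_mode mode, const char **out)
{
	size_t n = 0;

	assert(exclude);
	for (size_t i = 0; i < nr; i++) {
		/* Backend layer: a skipped ref never reaches the upper layer. */
		if (mode == BACKEND_LOOSE_SKIP && loose_match(refs[i], exclude))
			continue;
		if (mode == BACKEND_STRICT_SKIP && strict_match(refs[i], exclude))
			continue;
		/* Upper layer: catches false negatives the backend let through. */
		if (strict_match(refs[i], exclude))
			continue;
		out[n++] = refs[i];
	}
	return n;
}
```

With refs "refs/heads/ba", "refs/heads/ba/x" and "refs/heads/bar" and
the pattern "refs/heads/ba", BACKEND_NO_SKIP and BACKEND_STRICT_SKIP
both end up with only "refs/heads/bar", because the upper layer cleans
up whatever the backend lets through. BACKEND_LOOSE_SKIP ends up with
nothing: "refs/heads/bar" was dropped before the upper layer could see
it.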

What makes me a bit uneasy is that, for the "files" backend, the
optimization depends on whether refs are packed. That is quite awkward,
as our tests may miss issues simply because the refs under test were
never packed. I don't really see a way to close this potential test gap
generically, though.

The "reftable" backend doesn't have this issue, as it does not split
refs into packed and loose ones, so the optimization always kicks in.

Patrick