On Thu, Mar 6, 2025 at 7:34 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid > excluded pattern(s), 2023-07-10), the packed-refs backend learned how to > construct "jump lists" to avoid enumerating sections of the packed-refs > file that we know the caller is going to throw out anyway. > > This process works by finding the start- and end-points (that is, where > in the packed-refs file corresponds to the range we're going to ignore) > for each exclude pattern, then constructing a jump list based on that. > At enumeration time we'll consult the jump list to skip past everything > in the range(s) found in the previous step, saving time when excluding a > large portion of references. > > But when there is a --exclude pattern which is just the empty string, > the behavior is a little funky. When we try and exclude the empty > string, the matched range covers the entire packed-refs file, meaning > that we won't output any packed references. But the empty pattern > doesn't actually match any references to begin with! For example, on my > copy of git.git I can do: > > $ git for-each-ref '' | wc -l > 0 > > So "git for-each-ref --exclude=''" shouldn't actually remove anything > from the output, and ought to be equivalent to "git for-each-ref". But > it's not, and in fact: > > $ git for-each-ref | wc -l > 2229 > $ git for-each-ref --exclude='' | wc -l > 480 > > But why does the '--exclude' version output only some of the references > in the repository? Here's a hint: > > $ find .git/refs -type f | wc -l > 480 > > Indeed, because the files backend doesn't implement[^1] the same jump > list concept as the packed backend we get the correct result for the > loose references, but none of the packed references. > > Since the empty string exclude pattern doesn't match anything, we can > discard them before the packed-refs backend has a chance to even see it > (and likewise for reftable, which also implements a similar concept > since 1869525066 (refs/reftable: wire up support for exclude patterns, > 2024-09-16)). > > This approach (copying only some of the patterns into a strvec at the > refs.c layer) may seem heavy-handed, but it's setting us up to fix > another bug in the following commit where the fix will involve modifying > the incoming patterns. > > [^1]: As noted in 59c35fac54. We technically could avoid opening and > enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if > we knew that we were excluding anything under the 'refs/heads/foo' > hierarchy. But the --exclude stuff is all best-effort anyway, since > the caller is expected to cull out any results that they don't want. > > Noticed-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > refs.c | 16 ++++++++++++++++ > t/t1419-exclude-refs.sh | 10 ++++++++++ > 2 files changed, 26 insertions(+) > > diff --git a/refs.c b/refs.c > index 91da5325d7..17d3840aff 100644 > --- a/refs.c > +++ b/refs.c > @@ -1699,6 +1699,20 @@ struct ref_iterator *refs_ref_iterator_begin( > enum do_for_each_ref_flags flags) > { > struct ref_iterator *iter; > + struct strvec normalized_exclude_patterns = STRVEC_INIT; > + > + if (exclude_patterns) { > + for (size_t i = 0; exclude_patterns[i]; i++) { > + const char *pattern = exclude_patterns[i]; > + size_t len = strlen(pattern); > + if (!len) > + continue; > + > + strvec_push(&normalized_exclude_patterns, pattern); > + } > + > + exclude_patterns = normalized_exclude_patterns.v; > + } > > if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) { > static int ref_paranoia = -1; > @@ -1719,6 +1733,8 @@ struct ref_iterator *refs_ref_iterator_begin( > if (trim) > iter = prefix_ref_iterator_begin(iter, "", trim); > > + strvec_clear(&normalized_exclude_patterns); > + > return iter; > } > > diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh > index c04eeb7211..fd58260a24 100755 > --- a/t/t1419-exclude-refs.sh > +++ b/t/t1419-exclude-refs.sh > @@ -155,4 +155,14 @@ test_expect_success 'meta-characters are discarded' ' > assert_no_jumps perf > ' > > +test_expect_success 'empty string exclude pattern is ignored' ' > + git update-ref refs/heads/loose $(git rev-parse refs/heads/foo/1) && > + > + for_each_ref__exclude refs/heads "" >actual 2>perf && > + for_each_ref >expect && > + > + test_cmp expect actual && > + assert_no_jumps perf > +' > + > test_done > -- > 2.49.0.rc1.2.g67c8c5f7978 Makes sense...but doesn't the second patch also fix this issue without the first patch being needed?