Re: [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 6, 2025 at 7:34 AM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
> excluded pattern(s), 2023-07-10), the packed-refs backend learned how to
> construct "jump lists" to avoid enumerating sections of the packed-refs
> file that we know the caller is going to throw out anyway.
>
> This process works by finding the start- and end-points (that is, where
> in the packed-refs file corresponds to the range we're going to ignore)
> for each exclude pattern, then constructing a jump list based on that.
> At enumeration time we'll consult the jump list to skip past everything
> in the range(s) found in the previous step, saving time when excluding a
> large portion of references.
>
> But when there is a --exclude pattern which is just the empty string,
> the behavior is a little funky. When we try and exclude the empty
> string, the matched range covers the entire packed-refs file, meaning
> that we won't output any packed references. But the empty pattern
> doesn't actually match any references to begin with! For example, on my
> copy of git.git I can do:
>
>     $ git for-each-ref '' | wc -l
>     0
>
> So "git for-each-ref --exclude=''" shouldn't actually remove anything
> from the output, and ought to be equivalent to "git for-each-ref". But
> it's not, and in fact:
>
>     $ git for-each-ref | wc -l
>     2229
>     $ git for-each-ref --exclude='' | wc -l
>     480
>
> But why does the '--exclude' version output only some of the references
> in the repository? Here's a hint:
>
>     $ find .git/refs -type f | wc -l
>     480
>
> Indeed, because the files backend doesn't implement[^1] the same jump
> list concept as the packed backend we get the correct result for the
> loose references, but none of the packed references.
>
> Since the empty string exclude pattern doesn't match anything, we can
> discard them before the packed-refs backend has a chance to even see it
> (and likewise for reftable, which also implements a similar concept
> since 1869525066 (refs/reftable: wire up support for exclude patterns,
> 2024-09-16)).
>
> This approach (copying only some of the patterns into a strvec at the
> refs.c layer) may seem heavy-handed, but it's setting us up to fix
> another bug in the following commit where the fix will involve modifying
> the incoming patterns.
>
> [^1]: As noted in 59c35fac54. We technically could avoid opening and
>   enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if
>   we knew that we were excluding anything under the 'refs/heads/foo'
>   hierarchy. But the --exclude stuff is all best-effort anyway, since
>   the caller is expected to cull out any results that they don't want.
>
> Noticed-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  refs.c                  | 16 ++++++++++++++++
>  t/t1419-exclude-refs.sh | 10 ++++++++++
>  2 files changed, 26 insertions(+)
>
> diff --git a/refs.c b/refs.c
> index 91da5325d7..17d3840aff 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1699,6 +1699,20 @@ struct ref_iterator *refs_ref_iterator_begin(
>                 enum do_for_each_ref_flags flags)
>  {
>         struct ref_iterator *iter;
> +       struct strvec normalized_exclude_patterns = STRVEC_INIT;
> +
> +       if (exclude_patterns) {
> +               for (size_t i = 0; exclude_patterns[i]; i++) {
> +                       const char *pattern = exclude_patterns[i];
> +                       size_t len = strlen(pattern);
> +                       if (!len)
> +                               continue;
> +
> +                       strvec_push(&normalized_exclude_patterns, pattern);
> +               }
> +
> +               exclude_patterns = normalized_exclude_patterns.v;
> +       }
>
>         if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
>                 static int ref_paranoia = -1;
> @@ -1719,6 +1733,8 @@ struct ref_iterator *refs_ref_iterator_begin(
>         if (trim)
>                 iter = prefix_ref_iterator_begin(iter, "", trim);
>
> +       strvec_clear(&normalized_exclude_patterns);
> +
>         return iter;
>  }
>
> diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh
> index c04eeb7211..fd58260a24 100755
> --- a/t/t1419-exclude-refs.sh
> +++ b/t/t1419-exclude-refs.sh
> @@ -155,4 +155,14 @@ test_expect_success 'meta-characters are discarded' '
>         assert_no_jumps perf
>  '
>
> +test_expect_success 'empty string exclude pattern is ignored' '
> +       git update-ref refs/heads/loose $(git rev-parse refs/heads/foo/1) &&
> +
> +       for_each_ref__exclude refs/heads "" >actual 2>perf &&
> +       for_each_ref >expect &&
> +
> +       test_cmp expect actual &&
> +       assert_no_jumps perf
> +'
> +
>  test_done
> --
> 2.49.0.rc1.2.g67c8c5f7978

Makes sense...but doesn't the second patch also fix this issue without
the first patch being needed?





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux