Re: [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 27, 2025 at 10:29 AM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> In 37dc6d8104 (builtin/repack.c: implement support for
> `--max-cruft-size`, 2023-10-02), 'git repack' built on support for
> multiple cruft packs in Git by instructing 'git pack-objects --cruft'
> how to aggregate smaller cruft packs up to the provided threshold.
>
> The implementation in 37dc6d8104 worked something like the following
> pseudo-code:
>
>     total_size = 0;
>
>     for (p in cruft packs) {
>       if (p->pack_size + total_size < max_size) {
>         total_size += p->pack_size;
>         collapse(p)
>       } else {
>         retain(p);
>       }
>     }
>
> The original idea behind this approach was that smaller cruft packs
> would get combined together until the sum of their sizes was no larger
> than the given max pack size.
>
> There is a much simpler way to achieve this, however, which is to simply
> combine *all* cruft packs which are smaller than the threshold,
> regardless of what their sum is. With '--max-pack-size', 'pack-objects'
> will split out the resulting pack into individual pack(s) if necessary
> to ensure that the written pack(s) are each no larger than the provided
> threshold.

That doesn't really "achieve this" though, unless the antecedent of
"this" isn't what was described in the previous paragraph but
something elsewhere.  I suspect your actual meaning was something
along the lines of "There is a much simpler way to combine cruft
packs, however, which..." ?

> This yields a slight behavior change, which is reflected in the removed
> test. Previous to this change, we would aggregate smaller cruft packs
> first, whereas now we will opportunistically combine as many cruft packs
> as possible. As as result, that test is no longer relevant, and can be
> deleted.

I like the idea, since it sounds like it should be simpler...

> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  builtin/repack.c        | 38 ++-----------------------------------
>  t/t7704-repack-cruft.sh | 42 -----------------------------------------
>  2 files changed, 2 insertions(+), 78 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 75e3752353a..4d83d40f39f 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1022,29 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args,
>         return finish_pack_objects_cmd(&cmd, names, local);
>  }
>
> -static int existing_cruft_pack_cmp(const void *va, const void *vb)
> -{
> -       struct packed_git *a = *(struct packed_git **)va;
> -       struct packed_git *b = *(struct packed_git **)vb;
> -
> -       if (a->pack_size < b->pack_size)
> -               return -1;
> -       if (a->pack_size > b->pack_size)
> -               return 1;
> -       return 0;
> -}
> -
>  static void collapse_small_cruft_packs(FILE *in, size_t max_size,
>                                        struct existing_packs *existing)
>  {
> -       struct packed_git **existing_cruft, *p;
> +       struct packed_git *p;
>         struct strbuf buf = STRBUF_INIT;
> -       size_t total_size = 0;
> -       size_t existing_cruft_nr = 0;
>         size_t i;
>
> -       ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr);
> -
>         for (p = get_all_packs(the_repository); p; p = p->next) {
>                 if (!(p->is_cruft && p->pack_local))
>                         continue;
> @@ -1056,24 +1040,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
>                 if (!string_list_has_string(&existing->cruft_packs, buf.buf))
>                         continue;
>
> -               if (existing_cruft_nr >= existing->cruft_packs.nr)
> -                       BUG("too many cruft packs (found %"PRIuMAX", but knew "
> -                           "of %"PRIuMAX")",
> -                           (uintmax_t)existing_cruft_nr + 1,
> -                           (uintmax_t)existing->cruft_packs.nr);
> -               existing_cruft[existing_cruft_nr++] = p;
> -       }
> -
> -       QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp);
> -
> -       for (i = 0; i < existing_cruft_nr; i++) {
> -               size_t proposed;
> -
> -               p = existing_cruft[i];
> -               proposed = st_add(total_size, p->pack_size);
> -
> -               if (proposed <= max_size) {
> -                       total_size = proposed;
> +               if (p->pack_size < max_size) {

Look at all that deleted code.  Always nice to see a simplification in
action.  :-)

>                         fprintf(in, "-%s\n", pack_basename(p));
>                 } else {
>                         retain_cruft_pack(existing, p);
> @@ -1086,7 +1053,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size,
>                         existing->non_kept_packs.items[i].string);
>
>         strbuf_release(&buf);
> -       free(existing_cruft);
>  }
>
>  static int write_cruft_pack(const struct pack_objects_args *args,
> diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh
> index 959e6e26488..5a76b541ddd 100755
> --- a/t/t7704-repack-cruft.sh
> +++ b/t/t7704-repack-cruft.sh
> @@ -194,48 +194,6 @@ test_expect_success '--max-cruft-size combines existing packs when below thresho
>         )
>  '
>
> -test_expect_success '--max-cruft-size combines smaller packs first' '
> -       git init max-cruft-size-consume-small &&
> -       (
> -               cd max-cruft-size-consume-small &&
> -
> -               test_commit base &&
> -               git repack -ad &&
> -
> -               cruft_foo="$(generate_cruft_pack foo 524288)" &&    # 0.5 MiB
> -               cruft_bar="$(generate_cruft_pack bar 524288)" &&    # 0.5 MiB
> -               cruft_baz="$(generate_cruft_pack baz 1048576)" &&   # 1.0 MiB
> -               cruft_quux="$(generate_cruft_pack quux 1572864)" && # 1.5 MiB
> -
> -               test-tool pack-mtimes "$(basename $cruft_foo)" >expect.raw &&
> -               test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw &&
> -               sort expect.raw >expect.objects &&
> -
> -               # repacking with `--max-cruft-size=2M` should combine
> -               # both 0.5 MiB packs together, instead of, say, one of
> -               # the 0.5 MiB packs with the 1.0 MiB pack
> -               ls $packdir/pack-*.mtimes | sort >cruft.before &&
> -               git repack -d --cruft --max-cruft-size=2M &&
> -               ls $packdir/pack-*.mtimes | sort >cruft.after &&
> -
> -               comm -13 cruft.before cruft.after >cruft.new &&
> -               comm -23 cruft.before cruft.after >cruft.removed &&
> -
> -               test_line_count = 1 cruft.new &&
> -               test_line_count = 2 cruft.removed &&
> -
> -               # the two smaller packs should be rolled up first
> -               printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed &&
> -               test_cmp expect.removed cruft.removed &&
> -
> -               # ...and contain the set of objects rolled up
> -               test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw &&
> -               sort actual.raw >actual.objects &&
> -
> -               test_cmp expect.objects actual.objects
> -       )
> -'
> -
>  test_expect_success 'setup --max-cruft-size with freshened objects' '
>         git init max-cruft-size-freshen &&
>         (
> --
> 2.49.0.rc0.2.gc0c926adde2

Looks good to me, other than the misleading/ambiguous wording in that
one paragraph of the commit message.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux