On Thu, Feb 27, 2025 at 10:29 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > In 37dc6d8104 (builtin/repack.c: implement support for > `--max-cruft-size`, 2023-10-02), 'git repack' built on support for > multiple cruft packs in Git by instructing 'git pack-objects --cruft' > how to aggregate smaller cruft packs up to the provided threshold. > > The implementation in 37dc6d8104 worked something like the following > pseudo-code: > > total_size = 0; > > for (p in cruft packs) { > if (p->pack_size + total_size < max_size) { > total_size += p->pack_size; > collapse(p) > } else { > retain(p); > } > } > > The original idea behind this approach was that smaller cruft packs > would get combined together until the sum of their sizes was no larger > than the given max pack size. > > There is a much simpler way to achieve this, however, which is to simply > combine *all* cruft packs which are smaller than the threshold, > regardless of what their sum is. With '--max-pack-size', 'pack-objects' > will split out the resulting pack into individual pack(s) if necessary > to ensure that the written pack(s) are each no larger than the provided > threshold. That doesn't really "achieve this" though, unless the antecedent of "this" isn't what was described in the previous paragraph but something elsewhere. I suspect your actual meaning was something along the lines of "There is a much simpler way to combine cruft packs, however, which..." ? > This yields a slight behavior change, which is reflected in the removed > test. Previous to this change, we would aggregate smaller cruft packs > first, whereas now we will opportunistically combine as many cruft packs > as possible. As as result, that test is no longer relevant, and can be > deleted. I like the idea, since it sounds like it should be simpler... > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > builtin/repack.c | 38 ++----------------------------------- > t/t7704-repack-cruft.sh | 42 ----------------------------------------- > 2 files changed, 2 insertions(+), 78 deletions(-) > > diff --git a/builtin/repack.c b/builtin/repack.c > index 75e3752353a..4d83d40f39f 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -1022,29 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args, > return finish_pack_objects_cmd(&cmd, names, local); > } > > -static int existing_cruft_pack_cmp(const void *va, const void *vb) > -{ > - struct packed_git *a = *(struct packed_git **)va; > - struct packed_git *b = *(struct packed_git **)vb; > - > - if (a->pack_size < b->pack_size) > - return -1; > - if (a->pack_size > b->pack_size) > - return 1; > - return 0; > -} > - > static void collapse_small_cruft_packs(FILE *in, size_t max_size, > struct existing_packs *existing) > { > - struct packed_git **existing_cruft, *p; > + struct packed_git *p; > struct strbuf buf = STRBUF_INIT; > - size_t total_size = 0; > - size_t existing_cruft_nr = 0; > size_t i; > > - ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr); > - > for (p = get_all_packs(the_repository); p; p = p->next) { > if (!(p->is_cruft && p->pack_local)) > continue; > @@ -1056,24 +1040,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, > if (!string_list_has_string(&existing->cruft_packs, buf.buf)) > continue; > > - if (existing_cruft_nr >= existing->cruft_packs.nr) > - BUG("too many cruft packs (found %"PRIuMAX", but knew " > - "of %"PRIuMAX")", > - (uintmax_t)existing_cruft_nr + 1, > - (uintmax_t)existing->cruft_packs.nr); > - existing_cruft[existing_cruft_nr++] = p; > - } > - > - QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp); > - > - for (i = 0; i < existing_cruft_nr; i++) { > - size_t proposed; > - > - p = existing_cruft[i]; > - proposed = st_add(total_size, p->pack_size); > - > - if (proposed <= max_size) { > - total_size = proposed; > + if (p->pack_size < max_size) { Look at all that deleted code. Always nice to see a simplification in action. :-) > fprintf(in, "-%s\n", pack_basename(p)); > } else { > retain_cruft_pack(existing, p); > @@ -1086,7 +1053,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, > existing->non_kept_packs.items[i].string); > > strbuf_release(&buf); > - free(existing_cruft); > } > > static int write_cruft_pack(const struct pack_objects_args *args, > diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh > index 959e6e26488..5a76b541ddd 100755 > --- a/t/t7704-repack-cruft.sh > +++ b/t/t7704-repack-cruft.sh > @@ -194,48 +194,6 @@ test_expect_success '--max-cruft-size combines existing packs when below thresho > ) > ' > > -test_expect_success '--max-cruft-size combines smaller packs first' ' > - git init max-cruft-size-consume-small && > - ( > - cd max-cruft-size-consume-small && > - > - test_commit base && > - git repack -ad && > - > - cruft_foo="$(generate_cruft_pack foo 524288)" && # 0.5 MiB > - cruft_bar="$(generate_cruft_pack bar 524288)" && # 0.5 MiB > - cruft_baz="$(generate_cruft_pack baz 1048576)" && # 1.0 MiB > - cruft_quux="$(generate_cruft_pack quux 1572864)" && # 1.5 MiB > - > - test-tool pack-mtimes "$(basename $cruft_foo)" >expect.raw && > - test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw && > - sort expect.raw >expect.objects && > - > - # repacking with `--max-cruft-size=2M` should combine > - # both 0.5 MiB packs together, instead of, say, one of > - # the 0.5 MiB packs with the 1.0 MiB pack > - ls $packdir/pack-*.mtimes | sort >cruft.before && > - git repack -d --cruft --max-cruft-size=2M && > - ls $packdir/pack-*.mtimes | sort >cruft.after && > - > - comm -13 cruft.before cruft.after >cruft.new && > - comm -23 cruft.before cruft.after >cruft.removed && > - > - test_line_count = 1 cruft.new && > - test_line_count = 2 cruft.removed && > - > - # the two smaller packs should be rolled up first > - printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed && > - test_cmp expect.removed cruft.removed && > - > - # ...and contain the set of objects rolled up > - test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw && > - sort actual.raw >actual.objects && > - > - test_cmp expect.objects actual.objects > - ) > -' > - > test_expect_success 'setup --max-cruft-size with freshened objects' ' > git init max-cruft-size-freshen && > ( > -- > 2.49.0.rc0.2.gc0c926adde2 Looks good to me, other than the misleading/ambiguous wording in that one paragraph of the commit message.