On Mon, Mar 17, 2025 at 4:00 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > The previous commit changed the behavior of repack's '--max-cruft-size' > to specify a cruft pack-specific override for '--max-pack-size'. > > Introduce a new flag, '--combine-cruft-below-size' which is a > replacement for the old behavior of '--max-cruft-size'. This new flag > does explicitly what it says: it combines together cruft packs which are > smaller than a given threshold, and prohibits repacking ones which are > larger. To me "prohibits" suggests some kind of stronger action that potentially persists beyond the end of this operation. Perhaps this could be reworded to something like s/prohibits repacking ones/leaves alone packs/ ? > This accomplishes the original intent of '--max-cruft-size', which was > to avoid repacking cruft packs larger than the given threshold. > > The new behavior is slightly different. Instead of building up small > packs together until the threshold is met, '--combine-cruft-below-size' > packs up *all* cruft packs smaller than the threshold. This means that > we may make a pack much larger than the given threshold (e.g., if you > aggregate 5 packs which are each 99 MiB in size with a threshold of 100 > MiB). > > But that's OK: the point isn't to restrict the size of the cruft packs > we generate, it's to avoid working with ones that have already grown too > large. If repositories still want to limit the size of the generated > cruft pack(s), they may use '--max-cruft-size' instead. ...but then they wouldn't get any cruft packs being combined. Did you mean s/instead/together with --combine-cruft-below-size/ ? > There's some minor test fallout as a result of the slight differences in > behavior between the old meaning of '--max-cruft-size' and the behavior > of '--combine-cruft-below-size'. In the test which is now called > "--combine-cruft-below-size combines packs", we need to use the new flag > over the old one to exercise that test's intended behavior. The > remainder of the changes there are to improve the clarity of the > comments. > > Suggested-by: Elijah Newren <newren@xxxxxxxxx> > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > Documentation/git-repack.adoc | 8 ++++++++ > builtin/repack.c | 38 +++++++++++++++++++++++------------ > t/t7704-repack-cruft.sh | 22 +++++++++++--------- > 3 files changed, 46 insertions(+), 22 deletions(-) > > diff --git a/Documentation/git-repack.adoc b/Documentation/git-repack.adoc > index 11db43b1c5..8e6d61aa2f 100644 > --- a/Documentation/git-repack.adoc > +++ b/Documentation/git-repack.adoc > @@ -81,6 +81,14 @@ to the new separate pack will be written. > `--max-pack-size` (if any) by default. See the documentation for > `--max-pack-size` for more details. > > +--combine-cruft-below-size=<n>:: > + When generating cruft packs without pruning, only repack > + existing cruft packs whose size is strictly less than `<n>`. > + Cruft packs whose size is greater than or equal to `<n>` are > + left as-is and not repacked. Useful when you want to avoid > + repacking large cruft pack(s) in repositories that have many > + and/or large unreachable objects. > + Does it make sense to modify the documentation for either the --max-cruft-szie or --combine-cruft-below-size options to suggest that if both are used, it is recommended to make --max-cruft-size twice (or more) the value of --combine-cruft-below-size ? > --expire-to=<dir>:: > Write a cruft pack containing pruned objects (if any) to the > directory `<dir>`. This option is useful for keeping a copy of > diff --git a/builtin/repack.c b/builtin/repack.c > index 9658f6b354..f3330ade7b 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -1022,20 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args, > return finish_pack_objects_cmd(&cmd, names, local); > } > > -static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED, > - struct existing_packs *existing) > +static void combine_small_cruft_packs(FILE *in, size_t combine_cruft_below_size, > + struct existing_packs *existing) > { > struct packed_git *p; > struct strbuf buf = STRBUF_INIT; > size_t i; > > - /* > - * Squelch a -Wunused-function warning while we rationalize > - * the behavior of --max-cruft-size. This function will become > - * used again in a future commit. > - */ > - (void)retain_cruft_pack; > - > for (p = get_all_packs(the_repository); p; p = p->next) { > if (!(p->is_cruft && p->pack_local)) > continue; > @@ -1047,7 +1040,12 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size UNUSED, > if (!string_list_has_string(&existing->cruft_packs, buf.buf)) > continue; > > - fprintf(in, "-%s.pack\n", buf.buf); > + if (p->pack_size < combine_cruft_below_size) { > + fprintf(in, "-%s\n", pack_basename(p)); > + } else { > + retain_cruft_pack(existing, p); > + fprintf(in, "%s\n", pack_basename(p)); > + } > } > > for (i = 0; i < existing->non_kept_packs.nr; i++) > @@ -1061,6 +1059,7 @@ static int write_cruft_pack(const struct pack_objects_args *args, > const char *destination, > const char *pack_prefix, > const char *cruft_expiration, > + unsigned long combine_cruft_below_size, > struct string_list *names, > struct existing_packs *existing) > { > @@ -1103,8 +1102,9 @@ static int write_cruft_pack(const struct pack_objects_args *args, > in = xfdopen(cmd.in, "w"); > for_each_string_list_item(item, names) > fprintf(in, "%s-%s.pack\n", pack_prefix, item->string); > - if (args->max_pack_size && !cruft_expiration) { > - collapse_small_cruft_packs(in, args->max_pack_size, existing); > + if (combine_cruft_below_size && !cruft_expiration) { > + combine_small_cruft_packs(in, combine_cruft_below_size, > + existing); > } else { > for_each_string_list_item(item, &existing->non_kept_packs) > fprintf(in, "-%s.pack\n", item->string); > @@ -1158,6 +1158,7 @@ int cmd_repack(int argc, > const char *opt_window_memory = NULL; > const char *opt_depth = NULL; > const char *opt_threads = NULL; > + unsigned long combine_cruft_below_size = 0ul; > > struct option builtin_repack_options[] = { > OPT_BIT('a', NULL, &pack_everything, > @@ -1170,6 +1171,9 @@ int cmd_repack(int argc, > PACK_CRUFT), > OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"), > N_("with --cruft, expire objects older than this")), > + OPT_MAGNITUDE(0, "combine-cruft-below-size", > + &combine_cruft_below_size, > + N_("with --cruft, only repack cruft packs smaller than this")), > OPT_MAGNITUDE(0, "max-cruft-size", &cruft_po_args.max_pack_size, > N_("with --cruft, limit the size of new cruft packs")), > OPT_BOOL('d', NULL, &delete_redundant, > @@ -1413,7 +1417,8 @@ int cmd_repack(int argc, > cruft_po_args.quiet = po_args.quiet; > > ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix, > - cruft_expiration, &names, > + cruft_expiration, > + combine_cruft_below_size, &names, > &existing); > if (ret) > goto cleanup; > @@ -1440,10 +1445,17 @@ int cmd_repack(int argc, > * generate an empty pack (since every object not in the > * cruft pack generated above will have an mtime older > * than the expiration). > + * > + * Pretend we don't have a `--combine-cruft-below-size` > + * argument, since we're not selectively combining > + * anything based on size to generate the limbo cruft > + * pack, but rather removing all cruft packs from the > + * main repository regardless of size. > */ > ret = write_cruft_pack(&cruft_po_args, expire_to, > pack_prefix, > NULL, > + 0ul, > &names, > &existing); > if (ret) > diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh > index 6debad368d..8aebfb45f5 100755 > --- a/t/t7704-repack-cruft.sh > +++ b/t/t7704-repack-cruft.sh > @@ -194,10 +194,13 @@ test_expect_success '--max-cruft-size combines existing packs when not too large > ) > ' > > -test_expect_failure '--max-cruft-size combines smaller packs first' ' > - git init max-cruft-size-consume-small && > +test_expect_success '--combine-cruft-below-size combines packs' ' > + repo=combine-cruft-below-size && > + test_when_finished "rm -fr $repo" && > + > + git init "$repo" && > ( > - cd max-cruft-size-consume-small && > + cd "$repo" && > > test_commit base && > git repack -ad && > @@ -211,11 +214,11 @@ test_expect_failure '--max-cruft-size combines smaller packs first' ' > test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw && > sort expect.raw >expect.objects && > > - # repacking with `--max-cruft-size=2M` should combine > - # both 0.5 MiB packs together, instead of, say, one of > - # the 0.5 MiB packs with the 1.0 MiB pack > + # Repacking with `--combine-cruft-below-size=1M` > + # should combine both 0.5 MiB packs together, but > + # ignore the two packs which are >= 1.0 MiB. > ls $packdir/pack-*.mtimes | sort >cruft.before && > - git repack -d --cruft --max-cruft-size=2M && > + git repack -d --cruft --combine-cruft-below-size=1M && > ls $packdir/pack-*.mtimes | sort >cruft.after && > > comm -13 cruft.before cruft.after >cruft.new && > @@ -224,11 +227,12 @@ test_expect_failure '--max-cruft-size combines smaller packs first' ' > test_line_count = 1 cruft.new && > test_line_count = 2 cruft.removed && > > - # the two smaller packs should be rolled up first > + # The two packs smaller than 1.0MiB should be repacked > + # together. > printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed && > test_cmp expect.removed cruft.removed && > > - # ...and contain the set of objects rolled up > + # ...and contain the set of objects rolled up. > test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw && > sort actual.raw >actual.objects && > > -- > 2.49.0.rc0.6.g7f120c35e9 The rest looks good to me (although I reviewed a preview and you addressed that feedback, so perhaps not so surprising...)