When generating multiple cruft packs with 'git repack --max-cruft-size', we use 'git pack-objects --cruft --max-pack-size' (with many other elided options), filling in the '--max-pack-size' value with whatever was provided via the '--max-cruft-size' flag. This causes us to generate a pack that is smaller than the specified threshold. This poses a problem since we will never be able to generate a cruft pack that crosses the threshold. In effect, this means that we will try and repack its contents over and over again. Instead, change the meaning of '--max-pack-size' in pack-objects when combined with '--cruft'. When put together, '--max-pack-size' allows the pack to grow larger than the specified threshold, but only by one additional object. This allows cruft packs to become just a little bit larger than the threshold, allowing cruft packs to accumulate past the set threshold and avoid being repacked in the future until a pruning GC takes place. Noticed-by: Patrick Steinhardt <ps@xxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> --- Documentation/config/pack.adoc | 4 ++ Documentation/git-pack-objects.adoc | 4 ++ builtin/pack-objects.c | 32 +++++++++++++- t/t5329-pack-objects-cruft.sh | 67 +++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+), 2 deletions(-) diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc index da527377fa..0a90931b93 100644 --- a/Documentation/config/pack.adoc +++ b/Documentation/config/pack.adoc @@ -119,6 +119,10 @@ sizes (e.g., removable media that cannot store the whole repository), you are likely better off creating a single large packfile and splitting it using a generic multi-volume archive tool (e.g., Unix `split`). + +When generating cruft packs with `git pack-objects`, this option has a +slightly different interpretation than above; see the documentation for +`--max-pack-size` option in linkgit:git-pack-objects[1]. ++ The minimum size allowed is limited to 1 MiB. The default is unlimited. Common unit suffixes of 'k', 'm', or 'g' are supported. diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc index 7f69ae4855..aee467c496 100644 --- a/Documentation/git-pack-objects.adoc +++ b/Documentation/git-pack-objects.adoc @@ -161,6 +161,10 @@ depth is 4095. `pack.packSizeLimit` is set. Note that this option may result in a larger and slower repository; see the discussion in `pack.packSizeLimit`. ++ +When used with `--cruft`, the output packfile(s) may be as large or +larger than the configured maximum size. The pack will exceed the +specified maximum by no more than one object. --honor-pack-keep:: This flag causes an object already in a local pack that diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 58a9b16126..f701b4c9ec 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -692,11 +692,39 @@ static off_t write_object(struct hashfile *f, off_t len; int usable_delta, to_reuse; + if (cruft && pack_size_limit && pack_size_limit <= write_offset) { + /* + * When writing a cruft pack with a limited size, + * perform the --max-pack-size check *before* writing + * the object. + * + * When we have not yet reached the size limit, this + * combined with the fact that we act as if there is no + * limit when writing objects via write_object() allows + * us to grow one object *past* the specified limit. + * + * This is important for generating cruft packs with a + * --max-pack-size so we can generate packs that are + * just over the threshold to avoid repacking them in + * the future. + */ + return 0; + } + if (!pack_to_stdout) crc32_begin(f); - /* apply size limit if limited packsize and not first object */ - if (!pack_size_limit || !nr_written) + /* + * Apply size limit when one is provided, with the following + * exceptions: + * + * - We are writing the first object. + * + * - We are writing a cruft pack with a size limit. The check + * above covers this case while letting the pack grow at most + * one object beyond the limit. + */ + if (!pack_size_limit || !nr_written || cruft) limit = 0; else if (pack_size_limit <= write_offset) /* diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh index 60dac8312d..9cbc21a65d 100755 --- a/t/t5329-pack-objects-cruft.sh +++ b/t/t5329-pack-objects-cruft.sh @@ -3,6 +3,7 @@ test_description='cruft pack related pack-objects tests' . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-cruft.sh objdir=.git/objects packdir=$objdir/pack @@ -695,4 +696,70 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' ' ) ' +test_expect_success 'cruft pack generation beyond --max-pack-size' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + # Disable pack compression to ensure the pack size is + # predictable. + git config pack.compression 0 && + + sz=524288 && # 0.5 MiB + foo="$(generate_random_blob foo $sz)" && + bar="$(generate_random_blob bar $sz)" && + baz="$(generate_random_blob baz $sz)" && + quux="$(generate_random_blob quux $sz)" && + + printf "%s\n" "$foo" "$bar" >A.objects && + printf "%s\n" "$baz" "$quux" >B.objects && + + A="$(git pack-objects $packdir/pack <A.objects)" && + B="$(git pack-objects $packdir/pack <B.objects)" && + + git prune-packed && + + sz=1572864 && # 1.5 MiB + printf -- "-%s\n" "pack-$A.pack" "pack-$B.pack" >C.in && + git pack-objects --cruft --max-pack-size=$sz $packdir/pack \ + <C.in >C.out && + + test_line_count = 2 C.out && + C_large="$(head -n 1 C.out)" && + C_small="$(tail -n 1 C.out)" && + + # Swap $C_large and $C_small if necessary. + if test "$(test_file_size $packdir/pack-$C_large.idx)" -lt \ + "$(test_file_size $packdir/pack-$C_small.idx)" + then + tmp="$C_large" && + C_large="$C_small" && + C_small="$tmp" + fi && + + # Ensure the large pack is no smaller than the threshold + # such that it does not get repacked in subsequent runs + # with the same --max-pack-size setting. + test $(test_file_size $packdir/pack-$C_large.pack) -ge $sz && + + { + git show-index <"$packdir/pack-$C_large.idx" && + git show-index <"$packdir/pack-$C_small.idx" + } >actual.raw && + printf "%s\n" "$foo" "$bar" "$baz" "$quux" >expect.raw && + + sort <expect.raw >expect && + cut -d " " -f 2 actual.raw | sort >actual && + + # Ensure that all of the objects are present in the two + # cruft packs we just generated. + # + # Note that the contents of "actual" are not + # de-duplicated. This is intentional to ensure we avoid + # packing the same object twice (once in each pack). + test_cmp expect actual + ) +' + test_done -- 2.49.0.rc2.6.g9a1eecd400