[PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When generating multiple cruft packs with 'git repack --max-cruft-size',
we use 'git pack-objects --cruft --max-pack-size' (with many other
elided options), filling in the '--max-pack-size' value with whatever
was provided via the '--max-cruft-size' flag.

This causes us to generate a pack that is smaller than the specified
threshold. This poses a problem since we will never be able to generate
a cruft pack that crosses the threshold. In effect, this means that we
will try and repack its contents over and over again.

Instead, change the meaning of '--max-pack-size' in pack-objects when
combined with '--cruft'. When put together, '--max-pack-size' allows the
pack to grow larger than the specified threshold, but only by one
additional object.

This allows cruft packs to become just a little bit larger than the
threshold, allowing cruft packs to accumulate past the set threshold and
avoid being repacked in the future until a pruning GC takes place.

Noticed-by: Patrick Steinhardt <ps@xxxxxx>
Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 Documentation/config/pack.adoc      |  4 ++
 Documentation/git-pack-objects.adoc |  4 ++
 builtin/pack-objects.c              | 32 +++++++++++++-
 t/t5329-pack-objects-cruft.sh       | 67 +++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc
index da527377fa..0a90931b93 100644
--- a/Documentation/config/pack.adoc
+++ b/Documentation/config/pack.adoc
@@ -119,6 +119,10 @@ sizes (e.g., removable media that cannot store the whole repository),
 you are likely better off creating a single large packfile and splitting
 it using a generic multi-volume archive tool (e.g., Unix `split`).
 +
+When generating cruft packs with `git pack-objects`, this option has a
+slightly different interpretation than above; see the documentation for
+`--max-pack-size` option in linkgit:git-pack-objects[1].
++
 The minimum size allowed is limited to 1 MiB. The default is unlimited.
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855..aee467c496 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -161,6 +161,10 @@ depth is 4095.
 	`pack.packSizeLimit` is set. Note that this option may result in
 	a larger and slower repository; see the discussion in
 	`pack.packSizeLimit`.
++
+When used with `--cruft`, the output packfile(s) may be as large or
+larger than the configured maximum size. The pack will exceed the
+specified maximum by no more than one object.
 
 --honor-pack-keep::
 	This flag causes an object already in a local pack that
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 58a9b16126..f701b4c9ec 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -692,11 +692,39 @@ static off_t write_object(struct hashfile *f,
 	off_t len;
 	int usable_delta, to_reuse;
 
+	if (cruft && pack_size_limit && pack_size_limit <= write_offset) {
+		/*
+		 * When writing a cruft pack with a limited size,
+		 * perform the --max-pack-size check *before* writing
+		 * the object.
+		 *
+		 * When we have not yet reached the size limit, this
+		 * combined with the fact that we act as if there is no
+		 * limit when writing objects via write_object() allows
+		 * us to grow one object *past* the specified limit.
+		 *
+		 * This is important for generating cruft packs with a
+		 * --max-pack-size so we can generate packs that are
+		 * just over the threshold to avoid repacking them in
+		 * the future.
+		 */
+		return 0;
+	}
+
 	if (!pack_to_stdout)
 		crc32_begin(f);
 
-	/* apply size limit if limited packsize and not first object */
-	if (!pack_size_limit || !nr_written)
+	/*
+	 * Apply size limit when one is provided, with the following
+	 * exceptions:
+	 *
+	 * - We are writing the first object.
+	 *
+	 * - We are writing a cruft pack with a size limit. The check
+	 *   above covers this case while letting the pack grow at most
+	 *   one object beyond the limit.
+	 */
+	if (!pack_size_limit || !nr_written || cruft)
 		limit = 0;
 	else if (pack_size_limit <= write_offset)
 		/*
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index 60dac8312d..9cbc21a65d 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -3,6 +3,7 @@
 test_description='cruft pack related pack-objects tests'
 
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-cruft.sh
 
 objdir=.git/objects
 packdir=$objdir/pack
@@ -695,4 +696,70 @@ test_expect_success 'additional cruft blobs via gc.recentObjectsHook' '
 	)
 '
 
+test_expect_success 'cruft pack generation beyond --max-pack-size' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# Disable pack compression to ensure the pack size is
+		# predictable.
+		git config pack.compression 0 &&
+
+		sz=524288 && # 0.5 MiB
+		foo="$(generate_random_blob foo $sz)" &&
+		bar="$(generate_random_blob bar $sz)" &&
+		baz="$(generate_random_blob baz $sz)" &&
+		quux="$(generate_random_blob quux $sz)" &&
+
+		printf "%s\n" "$foo" "$bar" >A.objects &&
+		printf "%s\n" "$baz" "$quux" >B.objects &&
+
+		A="$(git pack-objects $packdir/pack <A.objects)" &&
+		B="$(git pack-objects $packdir/pack <B.objects)" &&
+
+		git prune-packed &&
+
+		sz=1572864 && # 1.5 MiB
+		printf -- "-%s\n" "pack-$A.pack" "pack-$B.pack" >C.in &&
+		git pack-objects --cruft --max-pack-size=$sz $packdir/pack \
+			<C.in >C.out &&
+
+		test_line_count = 2 C.out &&
+		C_large="$(head -n 1 C.out)" &&
+		C_small="$(tail -n 1 C.out)" &&
+
+		# Swap $C_large and $C_small if necessary.
+		if test "$(test_file_size $packdir/pack-$C_large.idx)" -lt \
+			"$(test_file_size $packdir/pack-$C_small.idx)"
+		then
+			tmp="$C_large" &&
+			C_large="$C_small" &&
+			C_small="$tmp"
+		fi &&
+
+		# Ensure the large pack is no smaller than the threshold
+		# such that it does not get repacked in subsequent runs
+		# with the same --max-pack-size setting.
+		test $(test_file_size $packdir/pack-$C_large.pack) -ge $sz &&
+
+		{
+			git show-index <"$packdir/pack-$C_large.idx" &&
+			git show-index <"$packdir/pack-$C_small.idx"
+		} >actual.raw &&
+		printf "%s\n" "$foo" "$bar" "$baz" "$quux" >expect.raw &&
+
+		sort <expect.raw >expect &&
+		cut -d " " -f 2 actual.raw | sort >actual &&
+
+		# Ensure that all of the objects are present in the two
+		# cruft packs we just generated.
+		#
+		# Note that the contents of "actual" are not
+		# de-duplicated. This is intentional to ensure we avoid
+		# packing the same object twice (once in each pack).
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.rc2.6.g9a1eecd400





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux