Re: [PATCH] multi-pack-index: repack batches below --batch-size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 11, 2020 at 03:30:18PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> The --batch-size=<size> option of 'git multi-pack-index repack' is
> intended to limit the amount of work done by the repack. In the case of
> a large repository, this command should repack a number of small
> pack-files but leave the large pack-files alone. Most often, the
> repository has one large pack-file from a 'git clone' operation and
> number of smaller pack-files from incremental 'git fetch' operations.
>
> The issue with '--batch-size' is that it also _prevents_ the repack from
> happening if the expected size of the resulting pack-file is too small.
> This was intended as a way to avoid frequent churn of small pack-files,
> but it has mostly caused confusion when a repository is of "medium"
> size. That is, not enormous like the Windows OS repository, but also not
> so small that this incremental repack isn't valuable.
>
> The solution presented here is to collect pack-files for repack if their
> expected size is smaller than the batch-size parameter until either the
> total expected size exceeds the batch-size or all pack-files are
> considered. If there are at least two pack-files, then these are
> combined to a new pack-file whose size should not be too much larger
> than the batch-size.
>
> This new strategy should succeed in keeping the number of pack-files
> small in these "medium" size repositories. The concern about churn is
> likely not interesting, as the real control over that is the frequency
> in which the repack command is run.

OK. This '--batch-size' parameter is a little magical to me, but the
behavior you are describing seems sane. I think that your assessment of
the down-side is reasonable, too.

Looks fine to me.

  Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx>

> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>     multi-pack-index: repack batches below --batch-size
>
>     As reported [1], the 'git multi-pack-index repack' command has some
>     unexpected behavior due to the nature of "expected size" for un-thinned
>     fetch packs and the fact that the batch size requires the total size to
>     be at least as large as that batch-size. By removing this minimum size
>     restriction, we will repack more frequently and prevent this "many
>     pack-file" problems.
>
>     [1]
>     https://lore.kernel.org/git/6FA8F54A-C92D-497B-895F-AC6E8287AACD@xxxxxxxxx/
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-698%2Fderrickstolee%2Fmidx-repack-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-698/derrickstolee/midx-repack-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/698
>
>  Documentation/git-multi-pack-index.txt | 11 ++++++-----
>  midx.c                                 |  2 +-
>  t/t5319-multi-pack-index.sh            | 18 ++++++++++++++++++
>  3 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index 0c6619493c..eb0caa0439 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -51,11 +51,12 @@ repack::
>  	multi-pack-index, then divide by the total number of objects in
>  	the pack and multiply by the pack size. We select packs with
>  	expected size below the batch size until the set of packs have
> -	total expected size at least the batch size. If the total size
> -	does not reach the batch size, then do nothing. If a new pack-
> -	file is created, rewrite the multi-pack-index to reference the
> -	new pack-file. A later run of 'git multi-pack-index expire' will
> -	delete the pack-files that were part of this batch.
> +	total expected size at least the batch size, or all pack-files
> +	are considered. If only one pack-file is selected, then do
> +	nothing. If a new pack-file is created, rewrite the
> +	multi-pack-index to reference the new pack-file. A later run of
> +	'git multi-pack-index expire' will delete the pack-files that
> +	were part of this batch.
>  +
>  If `repack.packKeptObjects` is `false`, then any pack-files with an
>  associated `.keep` file will not be selected for the batch to repack.
> diff --git a/midx.c b/midx.c
> index 6d1584ca51..38690b46c9 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1371,7 +1371,7 @@ static int fill_included_packs_batch(struct repository *r,
>
>  	free(pack_info);
>
> -	if (total_size < batch_size || packs_to_repack < 2)
> +	if (packs_to_repack < 2)
>  		return 1;
>
>  	return 0;
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 7214cab36c..b05190f500 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -643,6 +643,7 @@ test_expect_success 'expire respects .keep files' '
>  '
>
>  test_expect_success 'repack --batch-size=0 repacks everything' '
> +	cp -r dup dup2 &&
>  	(
>  		cd dup &&
>  		rm .git/objects/pack/*.keep &&
> @@ -662,4 +663,21 @@ test_expect_success 'repack --batch-size=0 repacks everything' '
>  	)
>  '
>
> +test_expect_success 'repack --batch-size=<large> repacks everything' '
> +	(
> +		cd dup2 &&
> +		rm .git/objects/pack/*.keep &&
> +		ls .git/objects/pack/*idx >idx-list &&
> +		test_line_count = 2 idx-list &&
> +		git multi-pack-index repack --batch-size=2000000 &&
> +		ls .git/objects/pack/*idx >idx-list &&
> +		test_line_count = 3 idx-list &&
> +		test-tool read-midx .git/objects | grep idx >midx-list &&
> +		test_line_count = 3 midx-list &&
> +		git multi-pack-index expire &&
> +		ls -al .git/objects/pack/*idx >idx-list &&
> +		test_line_count = 1 idx-list
> +	)
> +'
> +
>  test_done
>
> base-commit: 47ae905ffb98cc4d4fd90083da6bc8dab55d9ecc
> --
> gitgitgadget
Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux