On Tue, Aug 11, 2020 at 03:30:18PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > The --batch-size=<size> option of 'git multi-pack-index repack' is > intended to limit the amount of work done by the repack. In the case of > a large repository, this command should repack a number of small > pack-files but leave the large pack-files alone. Most often, the > repository has one large pack-file from a 'git clone' operation and > number of smaller pack-files from incremental 'git fetch' operations. > > The issue with '--batch-size' is that it also _prevents_ the repack from > happening if the expected size of the resulting pack-file is too small. > This was intended as a way to avoid frequent churn of small pack-files, > but it has mostly caused confusion when a repository is of "medium" > size. That is, not enormous like the Windows OS repository, but also not > so small that this incremental repack isn't valuable. > > The solution presented here is to collect pack-files for repack if their > expected size is smaller than the batch-size parameter until either the > total expected size exceeds the batch-size or all pack-files are > considered. If there are at least two pack-files, then these are > combined to a new pack-file whose size should not be too much larger > than the batch-size. > > This new strategy should succeed in keeping the number of pack-files > small in these "medium" size repositories. The concern about churn is > likely not interesting, as the real control over that is the frequency > in which the repack command is run. OK. This '--batch-size' parameter is a little magical to me, but the behavior you are describing seems sane. I think that your assessment of the down-side is reasonable, too. Looks fine to me. Reviewed-by: Taylor Blau <me@xxxxxxxxxxxx> > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > --- > multi-pack-index: repack batches below --batch-size > > As reported [1], the 'git multi-pack-index repack' command has some > unexpected behavior due to the nature of "expected size" for un-thinned > fetch packs and the fact that the batch size requires the total size to > be at least as large as that batch-size. By removing this minimum size > restriction, we will repack more frequently and prevent this "many > pack-file" problems. > > [1] > https://lore.kernel.org/git/6FA8F54A-C92D-497B-895F-AC6E8287AACD@xxxxxxxxx/ > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-698%2Fderrickstolee%2Fmidx-repack-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-698/derrickstolee/midx-repack-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/698 > > Documentation/git-multi-pack-index.txt | 11 ++++++----- > midx.c | 2 +- > t/t5319-multi-pack-index.sh | 18 ++++++++++++++++++ > 3 files changed, 25 insertions(+), 6 deletions(-) > > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt > index 0c6619493c..eb0caa0439 100644 > --- a/Documentation/git-multi-pack-index.txt > +++ b/Documentation/git-multi-pack-index.txt > @@ -51,11 +51,12 @@ repack:: > multi-pack-index, then divide by the total number of objects in > the pack and multiply by the pack size. We select packs with > expected size below the batch size until the set of packs have > - total expected size at least the batch size. If the total size > - does not reach the batch size, then do nothing. If a new pack- > - file is created, rewrite the multi-pack-index to reference the > - new pack-file. A later run of 'git multi-pack-index expire' will > - delete the pack-files that were part of this batch. > + total expected size at least the batch size, or all pack-files > + are considered. If only one pack-file is selected, then do > + nothing. If a new pack-file is created, rewrite the > + multi-pack-index to reference the new pack-file. A later run of > + 'git multi-pack-index expire' will delete the pack-files that > + were part of this batch. > + > If `repack.packKeptObjects` is `false`, then any pack-files with an > associated `.keep` file will not be selected for the batch to repack. > diff --git a/midx.c b/midx.c > index 6d1584ca51..38690b46c9 100644 > --- a/midx.c > +++ b/midx.c > @@ -1371,7 +1371,7 @@ static int fill_included_packs_batch(struct repository *r, > > free(pack_info); > > - if (total_size < batch_size || packs_to_repack < 2) > + if (packs_to_repack < 2) > return 1; > > return 0; > diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh > index 7214cab36c..b05190f500 100755 > --- a/t/t5319-multi-pack-index.sh > +++ b/t/t5319-multi-pack-index.sh > @@ -643,6 +643,7 @@ test_expect_success 'expire respects .keep files' ' > ' > > test_expect_success 'repack --batch-size=0 repacks everything' ' > + cp -r dup dup2 && > ( > cd dup && > rm .git/objects/pack/*.keep && > @@ -662,4 +663,21 @@ test_expect_success 'repack --batch-size=0 repacks everything' ' > ) > ' > > +test_expect_success 'repack --batch-size=<large> repacks everything' ' > + ( > + cd dup2 && > + rm .git/objects/pack/*.keep && > + ls .git/objects/pack/*idx >idx-list && > + test_line_count = 2 idx-list && > + git multi-pack-index repack --batch-size=2000000 && > + ls .git/objects/pack/*idx >idx-list && > + test_line_count = 3 idx-list && > + test-tool read-midx .git/objects | grep idx >midx-list && > + test_line_count = 3 midx-list && > + git multi-pack-index expire && > + ls -al .git/objects/pack/*idx >idx-list && > + test_line_count = 1 idx-list > + ) > +' > + > test_done > > base-commit: 47ae905ffb98cc4d4fd90083da6bc8dab55d9ecc > -- > gitgitgadget Thanks, Taylor