On 1/23/2019 5:38 PM, Jonathan Tan wrote:
diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 6186c4c936..cc63531cc0 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -36,6 +36,17 @@ expire::
have no objects referenced by the MIDX. Rewrite the MIDX file
afterward to remove all references to these pack-files.
+repack::
+ Collect a batch of pack-files whose size are all at most the
+ size given by --batch-size, but whose sizes sum to larger
+ than --batch-size. The batch is selected by greedily adding
+ small pack-files starting with the oldest pack-files that fit
+ the size. Create a new pack-file containing the objects the
+ multi-pack-index indexes into those pack-files, and rewrite
+ the multi-pack-index to contain that pack-file. A later run
+ of 'git multi-pack-index expire' will delete the pack-files
+ that were part of this batch.
I see in the subsequent patch that you stop once the batch size is
matched or exceeded. The text does mention "whose sizes sum to larger
than --batch-size", but it leads me to think that if the total happens
not to exceed the batch size, nothing is done, and otherwise *all* the
small packs are repacked together.
I would write this as:
Create a new packfile containing the objects in the N least-sized
packfiles referenced by the multi-pack-index, where N is the smallest
number such that the total size of the packfiles equals or exceeds the
given batch size. Rewrite the multi-pack-index to reference the new
packfile instead of the N packfiles. A later run of ...
Thanks for the suggestion.
It is slightly wrong, in that we don't sort by size. Instead we sort by
modified time. That makes it a little complicated, but I'll give it
another shot using your framing:
Create a new pack-file containing objects in small pack-files
referenced by the multi-pack-index. Select the pack-files by
examining packs from oldest-to-newest, adding a pack if its
size is below the batch size. Stop adding packs when the sum
of sizes of the added packs is above the batch size. If the
total size does not reach the batch size, then do nothing.
Rewrite the multi-pack-index to reference the new pack-file.
A later run of 'git multi-pack-index expire' will delete the
pack-files that were part of this batch.
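
To make the selection rule in the wording above concrete, here is a
rough Python sketch. It is not the C code from the patch. The `Pack`
type and the `select_batch` name are hypothetical, made up for
illustration. It also assumes that "below the batch size" means
strictly smaller, and that adding stops as soon as the running total
equals or exceeds the batch size; the real boundary conditions may
differ.

```python
from typing import List, NamedTuple


class Pack(NamedTuple):
    # Hypothetical stand-in for a pack-file tracked by the MIDX.
    name: str
    mtime: int  # modified time, smaller means older
    size: int   # size in bytes


def select_batch(packs: List[Pack], batch_size: int) -> List[Pack]:
    """Pick packs oldest-to-newest until their sizes reach batch_size."""
    selected: List[Pack] = []
    total = 0
    for pack in sorted(packs, key=lambda p: p.mtime):
        if total >= batch_size:
            # Assumption: stop once the batch size is matched or exceeded.
            break
        if pack.size >= batch_size:
            # Assumption: only packs strictly below the batch size qualify.
            continue
        selected.append(pack)
        total += pack.size
    if total < batch_size:
        # The total never reached the batch size, so do nothing.
        return []
    return selected


packs = [
    Pack("a", mtime=1, size=40),
    Pack("b", mtime=2, size=500),
    Pack("c", mtime=3, size=70),
    Pack("d", mtime=4, size=10),
]
batch = select_batch(packs, batch_size=100)
```

With these example values, "b" is skipped for being too large, and the
scan stops before "d" because "a" and "c" already reach the batch
size. With a batch size of 200 the same packs never reach the total,
so nothing would be repacked.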
-Stolee