Re: [PATCH v3 7/9] multi-pack-index: prepare 'repack' subcommand

On 1/23/2019 5:38 PM, Jonathan Tan wrote:
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index 6186c4c936..cc63531cc0 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -36,6 +36,17 @@ expire::
> >   	have no objects referenced by the MIDX. Rewrite the MIDX file
> >   	afterward to remove all references to these pack-files.
> > +repack::
> > +	Collect a batch of pack-files whose size are all at most the
> > +	size given by --batch-size, but whose sizes sum to larger
> > +	than --batch-size. The batch is selected by greedily adding
> > +	small pack-files starting with the oldest pack-files that fit
> > +	the size. Create a new pack-file containing the objects the
> > +	multi-pack-index indexes into those pack-files, and rewrite
> > +	the multi-pack-index to contain that pack-file. A later run
> > +	of 'git multi-pack-index expire' will delete the pack-files
> > +	that were part of this batch.
>
> I see in the subsequent patch that you stop once the batch size is
> matched or exceeded - I see that you mention "whose sizes sum to larger
> than --batch-size", but this leads me to think that if the total so
> happens to not exceed the batch size, don't do anything, but otherwise
> repack *all* the small packs together.
>
> I would write this as:
>
>    Create a new packfile containing the objects in the N least-sized
>    packfiles referenced by the multi-pack-index, where N is the smallest
>    number such that the total size of the packfiles equals or exceeds the
>    given batch size. Rewrite the multi-pack-index to reference the new
>    packfile instead of the N packfiles. A later run of ...

Thanks for the suggestion.

It is slightly wrong, in that we don't sort by size. Instead, we sort by modified time. That makes it a little complicated, but I'll give it another shot using your framing:

        Create a new pack-file containing objects in small pack-files
        referenced by the multi-pack-index. Select the pack-files by
        examining packs from oldest-to-newest, adding a pack if its
        size is below the batch size. Stop adding packs when the sum
        of sizes of the added packs is above the batch size. If the
        total size does not reach the batch size, then do nothing.
        Rewrite the multi-pack-index to reference the new pack-file.
        A later run of 'git multi-pack-index expire' will delete the
        pack-files that were part of this batch.
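
To make the selection rule in the wording above concrete, here is a minimal Python sketch. It is not the C code from the patch: the `Pack` type and the `select_batch` function are hypothetical names used only for illustration. It also assumes that a pack qualifies when its size is strictly below the batch size, and that selection stops once the running total matches or exceeds the batch size.

```python
from typing import List, NamedTuple


class Pack(NamedTuple):
    """Hypothetical stand-in for a pack-file tracked by the multi-pack-index."""
    name: str
    mtime: int  # modified time, e.g. seconds since the epoch
    size: int   # size in bytes


def select_batch(packs: List[Pack], batch_size: int) -> List[Pack]:
    """Pick the packs to repack, or return [] if the batch is never filled."""
    selected: List[Pack] = []
    total = 0
    # Examine packs from oldest to newest (by modified time, not by size).
    for pack in sorted(packs, key=lambda p: p.mtime):
        # Assumption: only packs strictly below the batch size qualify.
        if pack.size >= batch_size:
            continue
        selected.append(pack)
        total += pack.size
        # Assumption: stop once the total matches or exceeds the batch size.
        if total >= batch_size:
            return selected
    # The total never reached the batch size: do nothing.
    return []
```

With packs of sizes 40, 200, 30, 50 and 10 (listed oldest first) and a batch size of 100, this sketch skips the 200-byte pack and selects the 40-, 30- and 50-byte packs. With a batch size of 1000 it selects nothing, because the qualifying packs never add up to the batch size.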

-Stolee


