Re: gc --aggressive

On Tue, Apr 17, 2012 at 12:16 PM, Jay Soffian <jaysoffian@xxxxxxxxx> wrote:
> This has worked fine on repos large and small. However, starting a
> couple days ago git started running out of memory on a relatively
> modest repo[*] while repacking on a Linux box with 12GB memory (+ 12GB
> swap). I am able to gc the repo by either removing --aggressive or
> .keep'ing the oldest pack.

Experimentally, setting pack.windowMemory = 256m keeps git memory
usage < 4.5 GB during an aggressive repack.
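
For anyone who wants to try the same setting, here is a minimal sketch.
It runs in a throwaway repository (the `mktemp` directory is only for
illustration); in practice the `git config` and `git gc` lines would be
run inside the affected repo.

```shell
# Scratch repository so the example does not touch real data.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Persist the cap in this repository's config ...
git config pack.windowMemory 256m
window_memory=$(git config --get pack.windowMemory)
echo "pack.windowMemory = $window_memory"

# ... then repack aggressively with the cap in effect.
git gc --aggressive --quiet

# Alternatively, apply the cap to a single invocation only.
git -c pack.windowMemory=256m gc --aggressive --quiet
```

As far as I understand the config documentation, pack.windowMemory is a
per-thread limit, so the overall footprint also depends on pack.threads.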

Ironically, I end up with a slightly larger pack (63115590 bytes vs
61518628 bytes) than I get without --aggressive. I assume this is
because pack-objects found better delta chains during the previous
aggressive repack, when windowMemory was not set, and a plain gc
simply reuses those deltas.
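
To put the two sizes in proportion, a quick back-of-the-envelope
calculation using only the byte counts quoted above (the variable names
are mine):

```shell
with_cap=63115590      # bytes: aggressive repack with pack.windowMemory = 256m
without_aggr=61518628  # bytes: pack produced without --aggressive

diff_bytes=$((with_cap - without_aggr))

# Integer arithmetic only: compute hundredths of a percent, then format.
bp=$((diff_bytes * 10000 / without_aggr))
printf '%d bytes larger (%d.%02d%%)\n' "$diff_bytes" $((bp / 100)) $((bp % 100))
```

So the difference is roughly 1.5 MiB, or a bit under 3%.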

> 1) If --aggressive does not generally provide a benefit, should it be
> made a no-op?

I guess I'll revise this question: perhaps --aggressive should be
better explained/discouraged. I found a message from Jeff last month
and stole his words for this patch:

<snip>
diff --git i/Documentation/git-gc.txt w/Documentation/git-gc.txt
index 815afcb922..ca5bf8b51e 100644
--- i/Documentation/git-gc.txt
+++ w/Documentation/git-gc.txt
@@ -37,9 +37,8 @@ OPTIONS
 	Usually 'git gc' runs very quickly while providing good disk
 	space utilization and performance.  This option will cause
 	'git gc' to more aggressively optimize the repository at the expense
-	of taking much more time.  The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	of taking much more time and potentially using greater memory. This
+	option is rarely needed. See Repacking below.

 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
@@ -138,6 +137,39 @@ If you are expecting some objects to be collected and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.

+Repacking
+---------
+
+Under the covers 'git gc' calls several commands to optimize the repository.
+The most significant of these with respect to repository size and general
+performance is linkgit:git-repack[1]. There are basically three levels of
+'gc' with respect to repacking:
+
+ 1. `git gc --auto`; if there are too many loose objects (`gc.auto`), they
+    all go into a new incremental pack. If there are already too many
+    packs (`gc.autopacklimit`), all of the existing packs are re-packed
+    together.
+
+    Making an incremental pack is by far the fastest because the speed is
+    independent of the existing repository history. If git packs
+    everything together, it should be more or less the same as (2).
+
+ 2. `git gc`; this packs everything into a single pack. It uses default
+    window and depth parameters, but importantly, it reuses existing
+    deltas. Doing so makes the delta compression phase much faster, and it
+    often makes the writing phase faster (because for older objects, git
+    is primarily streaming them right out of the existing pack). On a big
+    repository though, this does do a lot of I/O, because git has to
+    rewrite the whole pack.
+
+ 3. `git gc --aggressive`; this is often much slower than (2) because git
+    throws out all of the existing deltas and recomputes them from
+    scratch. It uses a higher window parameter, meaning it will spend
+    more time computing, and it may end up with a smaller pack. However,
+    unless the repository is known to have initially been poorly packed,
+    this option is not needed and will just cause git to perform
+    extra work.
+
 HOOKS
 -----

@@ -147,6 +179,7 @@ linkgit:githooks[5] for more information.

 SEE ALSO
 --------
+linkgit:git-pack-refs[1]
 linkgit:git-prune[1]
 linkgit:git-reflog[1]
 linkgit:git-repack[1]
</snip>
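
To make level (1) in the patch a little more concrete, here is a rough
sketch of how one might compare a repository's current state against the
two thresholds that `git gc --auto` consults. It uses a scratch
repository for safety, and the fallback values (6700 and 50) are the
defaults as I remember them from git-config(1), so treat them as an
assumption. Git's own check is an estimate rather than an exact count,
so this only approximates what `--auto` would decide.

```shell
# Scratch repository so the example is safe to paste anywhere.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Thresholds consulted by `git gc --auto`; fall back to the assumed
# defaults when nothing is configured.
auto_limit=$(git config --get gc.auto || echo 6700)
pack_limit=$(git config --get gc.autoPackLimit || echo 50)

# Current state of the object store.
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose (gc.auto = $auto_limit)"
echo "packs:         $packs (gc.autoPackLimit = $pack_limit)"
```

In a real repository, a loose-object count above gc.auto would lead to
an incremental pack, and a pack count above gc.autoPackLimit would lead
to everything being repacked together, as described in (1).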

Thoughts?