On Tue, Apr 17, 2012 at 12:16 PM, Jay Soffian <jaysoffian@xxxxxxxxx> wrote:
> This has worked fine on repos large and small. However, starting a
> couple days ago git started running out of memory on a relatively
> modest repo[*] while repacking on a Linux box with 12GB memory (+ 12GB
> swap). I am able to gc the repo by either removing --aggressive or
> .keep'ing the oldest pack.

Experimentally, setting pack.windowMemory = 256m keeps git's memory usage
below 4.5 GB during an aggressive repack. Ironically, I end up with a
slightly worse (larger) pack than when not using --aggressive at all
(63115590 bytes vs 61518628 bytes). I assume this is because pack-objects
found better delta chains during the previous aggressive repack, when
windowMemory was not set.

> 1) If --aggressive does not generally provide a benefit, should it be
> made a no-op?

I guess I'll revise this question: perhaps --aggressive should instead be
better explained and discouraged. I found a message from Jeff last month
and stole his words for this patch:

<snip>
diff --git i/Documentation/git-gc.txt w/Documentation/git-gc.txt
index 815afcb922..ca5bf8b51e 100644
--- i/Documentation/git-gc.txt
+++ w/Documentation/git-gc.txt
@@ -37,9 +37,8 @@ OPTIONS
 	Usually 'git gc' runs very quickly while providing good disk
 	space utilization and performance.  This option will cause
 	'git gc' to more aggressively optimize the repository at the expense
-	of taking much more time. The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	of taking much more time and potentially using more memory. This
+	option is rarely needed; see the Repacking section below.
 
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
@@ -138,6 +137,39 @@
 If you are expecting some objects to be collected and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
 
+Repacking
+---------
+
+Under the covers 'git gc' calls several commands to optimize the repository.
+The most significant of these with respect to repository size and general
+performance is linkgit:git-repack[1]. There are basically three levels of
+'gc' with respect to repacking:
+
+ 1. `git gc --auto`; if there are too many loose objects (`gc.auto`), they
+    all go into a new incremental pack. If there are already too many
+    packs (`gc.autopacklimit`), all of the existing packs are re-packed
+    together.
+
+    Making an incremental pack is by far the fastest because the speed is
+    independent of the existing repository history. If git packs
+    everything together, it should be more or less the same as (2).
+
+ 2. `git gc`; this packs everything into a single pack. It uses the default
+    window and depth parameters, but importantly, it reuses existing
+    deltas. Doing so makes the delta compression phase much faster, and it
+    often makes the writing phase faster (because for older objects, git
+    is primarily streaming them right out of the existing pack). On a big
+    repository though, this does do a lot of I/O, because git has to
+    rewrite the whole pack.
+
+ 3. `git gc --aggressive`; this is often much slower than (2) because git
+    throws out all of the existing deltas and recomputes them from
+    scratch. It uses a higher window parameter, meaning it will spend
+    more time computing, and it may end up with a smaller pack. However,
+    unless the repository is known to have initially been poorly packed,
+    this option is not needed and will just cause git to perform
+    extra work.
+
 HOOKS
 -----
@@ -147,6 +179,7 @@ linkgit:githooks[5] for more information.
 
 SEE ALSO
 --------
+linkgit:git-pack-refs[1]
 linkgit:git-prune[1]
 linkgit:git-reflog[1]
 linkgit:git-repack[1]
</snip>

Thoughts?
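For anyone who wants to try the workaround described above, it boils down to
roughly the following; the 256m figure is only what happened to keep this
particular repack under ~4.5 GB on this box, not a general recommendation,
and keep in mind that pack.windowMemory is a per-thread limit in
pack-objects:

    # Cap the delta-search window memory for this repository, then
    # repack aggressively. 256m is just the value from the experiment above.
    git config pack.windowMemory 256m
    git gc --aggressive

    # Or cap it for a single run without touching the repository config:
    git -c pack.windowMemory=256m gc --aggressive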