On 5/2/07, Dana How <danahow@xxxxxxxxx> wrote:
On 5/2/07, Junio C Hamano <junkio@xxxxxxx> wrote:
> Dana How <danahow@xxxxxxxxx> writes:
> > Consequently, for such a usage pattern it is useful
> > to specify different compression levels for loose
> > objects and packs. This patch implements a config
> > variable pack.compression in addition to the existing
> > core.compression, meant to be used for repacking.
> > It also adds --compression=N to pack-objects,
> > meant for push/pull/fetch, if different, or if different
> > on a per-repository basis.
> >
> > ** THIS PATCH IS UNTESTED AND MEANT FOR DISCUSSION. **
>
> I think we tweaked this area in the past, but I do not think
> the current setting was determined to be the best tradeoff for
> all workloads. To be able to discuss the patch, I think it
> needs to come with benchmark numbers using publicly available
> repositories as guinea pigs and set of typical git operations,
> so people can reproduce and compare notes.

OK, but this patch doesn't mandate any particular setting.
Its motivation in my work environment is for pack.compression
to be what core.compression currently is, and to set
core.compression to 0 to speed up large commits (the resulting
space-inefficient loose objects will be scrubbed away by a later
off-line repack). Thus, my config settings (almost) change the
gzip's behind a git-add to cp's.

Do you want me to submit timings for a git-add/git-commit -a
on a typical 50-file commit I would be interested in,
with the (new) settings that I would use?

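
To make the intended split concrete, here is a sketch of the
.git/config entries I have in mind. It assumes the patch is applied
(pack.compression is the variable the patch proposes) and that the
current core.compression value is the zlib default, -1:

```ini
[core]
	# Loose objects written by git-add/git-commit: no deflate effort.
	compression = 0
[pack]
	# Packs written by a later off-line repack: zlib default level.
	compression = -1
```

With both variables set, no one has to edit the config before and
after a repack just to get small packs.
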
Note the linux-2.6 git tree from a week ago has 22K checked-out files
of average size 11KB; the largest is fs/nls/nls-cp949.c at 874KB.
(The largest file in git is gitk at 176K.)

The tree I'm interested in maintaining with git is almost 70GB
checked-out in 13K files of average size >5.2MB. That average file
size is over 2 orders of magnitude larger than what current git users
have. (Some of these numbers may decrease after a little
retraining ;-).) I would like git to perform as responsively as
possible on files up to ~500MB.

Within this tree, the largest file is 1234MB [I think checking this in
was a mistake!] and I did the following experiments on it:

$ rm -rf .git
$ git-init
Initialized empty Git repository in .git/
$ git-config core.compression -1
$ wc large.spef
12762072 37832482 1234082774 large.spef
$ /usr/bin/time git-add large.spef
41.54user 0.70system 0:49.76elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
$ ls -lR .git/objects/??
.git/objects/d5:
total 83836
-r--r--r-- 1 how group 85670068 May 2 15:11 d6cde2af063cdfa835038385f29a897bf9533b

$ rm -rf .git
$ git-init
Initialized empty Git repository in .git/
$ git-config core.compression 1
$ wc large.spef
12762072 37832482 1234082774 large.spef
$ /usr/bin/time git-add large.spef
23.66user 0.74system 0:34.07elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
$ ls -lR .git/objects/??
.git/objects/d5:
total 105116
-r--r--r-- 1 how group 107419557 May 2 15:13 d6cde2af063cdfa835038385f29a897bf9533b

So for a 25% increase in blob size I get about 32% less elapsed time
in git-add (and 43% less user CPU time), all by changing
core.compression from -1 to 1. I'll definitely take that improvement.
[For the compressible files we typically have, using 0 is a bad idea:
the CPU "advantage" is swamped by the time to write a much larger
file.]

Since I don't care [to the same degree] about the responsiveness of
packing, I'd rather pack with -1 or better to keep packs small.
(And inflation time seems independent of compression setting.)

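
For anyone who wants to check the percentages, the arithmetic on the
two runs above can be sketched in plain shell. pct_change is a
hypothetical helper (not part of git), and the times are entered in
hundredths of a second so everything stays in integer arithmetic:

```shell
#!/bin/sh
# pct_change OLD NEW: print the rounded percentage by which NEW differs
# from OLD (absolute value). Hypothetical helper, not part of git.
# Assumes the shell does 64-bit arithmetic (bash and dash on 64-bit hosts).
pct_change() {
    old=$1
    new=$2
    if [ "$new" -gt "$old" ]; then
        diff=$((new - old))
    else
        diff=$((old - new))
    fi
    # Add old/2 before dividing so the integer result is rounded, not truncated.
    echo $(( (diff * 100 + old / 2) / old ))
}

# Loose object size in bytes: core.compression -1, then 1.
echo "blob size:    +$(pct_change 85670068 107419557)%"
# Elapsed time in hundredths of a second: 49.76s, then 34.07s.
echo "elapsed time: -$(pct_change 4976 3407)%"
# User CPU time in hundredths of a second: 41.54s, then 23.66s.
echo "user time:    -$(pct_change 4154 2366)%"
```

This prints +25%, -32% and -43%, i.e. the blob grows by a quarter
while elapsed time falls by roughly a third.
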
Since someone might be working while the packing is happening,
I'd rather not change the config setting to achieve this.
Hence the patch.

Concerning various public repositories, clearly the patch has no
impact if you don't specify different core.compression and
pack.compression values. If you do specify different values, I doubt
there would be much noticeable speed-up for e.g. the linux-2.6 repo
stats I included above. There would be some, but that wasn't the
motivation for the patch.

Thanks,
-- 
Dana L. How danahow@xxxxxxxxx +1 650 804 5991 cell