On 5/21/07, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> wrote:
> ... Operations such as "git-log --pretty=oneline" were about 30X
> faster on a cold cache and 2 to 3X faster otherwise. Process sizes
> remained reasonable.
Can you give me details about your system? Is this a 64 bit binary?
RHEL4/Nahant on an Opteron. Yes.
What is your core.packedGitWindowSize and core.packedGitLimit set to?
I didn't change the default.
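(A quick way to double-check that nothing in ~/.gitconfig or .git/config overrides them, using only stock git-config; note that --list prints key names lowercased, hence the case-insensitive grep. Just a sketch:)

  % git config --list | grep -i packedgit

No output means both core.packedGitWindowSize and core.packedGitLimit are still at their built-in defaults.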
It sounds like the packed version was almost 3 GiB smaller, but was slower because we were mmap'ing far too much data at startup and that was making your OS page in things that you didn't really need to have.
The difference in size is because of the "Custom compression levels" patch -- now the loose objects use Z_BEST_SPEED, whereas the packs use Z_DEFAULT_COMPRESSION.
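(For reference, a sketch of how that split can be expressed as configuration. core.loosecompression and pack.compression are the variable names the "Custom compression levels" patch proposes, so treat them as assumptions until it is merged; zlib level 1 is Z_BEST_SPEED and -1 is Z_DEFAULT_COMPRESSION:)

  % git config core.loosecompression 1
  % git config pack.compression -1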
Mind trying git-log with a smaller core.packedGitWindow{Size,Limit}? Perhaps it's just as simple as our defaults being far, far too high for your workload...
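(Concretely, that experiment is just two settings plus a re-timed git-log; the 32m/256m values below are purely illustrative, not a recommendation:)

  % git config core.packedGitWindowSize 32m
  % git config core.packedGitLimit 256m
  % time git-log --pretty=oneline >/dev/null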
I think that's a good idea, and it should be easy to try tomorrow. It will definitely improve the cold-cache case. But we need to consider both *read* and *creation* performance. The portion of the repo I imported into git grows at about 500MB/week (compressed). Should I repack -a every week? Every month? In any case, should I use the default window/depth, or 0/0? With the defaults, run-times are prohibitive (in fact, I've always killed each attempt so the machine could be used for "real" work); with 0/0, I lose deltification on all objects. These megablobs really are outliers, and they stress the "one size fits all" approach of packing in git.

As a thought experiment, let's (1) pretend git-repack takes --max-blob-size= and --max-pack-size=, (2) pretend the patch doesn't add the repack.maxblobsize variable, and (3) do the following:

  % git-repack -a -d --max-blob-size=256
  % git-repack --max-pack-size=2047 --window=0 --depth=0

The first step makes a digestible 13MB packfile, and the second puts all the megablobs into six or more 2GB packfiles. Is there really any advantage to carrying out the second step? If I'm processing a 100MB+ blob, do I really care about an extra open(2) call?

Thanks,
-- 
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell