One more follow-on thought. I imagine that most consumers of git are
nowhere near the scale of the test repo that I described. They may still
benefit from efforts to improve git support for large repos. A few
possible reasons:

1. The performance improvements should speed things up for smaller repos
   as well.
2. They may find their repos growing to a 'large scale' at some point in
   the future.
3. Any code cleanup as part of an effort to support git scalability is
   good for code base health and, e.g., would facilitate future
   modifications that may more directly affect them.

Cheers,
Josh

________________________________________
From: Nguyen Thai Ngoc Duy [pclouds@xxxxxxxxx]
Sent: Friday, February 03, 2012 10:53 PM
To: Joshua Redstone
Cc: git@xxxxxxxxxxxxxxx
Subject: Re: Git performance results on a large repository

On Fri, Feb 3, 2012 at 9:20 PM, Joshua Redstone <joshua.redstone@xxxxxx> wrote:
> I timed a few common operations with both a warm OS file cache and a cold
> cache. i.e., I did a 'echo 3 | tee /proc/sys/vm/drop_caches' and then did
> the operation in question a few times (first timing is the cold timing,
> the next few are the warm timings). The following results are on a server
> with average hard drive (I.e., not flash) and > 10GB of ram.
>
> 'git status' : 39 minutes cold, and 24 seconds warm.
>
> 'git blame': 44 minutes cold, 11 minutes warm.
>
> 'git add' (appending a few chars to the end of a file and adding it): 7
> seconds cold and 5 seconds warm.
>
> 'git commit -m "foo bar3" --no-verify --untracked-files=no --quiet
> --no-status': 41 minutes cold, 20 seconds warm. I also hacked a version
> of git to remove the three or four places where 'git commit' stats every
> file in the repo, and this dropped the times to 30 minutes cold and 8
> seconds warm.

Have you tried "git update-index --assume-unchanged"? That should reduce
the mass lstat() calls and hopefully improve the above numbers.
The interface is not exactly easy to use, but if the gain is significant,
we can try to improve the UI.

On the index size issue, ideally we should make minimal writes to the
index instead of rewriting the whole 191 MB index. An improvement we
could make now is to compress it, reducing the disk footprint and thus
disk I/O. If you compress the index with gzip, how big is it?
--
Duy
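
One way to answer the gzip question without writing a compressed copy to
disk is to stream the index file through a gzip compressor and count the
output bytes. The following is only a minimal sketch, not anything git
itself does: the function name gzip_size_report and the chunk size are
made up for illustration, and it assumes the index lives at the usual
.git/index path when you call it.

```python
import os
import zlib


def gzip_size_report(path, chunk_size=1 << 20):
    """Return (raw_bytes, gzip_bytes) for the file at `path`.

    Hypothetical helper: streams the file through a gzip-framed
    compressor so a large index never has to be held in memory.
    """
    # wbits = MAX_WBITS | 16 asks zlib for gzip framing (header + trailer),
    # so the count matches what the gzip tool would produce at this level.
    compressor = zlib.compressobj(level=6, wbits=zlib.MAX_WBITS | 16)
    compressed = 0
    with open(path, "rb") as src:
        while chunk := src.read(chunk_size):
            compressed += len(compressor.compress(chunk))
    # flush() emits whatever the compressor still buffers, plus the trailer.
    compressed += len(compressor.flush())
    return os.path.getsize(path), compressed
```

Calling gzip_size_report(".git/index") from the top of the work tree
returns the two sizes to compare. How much it shrinks depends on the
content: path names tend to compress well, while the per-entry SHA-1
hashes are essentially incompressible.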