You're right: more than optimizations, they are modifications that reduce
safety checks and make assumptions about the way one is using git (e.g.,
that you always remember to add each file you want to commit). I focused on
them because:

1. In our installation, we don't use commit hooks that change what's being
   committed, so it's good to know that, in principle, there's a big perf
   benefit to be had by leveraging that fact.

2. At an abstract level, it seems like the cost of doing a commit should be
   proportional to the amount of the repository touched by the commit, not
   to the size of the repository. These experiments demonstrate one
   direction that a set of optimizations would need to go to bring commit
   performance more in line with that.

3. We're also exploring storage systems that support more efficient ways to
   query what's changed than stat'ing every file.

I forgot to mention: the times I quoted were with --no-verify and
--no-status. Adding '-q' didn't speed things up at all.

As a bonus, I've also profiled git-add on the 1-million-file repo, and it
looks like, as you might expect, the time is dominated by reading and
writing the index. The time for git-add is a couple of seconds.

Josh

On 12/19/11 5:21 PM, "Junio C Hamano" <gitster@xxxxxxxxx> wrote:

>Joshua Redstone <joshua.redstone@xxxxxx> writes:
>
>> I've managed to speed up git-commit on large repos by 4x by removing some
>> safeguards that caused git to stat every file in the repo on commits that
>> touch a small number of files. The diff, for illustrative purposes only,
>> is at:
>>
>> https://gist.github.com/1499621
>>
>> With a repo with 1 million files (but few commits), the diff drops the
>> commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease.
>> The optimizations are:
>
>I do not know if these kind of changes are called "optimizations" or
>merely making the command record a random tree object that may have some
>resemblance to what you wanted to commit but is subtly incorrect. I didn't
>fetch your safety removal, though.
>
>Wouldn't you get a similar speed-up without being unsafe if you simply ran
>"git commit" without any parameter (i.e. write out the current index as a
>tree and make a commit), combined with "--no-status" and perhaps "-q" to
>avoid running the comparison between the resulting commit and the working
>tree state after the commit?
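
To make the scaling argument in this thread concrete, here is a rough Python sketch. It is not the patch in the gist and not how git itself is implemented. It compares the number of lstat calls needed to check every file in a working tree against the number needed to check only the paths a commit touches. The helper names (stat_all, stat_touched, make_tree, demo) and the toy directory layout are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def stat_all(root):
    """Full scan: lstat every file under root. Work grows with tree size."""
    calls = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            calls += 1
    return calls


def stat_touched(root, touched):
    """Targeted check: lstat only the paths the commit names."""
    calls = 0
    for rel in touched:
        os.lstat(os.path.join(root, rel))
        calls += 1
    return calls


def make_tree(root, n_dirs, files_per_dir):
    """Create a toy working tree of n_dirs * files_per_dir empty files."""
    for d in range(n_dirs):
        sub = os.path.join(root, "dir%03d" % d)
        os.mkdir(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, "f%03d.txt" % f), "w").close()


def demo(n_dirs=20, files_per_dir=50):
    """Return (full-scan lstat count, targeted lstat count) for a toy tree."""
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, n_dirs, files_per_dir)
        # Pretend the commit touches just two files.
        touched = [os.path.join("dir000", "f000.txt"),
                   os.path.join("dir001", "f001.txt")]
        return stat_all(root), stat_touched(root, touched)


if __name__ == "__main__":
    full, targeted = demo()
    print("full scan lstat calls:", full)
    print("targeted lstat calls:", targeted)
```

The full scan's count tracks the total number of files, while the targeted count tracks only the number of touched paths. The trade-off is the one raised above: the targeted check is only correct if the list of touched paths is complete.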