Hi Carlos, Tomas, and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it
sped things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
number of bytes modified in each file. In all experiments, fewer than 5
files are modified per commit, with changes totaling less than 10 KB,
often more like 1 KB.

I actually wrote a test script to generate commits, customized for the
stats on the repo I'm using. It repeatedly generates some changes, then
does 'git add [ list of files changed ]' followed by
'git commit --no-status -m [ msg ]'. It generates changes by picking
fewer than 5 files at random, modifying two 100-byte regions in each
file, and occasionally creating a new file of about 1 KB. If it helps,
I can probably post the test script I've been using.

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to
about 8 seconds. That is a pretty dramatic slowdown. Any idea why? I
wonder if that's related to the overall commit slowness.

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index? I did a Google
search for [git tree-cache index], but nothing popped out.

Cheers,
Josh

On 12/2/11 4:23 PM, "Carlos Martín Nieto" <cmn@xxxxxxxx> wrote:

>On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
>> Hi,
>> I have a git repo with about 300k commits, 150k files totaling maybe
>> 7GB. Locally committing a small change - say touching fewer than 300
>> bytes across 4 files - consistently takes over one second, which seems
>> kinda slow. This is using git 1.7.7.4 on a linux 2.6 box. The time does
>> not improve after doing a git-gc (my .git dir has maybe 250 files after
>> a git gc). The same size commit on a brand new repo takes < 10ms.
>> Any thoughts on why committing a small change seems to take a long
>> time on larger repos?
>
>By "same size commit" do you mean the same amount of changes, or the
>same number of files? Committing doesn't depend on the size of the
>repo (by itself), but on the size of the index, which depends on the
>number of files to be committed (as git is snapshot-based). At one
>point, commit forgot how to write the tree cache to the index (a
>performance optimisation). Do the times improve if you run 'git
>read-tree HEAD' between one commit and another? Note that this will
>reset the index to the last commit, though for the tests I imagine you
>use some variation of 'git commit -a'.
>
>Thomas Rast wrote a patch to re-teach commit to store the tree cache,
>but there were some issues and it never got applied.
>
>>
>> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
>> wrapper), and it was even slower (about 6 seconds to commit the same
>> small change).
>
>I don't know about the python bindings, but on the (somewhat
>unscientific) tests for libgit2's write-tree (the slow part of
>creating a commit), it performs slightly faster than git's (though I
>think git's write-tree does update the tree cache, which libgit2
>doesn't currently). The speed could just be a side-effect of the small
>test repo. From your domain, I assume the data is not for public
>consumption, but it'd be great if you could post your code to pygit2's
>issue tracker so we can see how much of the slowdown comes from the
>bindings or the library.
>
>   cmn
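
For reference, the benchmark loop described at the top of this message
could look roughly like the following. This is a minimal Python sketch
under stated assumptions, not the actual test script: the function names
(mutate_file, make_changes, timed_commit), the 10% chance of adding a
new file, and the use of random printable bytes are all made up for
illustration. Only the git commands themselves come from the message.

```python
import os
import random
import subprocess
import time

REGION = 100  # bytes rewritten per region, as described in the message


def mutate_file(path, rng, regions=2, region_size=REGION):
    """Overwrite `regions` random spans of `region_size` bytes in place."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(regions):
            # Keep the span inside the file when the file is large enough.
            offset = rng.randrange(max(1, size - region_size + 1))
            f.seek(offset)
            f.write(bytes(rng.randrange(32, 127) for _ in range(region_size)))


def make_changes(repo, tracked, rng, max_files=4, new_file_prob=0.1):
    """Modify up to `max_files` tracked files; sometimes add a ~1 KB file.

    `tracked` is a list of repo-relative paths; new files are appended to it.
    Returns the repo-relative paths that should be passed to 'git add'.
    """
    count = min(len(tracked), rng.randint(1, max_files))
    chosen = rng.sample(tracked, count)
    for rel in chosen:
        mutate_file(os.path.join(repo, rel), rng)
    if rng.random() < new_file_prob:  # assumed probability, not from the message
        rel = "new_%d.txt" % rng.getrandbits(32)
        with open(os.path.join(repo, rel), "wb") as f:
            f.write(bytes(rng.randrange(32, 127) for _ in range(1024)))
        chosen.append(rel)
        tracked.append(rel)
    return chosen


def timed_commit(repo, paths, msg, read_tree_first=False):
    """Run the add/commit pair; return the seconds spent in 'git commit'."""
    if read_tree_first:
        # The variant suggested in the quoted reply.
        subprocess.check_call(["git", "read-tree", "HEAD"], cwd=repo)
    subprocess.check_call(["git", "add", "--"] + paths, cwd=repo)
    start = time.perf_counter()
    subprocess.check_call(
        ["git", "commit", "--no-status", "-q", "-m", msg], cwd=repo
    )
    return time.perf_counter() - start
```

Seeding random.Random with the same value for two runs, one with
read_tree_first=True and one without, would apply the same sequence of
changes to two copies of the repo, so the commit timings can be compared
directly.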