Re: Debugging git-commit slowness on a large repo

Joshua Redstone <joshua.redstone@xxxxxx> · Wed, 7 Dec 2011 01:48:46 +0000

Hi Carlos and Tomas and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it sped
things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
number of bytes modified in each file.  In all experiments, fewer than
5 files are modified per commit, with changes totaling less than 10 KB and
often closer to 1 KB.  I actually wrote a test script to generate commits,
customized for the stats on the repo I'm using.  It repeatedly generates
some changes, runs 'git add [ list of files changed ]', and then runs 'git
commit --no-status -m [ msg ]'.  It generates changes by picking fewer
than 5 files at random and modifying two 100-byte regions in each file, and
it occasionally creates a new file of about 1 KB.  If it helps, I can
probably post the test script I've been using.
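The change-generation step described above could be sketched roughly like this.  It is not the actual script, only a minimal illustration under assumptions: the names (make_random_change, tracked_files, NEW_FILE_PROB and so on) are hypothetical, the probability of adding a new file is a guess, and walking the directory stands in for 'git ls-files'.

```python
import random
from pathlib import Path

# Sizes follow the description above; NEW_FILE_PROB is a hypothetical guess.
MAX_FILES = 4          # "fewer than 5 files" modified per commit
REGIONS_PER_FILE = 2   # two regions rewritten in each chosen file
REGION_SIZE = 100      # each region is 100 bytes
NEW_FILE_SIZE = 1024   # the occasional new file is about 1 KB
NEW_FILE_PROB = 0.1    # hypothetical: how often a new file is created

ALPHABET = b"abcdefghijklmnopqrstuvwxyz\n"


def _random_bytes(rng: random.Random, n: int) -> bytes:
    return bytes(rng.choice(ALPHABET) for _ in range(n))


def _overwrite_region(data: bytearray, rng: random.Random) -> None:
    """Overwrite one REGION_SIZE-byte window of data in place."""
    if len(data) < REGION_SIZE:  # pad tiny files so a full region fits
        data.extend(b"\n" * (REGION_SIZE - len(data)))
    start = rng.randrange(len(data) - REGION_SIZE + 1)
    data[start:start + REGION_SIZE] = _random_bytes(rng, REGION_SIZE)


def tracked_files(repo: Path) -> list[Path]:
    """Every file under repo except .git/ (a stand-in for 'git ls-files')."""
    return sorted(
        p for p in repo.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(repo).parts
    )


def make_random_change(repo: Path, rng: random.Random) -> list[str]:
    """Modify 1..MAX_FILES existing files and maybe add a new one.

    Returns repo-relative paths, ready to pass to 'git add'.
    """
    files = tracked_files(repo)
    k = min(len(files), rng.randint(1, MAX_FILES))
    changed = []
    for path in rng.sample(files, k):
        data = bytearray(path.read_bytes())
        for _ in range(REGIONS_PER_FILE):
            _overwrite_region(data, rng)
        path.write_bytes(bytes(data))
        changed.append(path.relative_to(repo).as_posix())
    if rng.random() < NEW_FILE_PROB:
        new_file = repo / f"generated_{rng.getrandbits(32):08x}.txt"
        new_file.write_bytes(_random_bytes(rng, NEW_FILE_SIZE))
        changed.append(new_file.relative_to(repo).as_posix())
    return changed
```

Passing a seeded random.Random makes a run reproducible, so timings with and without some tweak can be compared on the same sequence of changes.  On a 150k-file repo, re-walking the tree every iteration is slow; caching the file list once would be the obvious refinement.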

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to about
8 seconds.  That is a pretty dramatic slowdown.  Any idea why?  I wonder
if that's related to the overall commit slowness.
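A timing loop for the read-tree experiment might look like the following.  This is a hypothetical harness, not the one used for the numbers above: it times only the 'git commit --no-status' step, optionally runs 'git read-tree HEAD' first, and takes the command runner as a parameter so the loop can be dry-run without git.  The names run_git, timed_iteration and mean_commit_time are made up for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner executes one git command; injectable so the loop can be dry-run.
Runner = Callable[[Sequence[str]], None]


def run_git(argv: Sequence[str], cwd: str = ".") -> None:
    """Default runner: run the command in cwd and raise if it fails."""
    subprocess.run(list(argv), cwd=cwd, check=True, stdout=subprocess.DEVNULL)


def timed_iteration(paths: Sequence[str], msg: str, *,
                    read_tree_first: bool, run: Runner = run_git) -> float:
    """One 'git add ; git commit' round; returns seconds spent in git commit."""
    if read_tree_first:
        # Reset the index to HEAD before staging, as in the experiment.
        run(["git", "read-tree", "HEAD"])
    run(["git", "add", "--", *paths])
    start = time.perf_counter()
    run(["git", "commit", "--no-status", "-m", msg])
    return time.perf_counter() - start


def mean_commit_time(batches: Sequence[Sequence[str]], *,
                     read_tree_first: bool, run: Runner = run_git) -> float:
    """Average commit time over several batches of changed paths."""
    times = [
        timed_iteration(paths, f"test commit {i}",
                        read_tree_first=read_tree_first, run=run)
        for i, paths in enumerate(batches)
    ]
    return sum(times) / len(times)
```

To run against a real checkout, something like functools.partial(run_git, cwd="/path/to/repo") (a placeholder path) can be passed as run, once with read_tree_first=False and once with True.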

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index?  I did a google
search for [git tree-cache index], but nothing popped out.
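For reference, as far as I understand the on-disk format documented in git's technical docs for the index: .git/index is a flat list of every tracked path with its blob name and stat data, and the tree cache is an optional extension in the same file (signature "TREE") that remembers, per directory, the tree object name and how many index entries it covers.  Changing a path marks its directory and all parents as invalid (entry count -1), so write-tree only has to rebuild the invalid trees; with no tree cache, every tree has to be recomputed on each commit, which would presumably be expensive with 150k files.  The reader below is a minimal sketch under those assumptions: it handles only index versions 2 and 3 with 20-byte SHA-1 names, and the function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeCacheEntry:
    path: str              # directory name relative to its parent ("" = root)
    entry_count: int       # index entries this tree covers; -1 = invalidated
    subtrees: int          # number of subdirectory entries that follow
    oid: Optional[bytes]   # cached tree object name; None when invalidated


def index_extensions(data: bytes) -> dict[bytes, bytes]:
    """Return {signature: payload} for each extension in a v2/v3 index."""
    sig, version, count = struct.unpack(">4sII", data[:12])
    if sig != b"DIRC" or version not in (2, 3):
        raise ValueError("this sketch only handles index versions 2 and 3")
    pos = 12
    for _ in range(count):
        # 40 bytes of stat data + 20-byte object name, then 16-bit flags.
        (flags,) = struct.unpack(">H", data[pos + 60:pos + 62])
        extended = version == 3 and flags & 0x4000
        name_start = 64 if extended else 62   # v3 may add 2 more flag bytes
        name_end = data.index(b"\0", pos + name_start)
        # Each entry is NUL-padded (1-8 bytes) to a multiple of 8 bytes.
        pos += (name_end - pos + 8) & ~7
    exts = {}
    end = len(data) - 20          # the file ends with a 20-byte checksum
    while pos + 8 <= end:
        sig, size = struct.unpack(">4sI", data[pos:pos + 8])
        exts[sig] = data[pos + 8:pos + 8 + size]
        pos += 8 + size
    return exts


def parse_tree_cache(payload: bytes) -> list[TreeCacheEntry]:
    """Decode a TREE extension payload (directories stored in pre-order)."""
    entries, pos = [], 0
    while pos < len(payload):
        start = pos
        nul = payload.index(b"\0", start)
        newline = payload.index(b"\n", nul)
        count, subtrees = map(int, payload[nul + 1:newline].split())
        pos = newline + 1
        oid = None
        if count >= 0:            # only valid entries carry a tree name
            oid = payload[pos:pos + 20]
            pos += 20
        path = payload[start:nul].decode("utf-8", "replace")
        entries.append(TreeCacheEntry(path, count, subtrees, oid))
    return entries
```

Feeding it the bytes of .git/index before and after a commit should show whether a TREE extension is present and how many of its entries are invalid.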

Cheers,
Josh

On 12/2/11 4:23 PM, "Carlos Martín Nieto" <cmn@xxxxxxxx> wrote:

>On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
>> Hi,
>> I have a git repo with about 300k commits, 150k files totaling maybe
>> 7GB.  Locally committing a small change - say touching fewer than 300
>> bytes across 4 files - consistently takes over one second, which seems
>> kinda slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does
>> not improve after doing a git-gc (my .git dir has maybe 250 files after a
>> git gc).  The same size commit on a brand new repo takes < 10ms.  Any
>> thoughts on why committing a small change seems to take a long time on
>> larger repos?
>
>By "same size commit" do you mean the same amount of changes, or the
>same number of files? Committing doesn't depend on the size of the
>repo (by itself), but on the size of the index, which depends on the
>number of files to be committed (as git is snapshot-based). At one
>point, commit forgot how to write the tree cache to the index (a
>performance optimisation). Do the times improve if you run 'git
>read-tree HEAD' between one commit and another? Note that this will
>reset the index to the last commit, though for the tests I imagine you
>use some variation of 'git commit -a'.
>
>Thomas Rast wrote a patch to re-teach commit to store the tree cache,
>but there were some issues and it never got applied.
>
>> 
>> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
>> wrapper), and it was even slower (about 6 seconds to commit the same
>> small change).
>
>I don't know about the Python bindings, but on the (somewhat
>unscientific) tests for libgit2's write-tree (the slow part of
>creating a commit), it performs slightly faster than git's (though I
>think git's write-tree does update the tree cache, which libgit2
>doesn't currently). The speed could just be a side-effect of the small
>test repo. From your domain, I assume the data is not for public
>consumption, but it'd be great if you could post your code to pygit2's
>issue tracker so we can see how much of the slowdown comes from the
>bindings or the library.
>
>   cmn
>
