Re: Debugging git-commit slowness on a large repo

Joshua Redstone <joshua.redstone@xxxxxx> · Tue, 20 Dec 2011 01:40:47 +0000

You're right: rather than optimizations, they are modifications that remove
safety checks and make assumptions about the way one is using git (e.g.,
that you always remember to add each file you want to commit).  I focused
on them because:

  1. In our installation, we don't use commit hooks that change what's
being committed, so it's good to know that in principle, there's a big
perf benefit to be had by leveraging that fact.

  2. At an abstract level, it seems like the cost of doing a commit should
be proportional to the amount of the repository touched by the commit, not
by the size of the repository.  These experiments are demonstrations of
one direction that a set of optimizations would need to go to get commit
performance more along those lines.

  3. We're also exploring storage systems that support more efficient ways
to query what's changed than stat'ing every file.
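
To make points 2 and 3 concrete, here is a rough Python sketch (not git's
actual code) contrasting a scan that lstat()s every file with one that only
looks at the paths the caller says were touched.  The helper names
(make_tree, stat_all, stat_touched) and the directory layout are made up for
illustration.

```python
import os
import tempfile


def make_tree(root, n_dirs, files_per_dir):
    """Create a small synthetic tree: root/d<i>/f<j>.txt (hypothetical layout)."""
    for d in range(n_dirs):
        dir_path = os.path.join(root, f"d{d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(dir_path, f"f{f}.txt"), "w") as fh:
                fh.write("x")


def stat_all(root):
    """lstat every file under root; the work grows with the size of the tree."""
    calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata directory if one is present.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            calls += 1
    return calls


def stat_touched(root, touched):
    """lstat only the listed relative paths; the work grows with the change."""
    calls = 0
    for rel in touched:
        os.lstat(os.path.join(root, rel))
        calls += 1
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, n_dirs=20, files_per_dir=10)
        touched = [os.path.join("d0", "f0.txt"), os.path.join("d7", "f3.txt")]
        print("full scan lstat calls:   ", stat_all(root))
        print("touched-only lstat calls:", stat_touched(root, touched))
```

The second function is only safe if something else (the user, or a storage
layer that can report changes) supplies a complete list of touched paths,
which is exactly the assumption discussed above.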

I forgot to mention: the times I quoted were with --no-verify and
--no-status.  Adding '-q' didn't speed things up at all.
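
For anyone who wants to repeat the measurement, a minimal timing harness
might look like the sketch below.  The flags are the ones named above; the
use of -m to supply the message and the function names are assumptions of
this sketch, not a description of how the original numbers were collected.

```python
import subprocess
import time


def commit_command(message, quiet=False):
    """Build the git-commit argument list with the flags discussed above."""
    cmd = ["git", "commit", "--no-verify", "--no-status"]
    if quiet:
        cmd.append("-q")
    # Assumption: pass the message on the command line so no editor is started.
    cmd += ["-m", message]
    return cmd


def time_commit(repo_dir, message, quiet=False):
    """Run the commit in repo_dir and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(commit_command(message, quiet), cwd=repo_dir, check=True)
    return time.perf_counter() - start
```

Calling time_commit() requires git on the PATH and staged changes in
repo_dir; it raises subprocess.CalledProcessError if the commit fails.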

As a bonus, I've also profiled git-add on the 1-million-file repo, and it
looks like, as you might expect, the time is dominated by reading and
writing the index.  The time for git-add is a couple of seconds.
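
As background on why index I/O scales with repository size: the index is a
single file with one entry per tracked path, and its 12-byte header records
the entry count.  The sketch below, with a hypothetical function name,
parses only that header (signature "DIRC", version, entry count, all
big-endian); it is an illustration, not a full index reader.

```python
import struct


def parse_index_header(data):
    """Return (version, entry_count) from the first 12 bytes of a git index."""
    if len(data) < 12:
        raise ValueError("index header is 12 bytes; got fewer")
    # 4-byte signature, 32-bit version, 32-bit entry count, network byte order.
    signature, version, entries = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file (bad signature)")
    return version, entries


def read_index_header(path):
    """Read the header from an index file on disk, e.g. <repo>/.git/index."""
    with open(path, "rb") as fh:
        return parse_index_header(fh.read(12))
```

On a repo like the one above, the entry count would be on the order of one
million, and every one of those entries is read and rewritten by git-add.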

Josh

On 12/19/11 5:21 PM, "Junio C Hamano" <gitster@xxxxxxxxx> wrote:

>Joshua Redstone <joshua.redstone@xxxxxx> writes:
>
>> I've managed to speed up git-commit on large repos by 4x by removing some
>> safeguards that caused git to stat every file in the repo on commits that
>> touch a small number of files.  The diff, for illustrative purposes only,
>> is at:
>>
>>     https://gist.github.com/1499621
>>
>>
>> With a repo with 1 million files (but few commits), the diff drops the
>> commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease. The
>> optimizations are:
>
>I do not know if this kind of change is called an "optimization" or
>merely making the command record a random tree object that may have some
>resemblance to what you wanted to commit but is subtly incorrect. I didn't
>fetch your safety removal, though.
>
>Wouldn't you get a similar speed-up without being unsafe if you simply ran
>"git commit" without any parameter (i.e. write out the current index as a
>tree and make a commit), combined with "--no-status" and perhaps "-q" to
>avoid running the comparison between the resulting commit and the working
>tree state after the commit?
