Re: Git Scaling: What factors most affect Git performance for a large repo?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
>> <avarab@xxxxxxxxx> wrote:
>>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>>
>>>  * Around 500k commits
>>>  * Around 100k tags
>>>  * Around 5k branches
>>>  * Around 500 commits/day, almost entirely to the same branch
>>>  * 1.5 GB .git checkout.
>>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>>
>> Would be nice if you could make an anonymized version of this repo
>> public. Working on a "real" large repo is better than an artificial
>> one.
>
> Yeah, I'll try to do that.
>
>>> But actually most of "git fetch" is spent in the reachability check
>>> subsequently done by "git-rev-list" which takes several seconds. I
>>
>> I wonder if reachability bitmap could help here..
>
> I could have sworn I had that enabled already but evidently not. I did
> test it and it cut down on clone times a bit. Now our daily repacking
> is:
>
>         git --git-dir={} gc &&
>         git --git-dir={} pack-refs --all --prune &&
>         git --git-dir={} repack -Ad --window=250 --depth=100
> --write-bitmap-index --pack-kept-objects &&
>
> It's not clear to me from the documentation whether this should just
> be enabled on the server, or the clients too. In any case I've enabled
> it on both.
>
> Even then with it enabled on both a "git pull" that pulls down just
> one commit on one branch is 13s. Trace attached at the end of the
> mail.
>
>>> haven't looked into it but there's got to be room for optimization
>>> there, surely it only has to do reachability checks for new refs, or
>>> could run in some "I trust this remote not to send me corrupt data"
>>> completely mode (which would make sense within a company where you can
>>> trust your main Git box).
>>
>> No, it's not just about trusting the server side, it's about catching
>> data corruption on the wire as well. We have a trick to avoid
>> reachability check in clone case, which is much more expensive than a
>> fetch. Maybe we could do something further to help the fetch case _if_
>> reachability bitmaps don't help.
>
> Still, if that's indeed a big bottleneck what's the worst-case
> scenario here? That the local repository gets hosed? The server will
> still recursively validate the objects it gets sent, right?
>
> I wonder if a better trade-off in that case would be to skip this in
> some situations and instead put something like "git fsck" in a
> cronjob.
>
> Here's a "git pull" trace mentioned above:
>
> $ time GIT_TRACE=1 git pull
> 13:06:13.603781 git.c:555               trace: exec: 'git-pull'
> 13:06:13.603936 run-command.c:351       trace: run_command: 'git-pull'
> 13:06:13.620615 git.c:349               trace: built-in: git
> 'rev-parse' '--git-dir'
> 13:06:13.631602 git.c:349               trace: built-in: git
> 'rev-parse' '--is-bare-repository'
> 13:06:13.636103 git.c:349               trace: built-in: git
> 'rev-parse' '--show-toplevel'
> 13:06:13.641491 git.c:349               trace: built-in: git 'ls-files' '-u'
> 13:06:13.719923 git.c:349               trace: built-in: git
> 'symbolic-ref' '-q' 'HEAD'
> 13:06:13.728085 git.c:349               trace: built-in: git 'config'
> 'branch.trunk.rebase'
> 13:06:13.738160 git.c:349               trace: built-in: git 'config' 'pull.ff'
> 13:06:13.743286 git.c:349               trace: built-in: git
> 'rev-parse' '-q' '--verify' 'HEAD'
> 13:06:13.972091 git.c:349               trace: built-in: git
> 'rev-parse' '--verify' 'HEAD'
> 13:06:14.149420 git.c:349               trace: built-in: git
> 'update-index' '-q' '--ignore-submodules' '--refresh'
> 13:06:14.294098 git.c:349               trace: built-in: git
> 'diff-files' '--quiet' '--ignore-submodules'
> 13:06:14.467711 git.c:349               trace: built-in: git
> 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
> 13:06:14.683419 git.c:349               trace: built-in: git
> 'rev-parse' '-q' '--git-dir'
> 13:06:15.189707 git.c:349               trace: built-in: git
> 'rev-parse' '-q' '--verify' 'HEAD'
> 13:06:15.335948 git.c:349               trace: built-in: git 'fetch'
> '--update-head-ok'
> 13:06:15.691303 run-command.c:351       trace: run_command: 'ssh'
> 'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
> 13:06:17.095662 run-command.c:351       trace: run_command: 'rev-list'
> '--objects' '--stdin' '--not' '--all' '--quiet'
> remote: Counting objects: 6, done.
> remote: Compressing objects: 100% (6/6), done.
> 3:06:20.426346 run-command.c:351       trace: run_command:
> 'unpack-objects' '--pack_header=2,6'
> 13:06:20.431806 exec_cmd.c:130          trace: exec: 'git'
> 'unpack-objects' '--pack_header=2,6'
> 13:06:20.437343 git.c:349               trace: built-in: git
> 'unpack-objects' '--pack_header=2,6'
> remote: Total 6 (delta 0), reused 0 (delta 0)
> Unpacking objects: 100% (6/6), done.
> 13:06:20.444196 run-command.c:351       trace: run_command: 'rev-list'
> '--objects' '--stdin' '--not' '--all'
> 13:06:20.447135 exec_cmd.c:130          trace: exec: 'git' 'rev-list'
> '--objects' '--stdin' '--not' '--all'
> 13:06:20.451283 git.c:349               trace: built-in: git
> 'rev-list' '--objects' '--stdin' '--not' '--all'
> From ssh://git.example.com/gitrepos/core
>    02d33d2..41e72c4  core      -> origin/core
> 13:06:22.559609 run-command.c:351       trace: run_command: 'gc' '--auto'
> 13:06:22.562176 exec_cmd.c:130          trace: exec: 'git' 'gc' '--auto'
> 13:06:22.565661 git.c:349               trace: built-in: git 'gc' '--auto'
> 13:06:22.594980 git.c:349               trace: built-in: git
> 'rev-parse' '-q' '--verify' 'HEAD'
> 13:06:22.845728 git.c:349               trace: built-in: git
> 'show-branch' '--merge-base' 'refs/heads/core'
> '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
> 13:06:23.087586 git.c:349               trace: built-in: git 'fmt-merge-msg'
> 13:06:23.341451 git.c:349               trace: built-in: git
> 'rev-parse' '--parseopt' '--stuck-long' '--' '--onto'
> '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> 13:06:23.350513 git.c:349               trace: built-in: git
> 'rev-parse' '--git-dir'
> 13:06:23.362011 git.c:349               trace: built-in: git
> 'rev-parse' '--is-bare-repository'
> 13:06:23.365282 git.c:349               trace: built-in: git
> 'rev-parse' '--show-toplevel'
> 13:06:23.372589 git.c:349               trace: built-in: git 'config'
> '--bool' 'rebase.stat'
> 13:06:23.377056 git.c:349               trace: built-in: git 'config'
> '--bool' 'rebase.autostash'
> 13:06:23.382102 git.c:349               trace: built-in: git 'config'
> '--bool' 'rebase.autosquash'
> 13:06:23.389458 git.c:349               trace: built-in: git
> 'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
> 13:06:23.608894 git.c:349               trace: built-in: git
> 'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
> 13:06:23.894026 git.c:349               trace: built-in: git
> 'symbolic-ref' '-q' 'HEAD'
> 13:06:23.898918 git.c:349               trace: built-in: git
> 'rev-parse' '--verify' 'HEAD'
> 13:06:24.102269 git.c:349               trace: built-in: git
> 'rev-parse' '--verify' 'HEAD'
> 13:06:24.338636 git.c:349               trace: built-in: git
> 'update-index' '-q' '--ignore-submodules' '--refresh'
> 13:06:24.539912 git.c:349               trace: built-in: git
> 'diff-files' '--quiet' '--ignore-submodules'
> 13:06:24.729362 git.c:349               trace: built-in: git
> 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
> 13:06:24.938533 git.c:349               trace: built-in: git
> 'merge-base' '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
> 13:06:25.197791 git.c:349               trace: built-in: git 'diff'
> '--stat' '--summary' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
> '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> [details on updated files]
> 13:06:25.488275 git.c:349               trace: built-in: git
> 'checkout' '-q' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
> 13:06:26.467413 git.c:349               trace: built-in: git
> 'update-ref' 'ORIG_HEAD' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
> Fast-forwarded trunk to 41e72c42addc5075e8009a3eebe914fa0ce98b27.
> 13:06:26.716256 git.c:349               trace: built-in: git 'rev-parse' 'HEAD'
> 13:06:26.958595 git.c:349               trace: built-in: git
> 'update-ref' '-m' 'rebase finished: refs/heads/core onto
> 41e72c42addc5075e8009a3eebe914fa0ce98b27' 'refs/heads/core'
> '41e72c42addc5075e8009a3eebe914fa0ce98b27'
> '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
> 13:06:27.205320 git.c:349               trace: built-in: git
> 'symbolic-ref' '-m' 'rebase finished: returning to refs/heads/core'
> 'HEAD' 'refs/heads/core'
> 13:06:27.208748 git.c:349               trace: built-in: git 'gc' '--auto'

I forgot to include that this took:

real    0m13.630s
user    0m10.739s
sys     0m4.064s

on my local laptop with a ssd + hot cache it was:

real    0m7.513s
user    0m3.796s
sys     0m0.624s

So some of that we could speed up with faster systems, but we still
have quite a bit of Git overhead.

Even with the hot cache on the ssd I get on this repo:

$ time (git log -1 >/dev/null)

real    0m0.938s
user    0m0.916s
sys     0m0.020s

v.s. the same on linux.git:

$ time (git log -1 >/dev/null)

real    0m0.016s
user    0m0.008s
sys     0m0.004s

Which I suspect is a function of the high ref count, but it could be
something else...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]