On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>
>>  * Around 500k commits
>>  * Around 100k tags
>>  * Around 5k branches
>>  * Around 500 commits/day, almost entirely to the same branch
>>  * 1.5 GB .git checkout.
>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>
> Would be nice if you could make an anonymized version of this repo
> public. Working on a "real" large repo is better than an artificial
> one.

Yeah, I'll try to do that.

>> But actually most of "git fetch" is spent in the reachability check
>> subsequently done by "git-rev-list" which takes several seconds. I
>
> I wonder if reachability bitmap could help here..

I could have sworn I had that enabled already but evidently not. I did
test it and it cut down on clone times a bit. Now our daily repacking
is:

    git --git-dir={} gc &&
    git --git-dir={} pack-refs --all --prune &&
    git --git-dir={} repack -Ad --window=250 --depth=100 --write-bitmap-index --pack-kept-objects &&

It's not clear to me from the documentation whether this should just
be enabled on the server, or the clients too. In any case I've enabled
it on both.

Even then with it enabled on both a "git pull" that pulls down just
one commit on one branch is 13s. Trace attached at the end of the
mail.

>> haven't looked into it but there's got to be room for optimization
>> there, surely it only has to do reachability checks for new refs, or
>> could run in some "I trust this remote not to send me corrupt data"
>> completely mode (which would make sense within a company where you can
>> trust your main Git box).
>
> No, it's not just about trusting the server side, it's about catching
> data corruption on the wire as well. We have a trick to avoid
> reachability check in clone case, which is much more expensive than a
> fetch. Maybe we could do something further to help the fetch case _if_
> reachability bitmaps don't help.

Still, if that's indeed a big bottleneck what's the worst-case
scenario here? That the local repository gets hosed? The server will
still recursively validate the objects it gets sent, right?

I wonder if a better trade-off in that case would be to skip this in
some situations and instead put something like "git fsck" in a
cronjob.

Here's a "git pull" trace mentioned above:

$ time GIT_TRACE=1 git pull
13:06:13.603781 git.c:555 trace: exec: 'git-pull'
13:06:13.603936 run-command.c:351 trace: run_command: 'git-pull'
13:06:13.620615 git.c:349 trace: built-in: git 'rev-parse' '--git-dir'
13:06:13.631602 git.c:349 trace: built-in: git 'rev-parse' '--is-bare-repository'
13:06:13.636103 git.c:349 trace: built-in: git 'rev-parse' '--show-toplevel'
13:06:13.641491 git.c:349 trace: built-in: git 'ls-files' '-u'
13:06:13.719923 git.c:349 trace: built-in: git 'symbolic-ref' '-q' 'HEAD'
13:06:13.728085 git.c:349 trace: built-in: git 'config' 'branch.trunk.rebase'
13:06:13.738160 git.c:349 trace: built-in: git 'config' 'pull.ff'
13:06:13.743286 git.c:349 trace: built-in: git 'rev-parse' '-q' '--verify' 'HEAD'
13:06:13.972091 git.c:349 trace: built-in: git 'rev-parse' '--verify' 'HEAD'
13:06:14.149420 git.c:349 trace: built-in: git 'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:14.294098 git.c:349 trace: built-in: git 'diff-files' '--quiet' '--ignore-submodules'
13:06:14.467711 git.c:349 trace: built-in: git 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:14.683419 git.c:349 trace: built-in: git 'rev-parse' '-q' '--git-dir'
13:06:15.189707 git.c:349 trace: built-in: git 'rev-parse' '-q' '--verify' 'HEAD'
13:06:15.335948 git.c:349 trace: built-in: git 'fetch' '--update-head-ok'
13:06:15.691303 run-command.c:351 trace: run_command: 'ssh' 'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
13:06:17.095662 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
13:06:20.426346 run-command.c:351 trace: run_command: 'unpack-objects' '--pack_header=2,6'
13:06:20.431806 exec_cmd.c:130 trace: exec: 'git' 'unpack-objects' '--pack_header=2,6'
13:06:20.437343 git.c:349 trace: built-in: git 'unpack-objects' '--pack_header=2,6'
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
13:06:20.444196 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all'
13:06:20.447135 exec_cmd.c:130 trace: exec: 'git' 'rev-list' '--objects' '--stdin' '--not' '--all'
13:06:20.451283 git.c:349 trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all'
From ssh://git.example.com/gitrepos/core
   02d33d2..41e72c4  core -> origin/core
13:06:22.559609 run-command.c:351 trace: run_command: 'gc' '--auto'
13:06:22.562176 exec_cmd.c:130 trace: exec: 'git' 'gc' '--auto'
13:06:22.565661 git.c:349 trace: built-in: git 'gc' '--auto'
13:06:22.594980 git.c:349 trace: built-in: git 'rev-parse' '-q' '--verify' 'HEAD'
13:06:22.845728 git.c:349 trace: built-in: git 'show-branch' '--merge-base' 'refs/heads/core' '41e72c42addc5075e8009a3eebe914fa0ce98b27' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:23.087586 git.c:349 trace: built-in: git 'fmt-merge-msg'
13:06:23.341451 git.c:349 trace: built-in: git 'rev-parse' '--parseopt' '--stuck-long' '--' '--onto' '41e72c42addc5075e8009a3eebe914fa0ce98b27' '41e72c42addc5075e8009a3eebe914fa0ce98b27'
13:06:23.350513 git.c:349 trace: built-in: git 'rev-parse' '--git-dir'
13:06:23.362011 git.c:349 trace: built-in: git 'rev-parse' '--is-bare-repository'
13:06:23.365282 git.c:349 trace: built-in: git 'rev-parse' '--show-toplevel'
13:06:23.372589 git.c:349 trace: built-in: git 'config' '--bool' 'rebase.stat'
13:06:23.377056 git.c:349 trace: built-in: git 'config' '--bool' 'rebase.autostash'
13:06:23.382102 git.c:349 trace: built-in: git 'config' '--bool' 'rebase.autosquash'
13:06:23.389458 git.c:349 trace: built-in: git 'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.608894 git.c:349 trace: built-in: git 'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.894026 git.c:349 trace: built-in: git 'symbolic-ref' '-q' 'HEAD'
13:06:23.898918 git.c:349 trace: built-in: git 'rev-parse' '--verify' 'HEAD'
13:06:24.102269 git.c:349 trace: built-in: git 'rev-parse' '--verify' 'HEAD'
13:06:24.338636 git.c:349 trace: built-in: git 'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:24.539912 git.c:349 trace: built-in: git 'diff-files' '--quiet' '--ignore-submodules'
13:06:24.729362 git.c:349 trace: built-in: git 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:24.938533 git.c:349 trace: built-in: git 'merge-base' '41e72c42addc5075e8009a3eebe914fa0ce98b27' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:25.197791 git.c:349 trace: built-in: git 'diff' '--stat' '--summary' '02d33d2be7f8601c3502fdd89b0946447d7cdf15' '41e72c42addc5075e8009a3eebe914fa0ce98b27'
[details on updated files]
13:06:25.488275 git.c:349 trace: built-in: git 'checkout' '-q' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:26.467413 git.c:349 trace: built-in: git 'update-ref' 'ORIG_HEAD' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
Fast-forwarded trunk to 41e72c42addc5075e8009a3eebe914fa0ce98b27.
13:06:26.716256 git.c:349 trace: built-in: git 'rev-parse' 'HEAD'
13:06:26.958595 git.c:349 trace: built-in: git 'update-ref' '-m' 'rebase finished: refs/heads/core onto 41e72c42addc5075e8009a3eebe914fa0ce98b27' 'refs/heads/core' '41e72c42addc5075e8009a3eebe914fa0ce98b27' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:27.205320 git.c:349 trace: built-in: git 'symbolic-ref' '-m' 'rebase finished: returning to refs/heads/core' 'HEAD' 'refs/heads/core'
13:06:27.208748 git.c:349 trace: built-in: git 'gc' '--auto'
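
For anyone who wants to see where the time goes in a trace like the one above, here is a rough sketch in Python (not something Git ships) that ranks steps by the gap between one trace line and the next. The names `parse_trace`, `slowest_steps` and `SAMPLE` are made up for illustration, and `SAMPLE` is only a short excerpt of the trace. The gap is an approximation: it also includes network time and any work that emits no trace line, so treat the numbers as a hint rather than a profile.

```python
import re
from datetime import datetime

# Start of a GIT_TRACE line as seen in this mail:
# "HH:MM:SS.ffffff <file>.c:<line> trace: <description>"
TRACE_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d{6})\s+\S+:\d+\s+trace: (.*)$")


def parse_trace(lines):
    """Return (timestamp, description) pairs; non-trace lines are skipped."""
    events = []
    for line in lines:
        match = TRACE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


def slowest_steps(events, top=3):
    """Rank steps by seconds elapsed until the next trace line."""
    gaps = []
    for (start, desc), (end, _next_desc) in zip(events, events[1:]):
        gaps.append(((end - start).total_seconds(), desc))
    return sorted(gaps, reverse=True)[:top]


# Excerpt of the trace from this mail (assumption: the full output
# would normally be read from a file or a pipe instead).
SAMPLE = """
13:06:15.335948 git.c:349 trace: built-in: git 'fetch' '--update-head-ok'
13:06:17.095662 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
remote: Counting objects: 6, done.
13:06:20.431806 exec_cmd.c:130 trace: exec: 'git' 'unpack-objects' '--pack_header=2,6'
13:06:20.444196 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all'
13:06:22.559609 run-command.c:351 trace: run_command: 'gc' '--auto'
"""

if __name__ == "__main__":
    for seconds, desc in slowest_steps(parse_trace(SAMPLE.splitlines())):
        print(f"{seconds:8.3f}s  {desc}")
```

On this excerpt the two "rev-list" invocations come out on top, which fits the guess that the reachability check dominates, although the first gap also covers the server counting and sending the pack.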