Hi,

While analyzing push performance on gitlab.com, I have at times wondered
what git-receive-pack(1) is doing for so long. For some repos which have
loads of references (~880k), even tiny pushes of fewer than 10 objects
took dozens of seconds to get accepted.

One of the issues I've found is the object connectivity check, which may
run for a significant amount of time. The root cause here is that we're
computing connectivity via `git rev-list --not --all`: if we've got many
refs in the repository, computing `--not --all` is hugely expensive.

This commit series thus implements an alternative way of computing
reachability, which reuses information from the object quarantine
environment. Instead of doing a refwalk, we just iterate over all packed
and loose quarantined objects, and for each of them, we determine whether
their immediate references are all satisfied.

This reimplementation pays off quite well for repos which have many
refs. The following benchmarks for git-receive-pack(1) (added in patch
2/8) have been performed in linux-stable.git:

    Test                                     v2.32.0-rc0             HEAD
    --------------------------------------------------------------------------------------------
    5400.3: receive-pack clone create        1.27(1.11+0.16)         0.02(0.01+0.01) -98.4%
    5400.5: receive-pack clone update        1.27(1.13+0.13)         0.02(0.02+0.00) -98.4%
    5400.7: receive-pack clone reset         0.13(0.11+0.02)         0.02(0.01+0.01) -84.6%
    5400.9: receive-pack clone delete        0.02(0.01+0.01)         0.03(0.02+0.01) +50.0%
    5400.11: receive-pack extrarefs create   33.01(18.80+14.43)      9.00(4.30+4.65) -72.7%
    5400.13: receive-pack extrarefs update   33.13(18.85+14.50)      9.01(4.28+4.67) -72.8%
    5400.15: receive-pack extrarefs reset    32.90(18.82+14.32)      9.04(4.26+4.77) -72.5%
    5400.17: receive-pack extrarefs delete   9.13(4.35+4.77)         8.94(4.29+4.64) -2.1%
    5400.19: receive-pack empty create       223.35(640.63+127.74)   227.55(651.75+130.94) +1.9%

These rather clearly show that the previous rev-walk has been a major
bottleneck in the implementation.
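
To make the idea of the quarantine-based check a bit more concrete, here
is a toy model in Python. It is only a sketch and not the C code from
this series: the function name, the dict-based object store and the
object IDs are all made up for illustration, and it assumes that the
main object database is already fully connected.

```python
from typing import Dict, Iterable, List, Set


def quarantine_is_connected(
    quarantined: Dict[str, List[str]],
    existing: Set[str],
    new_tips: Iterable[str],
) -> bool:
    """Toy connectivity check over quarantined objects (hypothetical helper).

    quarantined maps each newly received object ID to the object IDs it
    references directly (parents and tree for a commit, entries for a tree).
    existing is the set of object IDs already in the main object database,
    which this sketch assumes to be fully connected.
    """

    def present(oid: str) -> bool:
        return oid in quarantined or oid in existing

    # Every proposed new ref tip must exist somewhere.
    if not all(present(tip) for tip in new_tips):
        return False

    # No walk over all refs: only the immediate references of each
    # quarantined object are checked.
    for refs in quarantined.values():
        for ref in refs:
            if not present(ref):
                return False
    return True


# Hypothetical push: one new commit c2 with a new tree t2 and a new blob b2.
existing_objects = {"c1", "t1", "b1"}
complete_push = {"c2": ["t2", "c1"], "t2": ["b1", "b2"], "b2": []}
broken_push = {"c2": ["t2", "c1"], "t2": ["b1", "b2"]}  # b2 was not sent

ok = quarantine_is_connected(complete_push, existing_objects, ["c2"])
bad = quarantine_is_connected(broken_push, existing_objects, ["c2"])
```

In this model the cost depends on the number of pushed objects rather
than on the number of refs in the repository, which is the property the
series is after.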

Patrick

Patrick Steinhardt (8):
  perf: fix when running with TEST_OUTPUT_DIRECTORY
  p5400: add perf tests for git-receive-pack(1)
  tmp-objdir: expose function to retrieve path
  packfile: have `for_each_file_in_pack_dir()` return error codes
  object-file: allow reading loose objects without reading their contents
  connected: implement connectivity check via temporary object dirs
  receive-pack: skip connectivity checks on delete-only commands
  receive-pack: check connectivity via quarantined objects

 builtin/receive-pack.c       |  57 +++++++----
 connected.c                  | 192 +++++++++++++++++++++++++++++++++++
 connected.h                  |  19 ++++
 midx.c                       |  22 ++--
 object-file.c                |   9 +-
 packfile.c                   |  26 +++--
 packfile.h                   |  10 +-
 t/perf/aggregate.perl        |   8 +-
 t/perf/p5400-receive-pack.sh |  74 ++++++++++++++
 t/perf/perf-lib.sh           |   4 +-
 t/perf/run                   |  25 +++--
 tmp-objdir.c                 |   7 ++
 tmp-objdir.h                 |   5 +
 13 files changed, 401 insertions(+), 57 deletions(-)
 create mode 100755 t/perf/p5400-receive-pack.sh

-- 
2.31.1