Hi,

I finally found some time again to have another look at my old problem
of slow connectivity checks. After my previous two approaches of using
the quarantine directory and using bitmaps proved to not really be
viable, I've taken a step back yet again.

The result is this series, which speeds up the connectivity checks by
optimizing "revision.c". More specifically, I'm mostly tweaking how we
queue up references, which is the most pressing issue we've observed at
GitLab when doing connectivity checks in repos with many references.

The following optimizations are part of this series. All benchmarks were
done on [1], which is a repository with about 2.2 million references
(even though most of them are hidden from public users), using the
following command:

    git rev-list --objects --quiet --unsorted-input --not --all --not $newrev

    1. We used to sort the input references in git-rev-list(1). This is
       moot in the context of connectivity checks, so a new flag
       suppresses this sorting. This improves the command by ~30%, from
       7.6s to 4.9s.

    2. We did some busy-work, loading each reference twice via
       `get_reference()`. We no longer do so, resulting in a ~8% speedup
       from 5.0s to 4.6s.

    3. An optimization was done to how we load objects. Previously, we
       always called `oid_object_info()`, even if we had already loaded
       the object. This was tweaked to use `lookup_unknown_object()`,
       which is a performance-memory tradeoff. This saves us another 7%,
       going from 4.7s to 4.4s, and it is also a prerequisite for (4).

    4. We now make better use of the commit-graph in that we first try
       loading a commit from there before we load it from the ODB. This
       is a 40% speedup, going from 4.4s to 2.8s.

The result is a speedup of about 65%. The nice thing compared to
previous versions is that this should also be visible when directly
executing git-rev-list(1) or doing a revwalk.

Patch #1 is still missing documentation, so it needs some polishing if
we agree that this patch series makes sense.
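To illustrate the lookup order described in (4), here is a minimal,
self-contained sketch of "try the cheap index first, fall back to the
expensive store on a miss". It is not code from the series: `struct
sketch_store`, `graph_contains()`, `odb_read()` and
`load_commit_sketch()` are hypothetical names made up for this example,
and object IDs are simplified to plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names exist in git. */
struct sketch_store {
	const unsigned *graph_ids; /* sorted IDs covered by the "commit-graph" */
	size_t graph_len;
	unsigned odb_reads;        /* counts expensive fallback lookups */
};

/* Binary search over the sorted graph index; returns 1 on a hit, 0 on a miss. */
static int graph_contains(const struct sketch_store *s, unsigned id)
{
	size_t lo = 0, hi = s->graph_len;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (s->graph_ids[mid] == id)
			return 1;
		if (s->graph_ids[mid] < id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Stand-in for the expensive object database read; always "finds" the object. */
static int odb_read(struct sketch_store *s, unsigned id)
{
	(void)id;
	s->odb_reads++;
	return 1;
}

/* Try the cheap index first and only fall back to the ODB on a miss. */
static int load_commit_sketch(struct sketch_store *s, unsigned id)
{
	assert(s->graph_len == 0 || s->graph_ids != NULL);

	if (graph_contains(s, id))
		return 1;
	return odb_read(s, id);
}
```

With most reference tips being commits that are already covered by the
index, the fallback counter stays low, which is the effect the 4.4s to
2.8s measurement above reflects.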

Patrick

[1]: https://gitlab.com/gitlab-org/gitlab.git

Patrick Steinhardt (4):
  connected: do not sort input revisions
  revision: stop retrieving reference twice
  revision: avoid loading object headers multiple times
  revision: avoid hitting packfiles when commits are in commit-graph

 commit-graph.c | 55 +++++++++++++++++++++++++++++++++++++++----------
 commit-graph.h |  2 ++
 connected.c    |  1 +
 revision.c     | 23 ++++++++++++++++-----
 revision.h     |  1 +
 5 files changed, 65 insertions(+), 17 deletions(-)

-- 
2.32.0