Hi, this is version 2 of my patch series to more readily detect commits parsed from the commit graph which are missing in the object database. I've split this out into a separate thread as version 1 was sent in reply to a different series to extend git-rev-list(1)'s `--missing` option so that I don't continue to hijack this thread. Changes compared to v1: - I've added a preparatory patch that introduced a new GIT_COMMIT_GRAPH_PARANOIA environment variable as suggested by Peff. This envvar is retrofitted to the preexisting check in `lookup_commit_in_graph()` so that the behaviour for this sanity check is consistent. - `repo_parse_commit_internal()` now also honors this new envvar. - I've amended the commit message in v2 to include the benchmark that demonstrates the performance regression. - We now un-parse the commit when parsing it via the commit graph succeeded, but it doesn't exist in the ODB. Thanks for all the feedback and discussion around this. Patrick [1]: <b0bf576c51a706367a758b8e30eca37edb9c2734.1697200576.git.ps@xxxxxx> Patrick Steinhardt (2): commit-graph: introduce envvar to disable commit existence checks commit: detect commits that exist in commit-graph but not in the ODB Documentation/git.txt | 9 ++++++++ commit-graph.c | 6 +++++- commit-graph.h | 6 ++++++ commit.c | 16 +++++++++++++- t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 83 insertions(+), 2 deletions(-) Range-diff against v1: -: ---------- > 1: a89c435528 commit-graph: introduce envvar to disable commit existence checks 1: 6ec1e340f8 ! 2: 0476d48555 commit: detect commits that exist in commit-graph but not in the ODB @@ Commit message behaviour by checking for object existence via the object database, as well. + This check of course comes with a performance penalty. The following + benchmarks have been executed in a clone of linux.git with stable tags + added: + + Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) + Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] + Range (min … max): 2.894 s … 2.950 s 10 runs + + Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] + Range (min … max): 3.780 s … 3.961 s 10 runs + + Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) + Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] + Range (min … max): 13.714 s … 13.995 s 10 runs + + Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] + Range (min … max): 13.645 s … 14.038 s 10 runs + + Summary + git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran + 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) + 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) + + We look at a ~30% regression in general, but in general we're still a + whole lot faster than without the commit graph. To counteract this, the + new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. + Signed-off-by: Patrick Steinhardt <ps@xxxxxx> ## commit.c ## +@@ + #include "shallow.h" + #include "tree.h" + #include "hook.h" ++#include "parse.h" + + static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); + @@ commit.c: int repo_parse_commit_internal(struct repository *r, return -1; if (item->object.parsed) return 0; - if (use_commit_graph && parse_commit_in_graph(r, item)) + if (use_commit_graph && parse_commit_in_graph(r, item)) { -+ if (!has_object(r, &item->object.oid, 0)) ++ static int object_paranoia = -1; ++ ++ if (object_paranoia == -1) ++ object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1); ++ ++ if (object_paranoia && !has_object(r, &item->object.oid, 0)) { ++ unparse_commit(r, &item->object.oid); + return quiet_on_missing ? -1 : + error(_("commit %s exists in commit-graph but not in the object database"), + oid_to_hex(&item->object.oid)); ++ } ++ return 0; + } @@ commit.c: int repo_parse_commit_internal(struct repository *r, return quiet_on_missing ? -1 : ## t/t5318-commit-graph.sh ## -@@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version upgrade' ' +@@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when given directly' ' ) ' -+test_expect_success 'commit exists in commit-graph but not in object database' ' ++test_expect_success 'stale commit cannot be parsed when traversing graph' ' + test_when_finished "rm -rf repo" && + git init repo && + ( @@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version + oid=$(git rev-parse B) && + rm .git/objects/"$(test_oid_to_path "$oid")" && + ++ # Again, we should be able to parse the commit when not ++ # being paranoid about commit graph staleness... ++ GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 && ++ # ... but fail when we are paranoid. + test_must_fail git rev-parse HEAD~2 2>error && + grep "error: commit $oid exists in commit-graph but not in the object database" error + ) -- 2.42.0
Attachment:
signature.asc
Description: PGP signature