Patrick Steinhardt <ps@xxxxxx> writes: > Commit graphs can become stale and contain references to commits that do > not exist in the object database anymore. Theoretically, this can lead > to a scenario where we are able to successfully look up any such commit > via the commit graph even though such a lookup would fail if done via > the object database directly. > > As the commit graph is mostly intended as a sort of cache to speed up > parsing of commits we do not want to have diverging behaviour in a > repository with and a repository without commit graphs, no matter > whether they are stale or not. As commits are otherwise immutable, the > only thing that we really need to care about is thus the presence or > absence of a commit. > > To address potentially stale commit data that may exist in the graph, > our `lookup_commit_in_graph()` function will check for the commit's > existence in both the commit graph, but also in the object database. So > even if we were able to look up the commit's data in the graph, we would > still pretend as if the commit didn't exist if it is missing in the > object database. > > We don't have the same safety net in `parse_commit_in_graph_one()` > though. This function is mostly used internally in "commit-graph.c" > itself to validate the commit graph, and this usage is fine. We do > expose its functionality via `parse_commit_in_graph()` though, which > gets called by `repo_parse_commit_internal()`, and that function is in > turn used in many places in our codebase. > > For all I can see this function is never used to directly turn an object > ID into a commit object without additional safety checks before or after > this lookup. What it is being used for though is to walk history via the > parent chain of commits. So when commits in the parent chain of a graph > walk are missing it is possible that we wouldn't notice if that missing > commit was part of the commit graph. Thus, a query like `git rev-parse > HEAD~2` can succeed even if the intermittent commit is missing. > > It's unclear whether there are additional ways in which such stale > commit graphs can lead to problems. In any case, it feels like this is a > bigger bug waiting to happen when we gain additional direct or indirect > callers of `repo_parse_commit_internal()`. So let's fix the inconsistent > behaviour by checking for object existence via the object database, as > well. > > This check of course comes with a performance penalty. The following > benchmarks have been executed in a clone of linux.git with stable tags > added: > > Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) > Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] > Range (min … max): 2.894 s … 2.950 s 10 runs > > Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) > Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] > Range (min … max): 3.780 s … 3.961 s 10 runs > > Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) > Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] > Range (min … max): 13.714 s … 13.995 s 10 runs > > Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) > Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] > Range (min … max): 13.645 s … 14.038 s 10 runs > > Summary > git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran > 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) > 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) > 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) > > We look at a ~30% regression in general, but in general we're still a > whole lot faster than without the commit graph. To counteract this, the > new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Very nicely described. Will queue. I'll go offline for the rest of the week but if there are no significant issues discovered by the time I come back, let's declare a victory and merge these two patches down to 'next'. Thanks. > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > commit.c | 16 +++++++++++++++- > t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+), 1 deletion(-) > > diff --git a/commit.c b/commit.c > index b3223478bc..7399e90212 100644 > --- a/commit.c > +++ b/commit.c > @@ -28,6 +28,7 @@ > #include "shallow.h" > #include "tree.h" > #include "hook.h" > +#include "parse.h" > > static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **); > > @@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r, > return -1; > if (item->object.parsed) > return 0; > - if (use_commit_graph && parse_commit_in_graph(r, item)) > + if (use_commit_graph && parse_commit_in_graph(r, item)) { > + static int object_paranoia = -1; > + > + if (object_paranoia == -1) > + object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1); > + > + if (object_paranoia && !has_object(r, &item->object.oid, 0)) { > + unparse_commit(r, &item->object.oid); > + return quiet_on_missing ? -1 : > + error(_("commit %s exists in commit-graph but not in the object database"), > + oid_to_hex(&item->object.oid)); > + } > + > return 0; > + } > > if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0) > return quiet_on_missing ? -1 : > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index c0cc454538..79467d7926 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' ' > ) > ' > > +test_expect_success 'stale commit cannot be parsed when traversing graph' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + > + test_commit A && > + test_commit B && > + test_commit C && > + git commit-graph write --reachable && > + > + # Corrupt the repository by deleting the intermittent commit > + # object. Commands should notice that this object is absent and > + # thus that the repository is corrupt even if the commit graph > + # exists. > + oid=$(git rev-parse B) && > + rm .git/objects/"$(test_oid_to_path "$oid")" && > + > + # Again, we should be able to parse the commit when not > + # being paranoid about commit graph staleness... > + GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 && > + # ... but fail when we are paranoid. > + test_must_fail git rev-parse HEAD~2 2>error && > + grep "error: commit $oid exists in commit-graph but not in the object database" error > + ) > +' > + > test_done