Patrick Steinhardt <ps@xxxxxx> writes: > actually exists in the object database. It's the only callsite of that > function outside of "commit-graph.c", as all other external callers > would call `lookup_commit_in_graph()` which _does_ perform the object > existence check. > > So I think that the proper way to address the regression would be a > patch similar to the following: > > diff --git a/commit.c b/commit.c > index b3223478bc..109e9217e3 100644 > --- a/commit.c > +++ b/commit.c > @@ -572,8 +572,13 @@ int repo_parse_commit_internal(struct repository *r, > return -1; > if (item->object.parsed) > return 0; > - if (use_commit_graph && parse_commit_in_graph(r, item)) > + if (use_commit_graph && parse_commit_in_graph(r, item)) { > + if (!has_object(r, &item->object.oid, 0)) > + return quiet_on_missing ? -1 : > + error(_("commit %s exists in commit-graph but not in the object database"), > + oid_to_hex(&item->object.oid)); > return 0; > + } It is (overly) conservative, which I generally should find pleasing, but as I said, for secondary information like commit-graph that is derived from the primary source only to precompute for optimization, our general attitude should be to trust it and let the optimization kick in, unless the operation being performed primarily cares about the case where the result from using and not using the secondary source differs. An obvious example of such an operation is "fsck", where we do care and want to notice when the underlying object graph and what commit-graph records contradict with each other. And my suggestion to disable commit-graph while running the "rev-list" command with the "--missing" option is because that usage would fall into the same category (please correct me if that is not the case) [*]. So for the purpose of "rev-list --missing", the above change does take us closer to the answer we would get from the primary source, but isn't it pessimising other users unnecessarily? Do we have a rough idea (if we have a benchmark that would be even better) how much this lack of has_object() check is contributing to the performance benefit of using commit-graph? Majority of callers of this code should not have to pay the additional overhead here, so unless it is small enough, this would be pessimising the generic code for everybody to please "rev-list --missing", which is why I am worried about going in this direction.