On Tue, Oct 17, 2023 at 11:34:53AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > > Fair point indeed. The following is a worst-case scenario benchmark of > > of the change where we do a full topological walk of all reachable > > commits in the graph, executed in linux.git. We parse commit parents via > > `repo_parse_commit_gently()`, so the new code path now basically has to > > check for object existence of every reachable commit: > > ... > > The added check does lead to a performance regression indeed, which is > > not all that unexpected. That being said, the commit-graph still results > > in a significant speedup compared to the case where we don't have it. > > Yeah, I agree that both points are expected. An extra check that is > much cheaper than the full parsing is paying a small price to be a > bit more careful than before. The question is if the price is small > enough. I am still not sure if the extra carefulness is warranted > for all normal cases to spend 30% extra cycles. > > Thanks. Well, seen in contexts like the thread that spawned this discussion I think it's preferable to take a relatively smaller price compared to disabling the commit graph entirely in some situations. With that in mind, personally I'd be happy with either of two outcomes here: - We take the patch I proposed as a hardening measure, which allows us to use the commit graph in basically any situation where we could parse objects from the ODB without any downside except for a performance hit. - We say that corrupt repositories do not need to be accounted for when using metadata caches like the commit-graph. That would mean in consequence that for the `--missing` flag we would not have to disable commit graphs. The test failure that was exposed in Karthik's test only stems from the fact that we manually corrupt the repository and is not specific to the `--missing` flag. This is further stressed by the new test aded in my own patch, as we can trigger a similar bug when not using `--missing`. For end users, object pruning would happen either via git-gc(1) or via git-maintenance(1), and both of them do know to update commit graphs already. This should significantly decrease the chance that such stale commit graphs would be found out there in the wild, but it's still not impossible of course. Especially in cases where you use lower-level commands like git-repack(1) with cruft packs or git-prune(1), which is often the case for Git hosters, the risk is higher though. Patrick
Attachment:
signature.asc
Description: PGP signature