On Thu, Oct 19, 2023 at 08:45:34AM +0200, Patrick Steinhardt wrote: > On Tue, Oct 17, 2023 at 11:34:53AM -0700, Junio C Hamano wrote: > > Patrick Steinhardt <ps@xxxxxx> writes: > > > > > Fair point indeed. The following is a worst-case scenario benchmark of > > > of the change where we do a full topological walk of all reachable > > > commits in the graph, executed in linux.git. We parse commit parents via > > > `repo_parse_commit_gently()`, so the new code path now basically has to > > > check for object existence of every reachable commit: > > > ... > > > The added check does lead to a performance regression indeed, which is > > > not all that unexpected. That being said, the commit-graph still results > > > in a significant speedup compared to the case where we don't have it. > > > > Yeah, I agree that both points are expected. An extra check that is > > much cheaper than the full parsing is paying a small price to be a > > bit more careful than before. The question is if the price is small > > enough. I am still not sure if the extra carefulness is warranted > > for all normal cases to spend 30% extra cycles. > > > > Thanks. > > Well, seen in contexts like the thread that spawned this discussion I > think it's preferable to take a relatively smaller price compared to > disabling the commit graph entirely in some situations. With that in > mind, personally I'd be happy with either of two outcomes here: > > - We take the patch I proposed as a hardening measure, which allows > us to use the commit graph in basically any situation where we > could parse objects from the ODB without any downside except for a > performance hit. > > - We say that corrupt repositories do not need to be accounted for > when using metadata caches like the commit-graph. That would mean > in consequence that for the `--missing` flag we would not have to > disable commit graphs. > > The test failure that was exposed in Karthik's test only stems from the > fact that we manually corrupt the repository and is not specific to the > `--missing` flag. This is further stressed by the new test aded in my > own patch, as we can trigger a similar bug when not using `--missing`. There's another way to handle this, which is to conditionally enable the object existence check. This would be less of a performance hit compared to disabling commit graphs altogether with `--missing`, but still ensure that repository corruption was detected. Second, it would not regress performance for all preexisting users of `repo_parse_commit_gently()`. Patrick
Attachment:
signature.asc
Description: PGP signature