On 2024.02.22 16:05, Junio C Hamano wrote:
> Josh Steadmon <steadmon@xxxxxxxxxx> writes:
>
> > At $WORK, we've had a few occasions where someone's commit-graph becomes
> > corrupt, and hits various BUG()s that block their day-to-day work. When
> > this happens, we advise the user to either disable the commit graph, or
> > to delete it and let it be regenerated.
> >
> > It would be a nicer user experience if we can make this a self-serve
> > procedure. To do this, let's add a new `git commit-graph clear`
> > subcommand so that users don't need to manually delete files under their
> > .git directories. And to make it self-documenting, update various BUG(),
> > die(), and error() messages to suggest removing the commit graph to
> > recover from the corruption.
>
> I am of two minds.
>
> For one, if we know there is a corruption and if we know that we
> will certainly recover cleanly if we removed these files, it would
> be fair for an end-user to respond with: instead of telling me to
> run "commit-graph clear", you can run it for me, can't you?
>
> The other one is if it hinders debugging the root cause to run
> "clear", whether it is done by the end-user or by the mechanism that
> detects and dies upon discovery of a corruption. Do we know how
> these commit-graph files become corrupt? How valuable would these
> corrupt files be to help us track down where the corruption comes
> from? If they are not all that useful in debugging, then removing
> them ourselves or telling users to remove them may be OK, of course.
>
> Do these BUG()s come from corruption that can be diagnosed upfront
> when we "open" the commit-graph files? I am wondering if it would
> be the matter of teaching prepare_commit_graph() to check for
> corruption and return without enabling the support.
>
> Thanks.

Sorry for the late reply, this got buried in my inbox.

The corruption we saw was related to a generation numbers bug [1] that I
think was only present for a short while in 'next'.

[1] https://lore.kernel.org/git/YBn3fxFe978Up5Ly@xxxxxxxxxx/

I believe that being able to examine the files after the corruption was
detected did help us narrow down the issue, so I would lean towards not
automatically deleting them upon detecting corruption. I don't think
that this case would be detectable without running a full
`git commit-graph verify` up front.
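
For illustration only, the manual recovery discussed above (run a full
verify, keep the suspect files for debugging, then regenerate) could be
sketched roughly as follows. This is not part of the proposed patch: the
helper name and the backup directory are hypothetical, and the sketch
assumes the graph lives in the default locations under objects/info.

```shell
# Hypothetical helper: save_and_rebuild_commit_graph REPO BACKUP_DIR
# BACKUP_DIR should be an absolute path. The body runs in a subshell,
# so the caller's working directory is left unchanged.
save_and_rebuild_commit_graph() (
    cd "$1" || exit 1
    backup=$2
    mkdir -p "$backup" || exit 1

    # Resolve the object directory (normally .git/objects).
    objdir=$(git rev-parse --git-path objects) || exit 1

    # Full verification pass; report the result but carry on either way.
    if git commit-graph verify; then
        echo "commit-graph passed verification"
    else
        echo "commit-graph failed verification"
    fi

    # Move the single-file graph and the split-graph directory aside
    # rather than deleting them, so they stay available for debugging.
    for f in "$objdir/info/commit-graph" "$objdir/info/commit-graphs"; do
        if [ -e "$f" ]; then
            mv "$f" "$backup/" || exit 1
        fi
    done

    # Regenerate a fresh graph from all reachable commits.
    git commit-graph write --reachable
)
```

It could be invoked as, for example,
`save_and_rebuild_commit_graph /path/to/repo /tmp/cg-backup`. Moving the
files instead of removing them keeps the evidence around, at the cost of
leaving cleanup to the user.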