On 2/2/2021 8:08 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
>
>> There is a subtle failure happening when computing corrected commit
>> dates with --split enabled. It requires a base layer needing the
>> generation_data_overflow chunk. Then, the next layer on top
>> erroneously thinks it needs an overflow chunk due to a bug leading
>> to recalculating all reachable generation numbers. The output of
>> the failure is
>>
>>   BUG: commit-graph.c:1912: expected to write 8 bytes to
>>   chunk 47444f56, but wrote 0 instead
>
> At Google, we're running into a commit-graph issue that appears to
> have also arrived as part of this last week's rollout. This one is a
> bit worse --- it is reproducible for affected users and stops them
> from being able to do day-to-day development:

You're shipping 'next' widely? I appreciate the extra eyes on
early bits, so we can find more issues and get them resolved.

> 	$ git pull
> 	remote: Finding sources: 100% (33/33)
> 	remote: Total 33 (delta 18), reused 33 (delta 18)
> 	Unpacking objects: 100% (33/33), 27.44 KiB | 460.00 KiB/s, done.
> 	From https://example.com/path/to/repo
> 	   05ba0d775..279e4e6d0  master -> origin/master
> 	BUG: commit-reach.c:64: bad generation skip 29e3 > 628 at 62abdabd1be00ebadbf73061ecf72b35042338e3
> 	error: merge died of signal 6
>
> "git commit-graph verify" agrees that the generation numbers are wrong:
>
> 	$ git commit-graph verify
> 	commit-graph generation for commit 4290b2214cdf50263118322735347d151715a272 is 3 != 1586
> 	Verifying commits in commit graph: 100% (1/1), done.
> 	commit-graph generation for commit b6c73a8472c7cb503cce0668849150a4b4329230 is 1576 != 10724
> 	Verifying commits in commit graph: 100% (10/10), done.
> 	Verifying commits in commit graph: 100% (88/88), done.
> 	Verifying commits in commit graph: 100% (208/208), done.
> 	Verifying commits in commit graph: 100% (592/592), done.
> 	Verifying commits in commit graph: 100% (1567/1567), done.
> 	Verifying commits in commit graph: 100% (8358/8358), done.
>
> We have some examples of repositories that were corrupted this way,
> but we didn't catch them in the act of corruption --- it started
> happening to several users with this release so we immediately rolled
> back.

It is definitely related to the split commit-graph during the
upgrade scenario. Your verify output shows that you are using the
--split option heavily (possibly with fetch.writeCommitGraph? or
are you using 'git maintenance run --task=commit-graph'?)

> Questions:
>
> - is this likely to be due to the same cause, or is it orthogonal?

My guess is that the reason is the same. I think that you might
have some strangeness of a commit-graph layer with corrected commit
dates being below a commit-graph layer without it.

> - what is the recommended way to recover from this state? "git fsck"
>   shows the repositories to have no problems. "git help commit-graph"
>   doesn't show a command for users to use; is
>   `rm -fr .git/objects/info/commit-graphs/` the recommended recovery
>   command?

That, followed by `git commit-graph write --reachable [--changed-paths]`
depending on what they want.

> - is there configuration or a patch we can roll out to help affected
>   users recover from this state?

If you are willing, then take v2 of this series and follow through
by clearing the commit-graph files of affected users. Note that you
can be proactive using `git commit-graph verify` to see who needs
rewrites.

Thanks,
-Stolee
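
For reference, the recovery steps described above (verify, clear the
split commit-graph directory, rewrite) could be scripted roughly as
follows. This is only a sketch under a few assumptions: the helper
name `repair_commit_graph` is made up for illustration, a non-zero
exit from `git commit-graph verify` is treated as "needs a rewrite",
and `--changed-paths` is left out since it depends on what the user
wants.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): rebuild the commit-graph of
# one repository, but only if "git commit-graph verify" reports a problem.
# The body runs in a subshell so the "cd" does not leak to the caller.
repair_commit_graph() (
	cd "$1" || exit 1

	# verify exits non-zero on errors such as generation mismatches.
	if git commit-graph verify >/dev/null 2>&1; then
		echo "ok: $1"
		exit 0
	fi

	# Ask git where objects/info lives rather than assuming .git/.
	info=$(git rev-parse --git-path objects/info) || exit 1

	# Drop the split commit-graph layers, then write a fresh graph.
	rm -rf "$info/commit-graphs"
	git commit-graph write --reachable && echo "rewritten: $1"
)
```

Calling `repair_commit_graph /path/to/repo` for each affected checkout
prints either "ok" or "rewritten" along with the path, so a fleet-wide
run can be logged and only broken repositories are touched.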