On 12/15/2021 2:46 PM, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > >> Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by >> which they can become corrupt when permuting their pack order. ... >> So I'm definitely open to suggestions there, but otherwise this series should go >> a long ways towards fixing my design mistake of having the MIDX .rev file be >> separate from the MIDX itself. > > Yeah, a single file with different chunks is a good way to ensure > atomicity of update. I just reviewed this series for the first time. Sorry for being so late getting to it. I had a few minor recommendations but they are mostly nitpicks and don't deserve holding up the series if there are no other major comments. > A note to reviewers. > > We need to make sure that not just we can still read .rev in > existing repositories (and convert it to the new chunk) correctly, > but also decide what to do to older versions of Git once the > repository is touched by this new version. Would they be upset to > see no .rev files or is it just the performance thing (and it is > more correct to recompute the reverse index on the fly)? Should the > new chunk be made mandatory to cause them notice that they should > not muck with the repository, or is it optional? Things like that. 1. Can we still read a .rev? The new test script specifically verifies that existing repositories will continue to read their .rev upon upgrade. Their .rev files will be replaced with the chunk during the next write. 2. What if they downgrade after the RIDX chunk is in place? The .rev file is missing and the repo has a performance issue because they can't use bitmaps. Correctness is not a problem. Anyone using .rev files for server use (where bitmaps are most useful) is hopefully already careful about downgrading Git versions. 3. Should the chunk be made mandatory? Unfortunately, the chunk format did not follow the index format's example of making lowercase chunk IDs required. Instead, the chunks that are necessary for v1 are necessary forever and all other chunks are deemed optional. Changing this would require something more drastic like updating the version number or giving some grace period where released versions start treating lowercase chunk IDs as required before creating a new "required" chunk. This does mean that if there is a version incompatibility, the RIDX chunk will just be ignored by the older version of Git. In terms of making this a safe format upgrade, I think Taylor has achieved that. The only thing I can think is that server operators might want to deploy this version with GIT_TEST_MIDX_WRITE_REV=1 for a while, so any need to downgrade would not suffer a performance penalty for a missing .rev file. If that is a planned way to safely deploy this change, then it might be worth adding a test that we safely delete a .rev file after writing both a .rev file and a RIDX chunk. (The RIDX chunk will be preferred, so maybe the previous .rev file hits some logic that would skip its deletion?) Thanks, -Stolee