On 8/25/2020 3:55 AM, Son Luong Ngoc wrote: > Hi Taylor, > > Thanks for working on this. > >> On Aug 25, 2020, at 04:01, Taylor Blau <me@xxxxxxxxxxxx> wrote: >> >> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack' >> learned to remove a multi-pack-index file if it added or removed a pack >> from the object store. >> >> This mechanism is a little over-eager, since it is only necessary to >> drop a MIDX if 'git repack' removes a pack that the MIDX references. >> Adding a pack outside of the MIDX does not require invalidating the >> MIDX, and likewise for removing a pack the MIDX does not know about. > > I wonder if its worth to trigger write_midx_file() to update the midx instead of > just removing MIDX? It would be reasonable to have 'git repack' run write_midx_file() after updating the pack-files, but it still needs to delete the existing multi-pack-index after deleting a referenced pack, as the current multi-pack-index will be incorrect, leading to possibly invalid data during the write. 'git multi-pack-index expire' carefully handles deleting pack-files that have become redundant. It may be possible to change the behavior of write_midx_file() to check for the state of each referenced pack-file, but it would then need to re-scan the pack-indexes to rebuild the set of objects and which pack-files contain them. > That is already the direction we are taking in the 'maintenance' patch series > whenever the multi-pack-index file was deemed invalid. The 'maintenance' builtin definitely adds new MIDX-aware maintenance tasks. By performing a repack using the multi-pack-index as the starting point, we can get around these issues. One thing I've noticed by talking with Taylor about this change is that 'git repack' is a bit like 'git gc' in that it does a LOT of things and it can be hard to understand exactly when certain sub-tasks are run or not. There are likely some interesting operations that could be separated into maintenance tasks. For example, 'git repack -d' is very similar to 'git maintenance run --task=loose-objects'. > Or perhaps, we can check for 'core.multiPackIndex' value (which recently is > 'true' by default) and determine whether we should remove the MIDX or rewrite > it? We currently rewrite it during 'git repack' when GIT_TEST_MULTI_PACK_INDEX=1 to maximize how often we have a multi-pack-index in the test suite. It would be simple to update that to also check a config option. I don't think adding that behavior to 'core.multiPackIndex' is a good direction, though. Thanks, -Stolee