On Wed, Jul 17, 2024 at 05:12:53PM -0400, Taylor Blau wrote: > The implementation for doing so is relatively straightforward, and boils > down to a handful of different kinds of changes implemented in this > patch: > > - The `compute_sorted_entries()` function is taught to reject objects > which appear in any existing MIDX layer. OK, I think this is one part I was looking for earlier but didn't see. The implementation looks pretty easy (we can always ask about just the earlier layers by feeding the base_midx pointer to midx_has_oid(), etc). > - Functions like `write_midx_revindex()` are adjusted to write > pack_order values which are offset by the number of objects in the > base MIDX layer. > > - The end of `write_midx_internal()` is adjusted to move > non-incremental MIDX files when necessary (i.e. when creating an > incremental chain with an existing non-incremental MIDX in the > repository). > > There are a handful of other changes that are introduced, like new > functions to clear incremental MIDX files that are unrelated to the > current chain (using the same "keep_hash" mechanism as in the > non-incremental case). That all makes sense. I wondered a bit about selection of packs, size of incremental, etc. We'd probably want a geometric-ish progression, just like with packs, to balance cost of generation versus cost of lookups. But I guess we get that for free to some degree with "repack --geometric", assuming that our incremental midx is just covering the new packfiles. > The tests explicitly exercising the new incremental MIDX feature are > relatively limited for two reasons: > > 1. Most of the "interesting" behavior is already thoroughly covered in > t5319-multi-pack-index.sh, which handles the core logic of reading > objects through a MIDX. > > The new tests in t5334-incremental-multi-pack-index.sh are mostly > focused on creating and destroying incremental MIDXs, as well as > stitching their results together across layers. Do you mean here that t5319 will get coverage when GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL is set? In the long run, I wonder if we should pull t5319's tests into lib-midx.sh and run them in incremental and non-incremental modes. > 2. A new GIT_TEST environment variable is added called > "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the > entire test suite to write incremental MIDXs after repacking when > combined with the "GIT_TEST_MULTI_PACK_INDEX" variable. > > This exercises the long tail of other interesting behavior that is > defined implicitly throughout the rest of the CI suite. It is > likewise added to the linux-TEST-vars job. Makes sense. > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh > index 8f34f05087..be1188e736 100755 > --- a/t/t7700-repack.sh > +++ b/t/t7700-repack.sh > @@ -7,6 +7,9 @@ test_description='git repack works correctly' > . "${TEST_DIRECTORY}/lib-midx.sh" > . "${TEST_DIRECTORY}/lib-terminal.sh" > > +GIT_TEST_MULTI_PACK_INDEX=0 > +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0 > [...] > - GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err && > + git repack -Adl --write-bitmap-index 2>err && This pulls the GIT_TEST_MULTI_PACK_INDEX=0 out of the individual tests and sets it for the whole file. Are we losing some coverage for the other tests? I doubt it's that big a deal either way (and I can certainly see the argument that t7700, which is concerned with the details of repacking, should be in control of the details of midx generation). But I wonder if just setting WRITE_INCREMENTAL=0 would be enough? The rest of the changes all looked pretty reasonable, though again, it's easy to review code you wrote and say "that looks good" but not realize any gotchas that both of us missed. -Peff