Re: [PATCH v2 19/19] midx: implement support for writing incremental MIDX chains

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 17, 2024 at 05:12:53PM -0400, Taylor Blau wrote:

> The implementation for doing so is relatively straightforward, and boils
> down to a handful of different kinds of changes implemented in this
> patch:
> 
>   - The `compute_sorted_entries()` function is taught to reject objects
>     which appear in any existing MIDX layer.

OK, I think this is one part I was looking for earlier but didn't see.
The implementation looks pretty easy (we can always ask about just the
earlier layers by feeding the base_midx pointer to midx_has_oid(), etc).

>   - Functions like `write_midx_revindex()` are adjusted to write
>     pack_order values which are offset by the number of objects in the
>     base MIDX layer.
> 
>   - The end of `write_midx_internal()` is adjusted to move
>     non-incremental MIDX files when necessary (i.e. when creating an
>     incremental chain with an existing non-incremental MIDX in the
>     repository).
> 
> There are a handful of other changes that are introduced, like new
> functions to clear incremental MIDX files that are unrelated to the
> current chain (using the same "keep_hash" mechanism as in the
> non-incremental case).

That all makes sense. I wondered a bit about selection of packs, size of
incremental, etc. We'd probably want a geometric-ish progression, just
like with packs, to balance cost of generation versus cost of lookups.
But I guess we get that for free to some degree with "repack
--geometric", assuming that our incremental midx is just covering the
new packfiles.

> The tests explicitly exercising the new incremental MIDX feature are
> relatively limited for two reasons:
> 
>   1. Most of the "interesting" behavior is already thoroughly covered in
>      t5319-multi-pack-index.sh, which handles the core logic of reading
>      objects through a MIDX.
> 
>      The new tests in t5334-incremental-multi-pack-index.sh are mostly
>      focused on creating and destroying incremental MIDXs, as well as
>      stitching their results together across layers.

Do you mean here that t5319 will get coverage when
GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL is set? In the long run, I
wonder if we should pull t5319's tests into lib-midx.sh and run them in
incremental and non-incremental modes.

>   2. A new GIT_TEST environment variable is added called
>      "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
>      entire test suite to write incremental MIDXs after repacking when
>      combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.
> 
>      This exercises the long tail of other interesting behavior that is
>      defined implicitly throughout the rest of the CI suite. It is
>      likewise added to the linux-TEST-vars job.

Makes sense.

> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 8f34f05087..be1188e736 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -7,6 +7,9 @@ test_description='git repack works correctly'
>  . "${TEST_DIRECTORY}/lib-midx.sh"
>  . "${TEST_DIRECTORY}/lib-terminal.sh"
>  
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL=0
> [...]
> -		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
> +		git repack -Adl --write-bitmap-index 2>err &&

This pulls the GIT_TEST_MULTI_PACK_INDEX=0 out of the individual tests
and sets it for the whole file. Are we losing some coverage for the
other tests? I doubt it's that big a deal either way (and I can
certainly see the argument that t7700, which is concerned with the
details of repacking, should be in control of the details of midx
generation). But I wonder if just setting WRITE_INCREMENTAL=0 would be
enough?

The rest of the changes all looked pretty reasonable, though again, it's
easy to review code you wrote and say "that looks good" but not realize
any gotchas that both of us missed.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux