Re: [PATCH 0/2] Fix background maintenance regression in Git 2.45.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/18/24 6:50 PM, Taylor Blau wrote:
On Thu, Jul 18, 2024 at 07:55:44PM +0000, Derrick Stolee via GitGitGadget wrote:
The issue is that 'git multi-pack-index repack' was taught to call 'git
pack-objects' with the new '--stdin-packs' option. However, this changes the
object selection algorithm. Instead of using the objects referenced by the
multi-pack-index, it compares pack-files using a list of "included" and
"excluded" pack-files. This loses some granularity of how the
multi-pack-index chooses among duplicate objects.

Thank you for looking at this so carefully! Let me double check my own
understanding.

Suppose we have a MIDX with some pack 'P' and say, some commit object
'C' which appears in that pack. Let's also suppose we have another pack
'Q' in the same MIDX which also contains 'C', but the MIDX selected its
copy from pack 'P'.

If we want to combine 'P' with some other packs (excluding 'Q'), then
the input to --stdin-packs will look something like:

     P.pack
     ^Q.pack
     ...

And the resulting pack would not contain 'C', since we would reject it
via: add_object_entry_from_pack() -> want_object_in_pack() ->
want_found_object() -> has_object_kept_pack(). The final function there
would find a copy of 'C' in 'Q', and since 'Q' is excluded, we would
reject 'C' as unwanted.

So the resulting pack would not contain 'C', and the MIDX would hold
onto its copy from pack 'P', resulting in 'P' being both (a) in the set
of packs to repack together, but also (b) non-expireable, since it has
at least one object selected from it in the MIDX.

The end result is that some objects that would normally have been included
in the new pack-file are no longer included. The copy that the
multi-pack-index references is in the pack-file that was intended to be
repacked, so that pack-file cannot be expired in the next 'git
multi-pack-index expire' step and is included again in the batch of objects
to repack.

I think this matches my own understanding, but let me know if I'm
missing something. Assuming I'm thinking about this the same way you
are, the fix (stop using --stdin-packss) makes sense to me.

Your interpretation matches mine. Thanks for the careful read.

I think we can accomplish similar goals that match the reasoning for
--stdin-packs (better deltas while also limiting the object walk to the repacked objects) with some changes to read_object_list_from_stdin(),
but that's a more subtle kind of change.

Taking this change as-is will cause a regression in the quality of
delta choices, but recovers our ability to expire "repacked" pack-files
in all cases.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux