Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/25/2020 3:55 AM, Son Luong Ngoc wrote:
> Hi Taylor,
> 
> Thanks for working on this.
> 
>> On Aug 25, 2020, at 04:01, Taylor Blau <me@xxxxxxxxxxxx> wrote:
>>
>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>> learned to remove a multi-pack-index file if it added or removed a pack
>> from the object store.
>>
>> This mechanism is a little over-eager, since it is only necessary to
>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>> Adding a pack outside of the MIDX does not require invalidating the
>> MIDX, and likewise for removing a pack the MIDX does not know about.
> 
> I wonder if its worth to trigger write_midx_file() to update the midx instead of
> just removing MIDX?

It would be reasonable to have 'git repack' run write_midx_file() after updating
the pack-files, but it still needs to delete the existing multi-pack-index after
deleting a referenced pack, as the current multi-pack-index will be incorrect,
leading to possibly invalid data during the write. 'git multi-pack-index expire'
carefully handles deleting pack-files that have become redundant.

It may be possible to change the behavior of write_midx_file() to check for the
state of each referenced pack-file, but it would then need to re-scan the
pack-indexes to rebuild the set of objects and which pack-files contain them.

> That is already the direction we are taking in the 'maintenance' patch series
> whenever the multi-pack-index file was deemed invalid.

The 'maintenance' builtin definitely adds new MIDX-aware maintenance tasks.
By performing a repack using the multi-pack-index as the starting point, we
can get around these issues.

One thing I've noticed by talking with Taylor about this change is that
'git repack' is a bit like 'git gc' in that it does a LOT of things and it
can be hard to understand exactly when certain sub-tasks are run or not.
There are likely some interesting operations that could be separated into
maintenance tasks. For example, 'git repack -d' is very similar to
'git maintenance run --task=loose-objects'.

> Or perhaps, we can check for 'core.multiPackIndex' value (which recently is 
> 'true' by default) and determine whether we should remove the MIDX or rewrite
> it?

We currently rewrite it during 'git repack' when GIT_TEST_MULTI_PACK_INDEX=1
to maximize how often we have a multi-pack-index in the test suite. It would
be simple to update that to also check a config option. I don't think adding
that behavior to 'core.multiPackIndex' is a good direction, though.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux