Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/25/2020 11:58 AM, Junio C Hamano wrote:
> Jeff King <peff@xxxxxxxx> writes:
> 
>> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
>> or "git repack -Ad" is going to pack everything and delete the old
>> packs. So I think we'd want to remove a midx there.
> 
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".
> 
>> And "git repack -d" I think of as deleting only loose objects that we
>> just packed. But I guess it could also remove a pack that has now been
>> made redundant? That seems like a rare case in practice, but I suppose
>> is possible.
> 
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.
> 
>> E.g., imagine we have a midx that points to packs A and B, and
>> git-repack deletes B. By your logic above, we need to remove the midx
>> because now it points to objects in B which aren't accessible. But by
>> deleting it, could we be deleting the only thing that mentions the
>> objects in A?
>>
>> I _think_ the answer is "no", because we never went all-in on midx and
>> allowed deleting the matching .idx files for contained packs.
> 
> Yeah, it has been my assumption that that part of the design would
> never change.

Yes, we should intend to keep the .idx files around forever even when
using the multi-pack-index. The duplicated data overhead is not typically
a real problem.

The one caveat I would consider is if we wanted to let a multi-pack-index
cover thin packs, but that would be a substantial change to the object
database that I do not plan to tackle any time soon.

>> I'm also a little curious how bad it is to have a midx whose pack has
>> gone away. I guess we'd answer queries for "yes, we have this object"
>> even if we don't, which is bad. Though in practice we'd only delete
>> those packs if we have their objects elsewhere.
> 
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
> 
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

The best I can see is that prepare_midx_pack() will return 1 if the
pack no longer exists, and this would cause a die("error preparing
packfile from multi-pack-index") in nth_midxed_pack_entry(), killing
the following stack trace:

+ find_pack_entry():packfile.c
 + fill_midx_entry():midx.c
  + nth_midxed_pack_entry():midx.c

Perhaps that die() is a bit over-zealous since we could return 1
through this stack and find_pack_entry() could continue searching
for the object in the packed_git list. However, it could start
returning false _negatives_ if there were duplicates of the object
in the multi-pack-index but only the latest copy was deleted (and
the object does not appear in a pack-file outside of the multi-
pack-index).

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux