Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> or "git repack -Ad" is going to pack everything and delete the old
> packs. So I think we'd want to remove a midx there.

AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
nobody is doing "there are three packfiles, but all the objects in
one of them are contained in the other two, so let's remove that
redundant one".

> And "git repack -d" I think of as deleting only loose objects that we
> just packed. But I guess it could also remove a pack that has now been
> made redundant? That seems like a rare case in practice, but I suppose
> is possible.

Meaning it can become reality?  Yes.  Or it already can happen?  I
doubt it.

> E.g., imagine we have a midx that points to packs A and B, and
> git-repack deletes B. By your logic above, we need to remove the midx
> because now it points to objects in B which aren't accessible. But by
> deleting it, could we be deleting the only thing that mentions the
> objects in A?
>
> I _think_ the answer is "no", because we never went all-in on midx and
> allowed deleting the matching .idx files for contained packs.

Yeah, it has been my assumption that that part of the design would
never change.

> I'm also a little curious how bad it is to have a midx whose pack has
> gone away. I guess we'd answer queries for "yes, we have this object"
> even if we don't, which is bad. Though in practice we'd only delete
> those packs if we have their objects elsewhere.

Hmph, object O used to be in pack A and pack B, midx points at the
copy in pack B but we deleted it because the pack is redundant.
Now, midx says O exists, but midx cannot be used to retrieve O.  We
need to ask A.idx about O to locate it.

That sounds brittle.  I am not sure our error fallback is all that
patient.


> And the pack code is
> pretty good about retrying other copies of objects that can't be
> accessed. Alternatively, I wonder if the midx-loading code ought to
> check that all of the constituent packs are available.
>
> In that line of thinking, do we even need to delete midx files if one of
> their packs goes away? The reading side probably ought to be able to
> handle that gracefully.

But at that point, is there even a point to have that midx file that
knows about objects (so it can answer has_object()? correctly and
quickly) but does not know the correct location of half of the objects?
Instead of going directly to A.idx to locate O, we need to go to midx
to learn the location of O in B (which no longer exists), and then
fall back to it, that is a pure overhead.

> And the more interesting case is when you repack everything with "-ad"
> or similar, at which point you shouldn't even need to look up what's in
> the midx to see if you deleted its packs. The point of your operation is
> to put it all-into-one, so you know the old midx should be discarded.

Old one, yes.  Do we need to have the new one in that case?

>> Teach 'git repack' to check for this by loading the MIDX, and checking
>> whether the to-be-removed pack is known to the MIDX. This requires a
>> slightly odd alternation to a test in t5319, which is explained with a
>> comment.
>
> My above musings aside, this seems like an obvious improvement.

Yes.

>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index 04c5ceaf7e..98fac03946 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>>  {
>>  	struct strbuf buf = STRBUF_INIT;
>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
>> +	strbuf_addf(&buf, "%s.pack", base_name);
>> +	if (m && midx_contains_pack(m, buf.buf))
>> +		clear_midx_file(the_repository);
>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>
> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> O(n log n) effort deleting the packs (I wondered if this might be
> accidentally quadratic over the number of packs).
>
> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> is why you can get rid of the manual "midx_cleared" flag from the
> preimage.
>
> So the patch looks good to me.

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux