Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 25, 2020 at 08:58:46AM -0700, Junio C Hamano wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
> > Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> > or "git repack -Ad" is going to pack everything and delete the old
> > packs. So I think we'd want to remove a midx there.
>
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".

Not AFAICT either.

> > And "git repack -d" I think of as deleting only loose objects that we
> > just packed. But I guess it could also remove a pack that has now been
> > made redundant? That seems like a rare case in practice, but I suppose
> > is possible.
>
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.
>
> > E.g., imagine we have a midx that points to packs A and B, and
> > git-repack deletes B. By your logic above, we need to remove the midx
> > because now it points to objects in B which aren't accessible. But by
> > deleting it, could we be deleting the only thing that mentions the
> > objects in A?
> >
> > I _think_ the answer is "no", because we never went all-in on midx and
> > allowed deleting the matching .idx files for contained packs.
>
> Yeah, it has been my assumption that that part of the design would
> never change.
>
> > I'm also a little curious how bad it is to have a midx whose pack has
> > gone away. I guess we'd answer queries for "yes, we have this object"
> > even if we don't, which is bad. Though in practice we'd only delete
> > those packs if we have their objects elsewhere.
>
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
>
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

Me either. Like I said, I think that all of this is possible at least in
theory, but in practice it may be somewhat annoying in cases like these.

>
> > And the pack code is
> > pretty good about retrying other copies of objects that can't be
> > accessed. Alternatively, I wonder if the midx-loading code ought to
> > check that all of the constituent packs are available.
> >
> > In that line of thinking, do we even need to delete midx files if one of
> > their packs goes away? The reading side probably ought to be able to
> > handle that gracefully.
>
> But at that point, is there even a point to have that midx file that
> knows about objects (so it can answer has_object()? correctly and
> quickly) but does not know the correct location of half of the objects?
> Instead of going directly to A.idx to locate O, we need to go to midx
> to learn the location of O in B (which no longer exists), and then
> fall back to it, that is a pure overhead.

Well put.

> > And the more interesting case is when you repack everything with "-ad"
> > or similar, at which point you shouldn't even need to look up what's in
> > the midx to see if you deleted its packs. The point of your operation is
> > to put it all-into-one, so you know the old midx should be discarded.
>
> Old one, yes.  Do we need to have the new one in that case?
>
> >> Teach 'git repack' to check for this by loading the MIDX, and checking
> >> whether the to-be-removed pack is known to the MIDX. This requires a
> >> slightly odd alternation to a test in t5319, which is explained with a
> >> comment.
> >
> > My above musings aside, this seems like an obvious improvement.
>
> Yes.
>
> >> diff --git a/builtin/repack.c b/builtin/repack.c
> >> index 04c5ceaf7e..98fac03946 100644
> >> --- a/builtin/repack.c
> >> +++ b/builtin/repack.c
> >> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >>  {
> >>  	struct strbuf buf = STRBUF_INIT;
> >> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> >> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> >> +	strbuf_addf(&buf, "%s.pack", base_name);
> >> +	if (m && midx_contains_pack(m, buf.buf))
> >> +		clear_midx_file(the_repository);
> >> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
> >
> > Makes sense. midx_contains_pack() is a binary search, so we'll spend
> > O(n log n) effort deleting the packs (I wondered if this might be
> > accidentally quadratic over the number of packs).
> >
> > And after we clear, "m" will be NULL, so we'll do it at most once. Which
> > is why you can get rid of the manual "midx_cleared" flag from the
> > preimage.
> >
> > So the patch looks good to me.
>
> Thanks.

Thanks, both. If you're ready, let's use the version in [1] for
queueing.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/87a3b7a5a2f091e2a23c163a7d86effbbbedfa3a.1598371475.git.me@xxxxxxxxxxxx/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux