On Tue, Aug 25, 2020 at 08:58:46AM -0700, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > > > Does "git repack" ever remove just one pack? Obviously "git repack -ad" > > or "git repack -Ad" is going to pack everything and delete the old > > packs. So I think we'd want to remove a midx there. > > AFAIK, the pack-redundant subcommand is used by nobody in-tree, so > nobody is doing "there are three packfiles, but all the objects in > one of them are contained in the other two, so let's remove that > redundant one". Not AFAICT either. > > And "git repack -d" I think of as deleting only loose objects that we > > just packed. But I guess it could also remove a pack that has now been > > made redundant? That seems like a rare case in practice, but I suppose > > is possible. > > Meaning it can become reality? Yes. Or it already can happen? I > doubt it. > > > E.g., imagine we have a midx that points to packs A and B, and > > git-repack deletes B. By your logic above, we need to remove the midx > > because now it points to objects in B which aren't accessible. But by > > deleting it, could we be deleting the only thing that mentions the > > objects in A? > > > > I _think_ the answer is "no", because we never went all-in on midx and > > allowed deleting the matching .idx files for contained packs. > > Yeah, it has been my assumption that that part of the design would > never change. > > > I'm also a little curious how bad it is to have a midx whose pack has > > gone away. I guess we'd answer queries for "yes, we have this object" > > even if we don't, which is bad. Though in practice we'd only delete > > those packs if we have their objects elsewhere. > > Hmph, object O used to be in pack A and pack B, midx points at the > copy in pack B but we deleted it because the pack is redundant. > Now, midx says O exists, but midx cannot be used to retrieve O. We > need to ask A.idx about O to locate it. > > That sounds brittle. I am not sure our error fallback is all that > patient. Me either. Like I said, I think that all of this is possible at least in theory, but in practice it may be somewhat annoying in cases like these. > > > And the pack code is > > pretty good about retrying other copies of objects that can't be > > accessed. Alternatively, I wonder if the midx-loading code ought to > > check that all of the constituent packs are available. > > > > In that line of thinking, do we even need to delete midx files if one of > > their packs goes away? The reading side probably ought to be able to > > handle that gracefully. > > But at that point, is there even a point to have that midx file that > knows about objects (so it can answer has_object()? correctly and > quickly) but does not know the correct location of half of the objects? > Instead of going directly to A.idx to locate O, we need to go to midx > to learn the location of O in B (which no longer exists), and then > fall back to it, that is a pure overhead. Well put. > > And the more interesting case is when you repack everything with "-ad" > > or similar, at which point you shouldn't even need to look up what's in > > the midx to see if you deleted its packs. The point of your operation is > > to put it all-into-one, so you know the old midx should be discarded. > > Old one, yes. Do we need to have the new one in that case? > > >> Teach 'git repack' to check for this by loading the MIDX, and checking > >> whether the to-be-removed pack is known to the MIDX. This requires a > >> slightly odd alternation to a test in t5319, which is explained with a > >> comment. > > > > My above musings aside, this seems like an obvious improvement. > > Yes. > > >> diff --git a/builtin/repack.c b/builtin/repack.c > >> index 04c5ceaf7e..98fac03946 100644 > >> --- a/builtin/repack.c > >> +++ b/builtin/repack.c > >> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list, > >> static void remove_redundant_pack(const char *dir_name, const char *base_name) > >> { > >> struct strbuf buf = STRBUF_INIT; > >> - strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name); > >> + struct multi_pack_index *m = get_multi_pack_index(the_repository); > >> + strbuf_addf(&buf, "%s.pack", base_name); > >> + if (m && midx_contains_pack(m, buf.buf)) > >> + clear_midx_file(the_repository); > >> + strbuf_insertf(&buf, 0, "%s/", dir_name); > > > > Makes sense. midx_contains_pack() is a binary search, so we'll spend > > O(n log n) effort deleting the packs (I wondered if this might be > > accidentally quadratic over the number of packs). > > > > And after we clear, "m" will be NULL, so we'll do it at most once. Which > > is why you can get rid of the manual "midx_cleared" flag from the > > preimage. > > > > So the patch looks good to me. > > Thanks. Thanks, both. If you're ready, let's use the version in [1] for queueing. Thanks, Taylor [1]: https://lore.kernel.org/git/87a3b7a5a2f091e2a23c163a7d86effbbbedfa3a.1598371475.git.me@xxxxxxxxxxxx/