On 8/25/2020 11:58 AM, Junio C Hamano wrote: > Jeff King <peff@xxxxxxxx> writes: > >> Does "git repack" ever remove just one pack? Obviously "git repack -ad" >> or "git repack -Ad" is going to pack everything and delete the old >> packs. So I think we'd want to remove a midx there. > > AFAIK, the pack-redundant subcommand is used by nobody in-tree, so > nobody is doing "there are three packfiles, but all the objects in > one of them are contained in the other two, so let's remove that > redundant one". > >> And "git repack -d" I think of as deleting only loose objects that we >> just packed. But I guess it could also remove a pack that has now been >> made redundant? That seems like a rare case in practice, but I suppose >> is possible. > > Meaning it can become reality? Yes. Or it already can happen? I > doubt it. > >> E.g., imagine we have a midx that points to packs A and B, and >> git-repack deletes B. By your logic above, we need to remove the midx >> because now it points to objects in B which aren't accessible. But by >> deleting it, could we be deleting the only thing that mentions the >> objects in A? >> >> I _think_ the answer is "no", because we never went all-in on midx and >> allowed deleting the matching .idx files for contained packs. > > Yeah, it has been my assumption that that part of the design would > never change. Yes, we should intend to keep the .idx files around forever even when using the multi-pack-index. The duplicated data overhead is not typically a real problem. The one caveat I would consider is if we wanted to let a multi-pack-index cover thin packs, but that would be a substantial change to the object database that I do not plan to tackle any time soon. >> I'm also a little curious how bad it is to have a midx whose pack has >> gone away. I guess we'd answer queries for "yes, we have this object" >> even if we don't, which is bad. Though in practice we'd only delete >> those packs if we have their objects elsewhere. > > Hmph, object O used to be in pack A and pack B, midx points at the > copy in pack B but we deleted it because the pack is redundant. > Now, midx says O exists, but midx cannot be used to retrieve O. We > need to ask A.idx about O to locate it. > > That sounds brittle. I am not sure our error fallback is all that > patient. The best I can see is that prepare_midx_pack() will return 1 if the pack no longer exists, and this would cause a die("error preparing packfile from multi-pack-index") in nth_midxed_pack_entry(), killing the following stack trace: + find_pack_entry():packfile.c + fill_midx_entry():midx.c + nth_midxed_pack_entry():midx.c Perhaps that die() is a bit over-zealous since we could return 1 through this stack and find_pack_entry() could continue searching for the object in the packed_git list. However, it could start returning false _negatives_ if there were duplicates of the object in the multi-pack-index but only the latest copy was deleted (and the object does not appear in a pack-file outside of the multi- pack-index). Thanks, -Stolee