On Tue, Jun 20, 2023 at 07:03:02PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@xxxxxxxxxx> > > In 73320e49add (builtin/repack.c: only collect fully-formed packs, > 2023-06-07), we switched the check for which packs to collect by > starting at the .idx files and looking for matching .pack files. This > avoids trying to repack pack-files that have not had their pack-indexes > installed yet. Right; the motivation behind this change is based on the principle that we consider a packfile (and any corresponding metadata it might have, like .keep, .bitmap, .rev, etc.) to be accessible exactly when there is an .idx file in place. So if pack-P.idx exists, we expect to be able to read the contents of pack-P.pack (and any other pack-P.* files that happen to be laying around). The idea with 73320e49add is to make `collect_pack_filenames()` consistent with the same pack existence check that we use everywhere else. > However, it does cause maintenance to halt if we find the (problematic, > but not insurmountable) case of a .idx file without a corresponding > .pack file. In an environment where packfile maintenance is a critical > function, such a hard stop is costly and requires human intervention to > resolve (by deleting the .idx file). Do all cases fall into the "problematic, but not insurmountable" category? I'm not entirely sure. One instance that is definitely OK is having a partially unstaged pack whose objects were repacked elsewhere, but the 'repack' process doing so died between unlink()-ing the ".pack" file and the ".idx". This matches the patch that you and I discussed yesterday to switch the order and restore git-repack.sh's original behavior, which is correct. But what about someone accidentally deleting a ".pack" file and leaving around its index? Or copying files from one repository to another and only partially copying the contents of $GIT_DIR/objects/pack? What makes me leery about this kind of change is that there are potentially many more cases like the above. And even if the interesting cases are somehow limited to the above, continuing on in the face of a malformed pack group doesn't seem like the right thing for repack to be doing. > This was not the case before. We successfully repacked through this > scenario until the recent change to scan for .idx files. Right, my feeling is that the new behavior post-73320e49add is arguably more correct. > In other cases, Git prepares its list of packfiles by scanning .idx > files and then only adds it to the packfile list if the corresponding > .pack file exists. It even does so without a warning! (See > add_packed_git() in packfile.c for details.) This is by design: we only try to open the pack itself when we call `open_packed_git()`. All packs start out initialized with just their ".idx" as you note. This feels similar to what you are trying to do here, but I do think that it's different. While we're performing object reads, there may be concurrent writers adding (e.g. from pushes) or removing packs (e.g. from repacking) in the repository. So there we have to be resilient to thinking that we might have a copy of some object in a given pack, but still being able to handle the case where that pack disappears and the object is actually found in some other pack. Repacking feels different to me. There we are modifying the packs themselves, so having a incorrectly staged pack group feels like a condition that we should stop on, because we can no longer completely trust the data that we are operating on. > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh > index 08b5ba0c150..e780efe5e0c 100755 > --- a/t/t7700-repack.sh > +++ b/t/t7700-repack.sh > @@ -239,6 +239,10 @@ test_expect_success 'repack --keep-pack' ' > mv "$from" "$to" || return 1 > done && > > + # A .idx file without a .pack should not stop us from > + # repacking what we can. > + touch .git/objects/pack/pack-does-not-exist.idx && > + > git repack --cruft -d --keep-pack $P1 --keep-pack $P4 && > > ls .git/objects/pack/*.pack >newer-counts && If we do end up pursuing this patch, it may be worthwhile to ensure that the dangling .idx file remains in place after the repack is complete. You test for this implicitly below, but doing so here explicitly feels worthwhile to me. Thanks, Taylor