On Mon, Mar 28, 2022 at 01:55:43PM -0700, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > To summarize Jonathan's point (as I think I severely misunderstood it > > before), if two writers are repacking a repository with unreachable > > objects. The following can happen: > > > > - $NEWGIT packs the repository and writes a cruft pack and .mtimes > > file. > > > > - $OLDGIT packs the repository, exploding unreachable objects from the > > cruft pack as loose, setting their mtimes to "now". > > And if these repeat, alternating new and old versions of Git, we > will keep refreshing the unreachable objects' mtimes forever. > > But once you stop using old versions of Git, perhaps in 3 release > cycles or so, we'll eventually be able to purge them, right? As soon as all of the repackers understand cruft packs, yes. > > One approach (that Jonathan suggested) is to prevent the above situation > > by introducing a format extension, so $OLDGIT could not touch the > > repository. But this comes at a (in my view, significant) cost which is > > that $OLDGIT can't touch the repository _at all_. An extension would be > > desirable if cross-version interaction resulted in repository > > corruption, but this scenario does not lead to corruption at all. > > A repository may not be in a healthy state, when tons of unreachable > objects stay around forever, but it probably is a bit too harsh to > call it "corrupt". I agree, though I would note that this is no worse than the situation today, where unreachable-but-recent objects are already exploded as loose can already cause the kinds of issues that this series is designed to prevent. > > Another approach (courtesy Stolee, in an off-list discussion) is that we > > could introduce an optional extension available as an opt-in to prevent > > older versions of Git from interacting in a repository that contains > > cruft packs, but is not required to write them. > > That smells too magic; let's not go there. I'm not sure... if we did: --- 8< --- diff --git a/setup.c b/setup.c index 04ce33cdcd..fa54c9baa4 100644 --- a/setup.c +++ b/setup.c @@ -565,2 +565,4 @@ static enum extension_result handle_extension(const char *var, return EXTENSION_OK; + } else if (!strcmp(ext, "cruftpacks")) { + return EXTENSION_OK; } --- >8 --- but nothing more, then a hypothetical `extensions.cruftPacks` could be used to prevent older writers in a mixed version environment. But if you don't have or care about older versions of Git, you can avoid setting it altogether. The key bit is that we don't have a check along the lines of "only allow writing a cruft pack when extensions.cruftPacks" is set, so it's opt-in as far as the new code is concerned. > > A third approach (and probably my preferred direction) is to indicate > > clearly via a combination of updates to Documentation/cruft-packs.txt > > and the release notes that say something along the lines of: > > > > If you use are repacking a repository using both a pre- and > > post-cruft packs version of Git, please be aware that you will lose > > information about the mtimes of unreachable objects. > > I do not quite see how it helps. After hearing "... will lose > information about the mtimes ...", what concrete action can a user > take? Or a sys-admin? > > It's not like use of cruft-pack is mandatory when you upgrade the > new version of Git, right? Perhaps use of cruft-pack should be > guarded behind a configuration variable so that users who might want > to use mixed versions of Git will be protected against accidental > use of new version of Git that introduces the forever-renewing > untracked objects problem? I don't think we would have much to offer a user in that case; if the mtimes are gone, then I couldn't think of anything to bring them back outside of setting them manually. But cruft packs are already guarded in two places: - `git repack` won't write a cruft pack unless given the `--cruft` flag (i.e., `git repack -A` doesn't suddenly start generating cruft packs upon upgrade). - `git gc` won't write cruft packs unless the `gc.cruftPacks` configuration is set, or `--cruft` is given as a flag. I'd be curious what Jonathan and others think of that approach (which, to be clear, is what this series already implements). We could make it clear to say: If you have mixed versions of Git which both repack a repository (either manually or by auto-GC / background maintenance), consider leaving `gc.cruftPacks` unset and avoiding passing `--cruft` as a command-line argument to `git repack` and `git gc`, since doing so can lead to [...] > Perhaps a configuration variable, repack.cruftPackEnabled, that is > by default disabled, can be used to protect people who do not want > to get into the "keep refreshing mtime" loop from using the cruft > packs by mistake? repack.cruftPackEnabled can probably be part of > the "experimental" feature set, if we think it is the direction in > the future. I'd probably want to leave `-A` separate from `--cruft`, since something about setting `repack.cruftPackEnabled` having the effect of causing `-A` to produce a cruft pack feels strange to me. Thanks, Taylor