Re: [PATCH v3 01/17] Documentation/technical: add cruft-packs.txt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 28, 2022 at 01:55:43PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > To summarize Jonathan's point (as I think I severely misunderstood it
> > before), if two writers are repacking a repository with unreachable
> > objects. The following can happen:
> >
> >   - $NEWGIT packs the repository and writes a cruft pack and .mtimes
> >     file.
> >
> >   - $OLDGIT packs the repository, exploding unreachable objects from the
> >     cruft pack as loose, setting their mtimes to "now".
>
> And if these repeat, alternating new and old versions of Git, we
> will keep refreshing the unreachable objects' mtimes forever.
>
> But once you stop using old versions of Git, perhaps in 3 release
> cycles or so, we'll eventually be able to purge them, right?

As soon as all of the repackers understand cruft packs, yes.

> > One approach (that Jonathan suggested) is to prevent the above situation
> > by introducing a format extension, so $OLDGIT could not touch the
> > repository. But this comes at a (in my view, significant) cost which is
> > that $OLDGIT can't touch the repository _at all_. An extension would be
> > desirable if cross-version interaction resulted in repository
> > corruption, but this scenario does not lead to corruption at all.
>
> A repository may not be in a healthy state, when tons of unreachable
> objects stay around forever, but it probably is a bit too harsh to
> call it "corrupt".

I agree, though I would note that this is no worse than the situation
today, where unreachable-but-recent objects are already exploded as
loose can already cause the kinds of issues that this series is designed
to prevent.

> > Another approach (courtesy Stolee, in an off-list discussion) is that we
> > could introduce an optional extension available as an opt-in to prevent
> > older versions of Git from interacting in a repository that contains
> > cruft packs, but is not required to write them.
>
> That smells too magic; let's not go there.

I'm not sure... if we did:

--- 8< ---

diff --git a/setup.c b/setup.c
index 04ce33cdcd..fa54c9baa4 100644
--- a/setup.c
+++ b/setup.c
@@ -565,2 +565,4 @@ static enum extension_result handle_extension(const char *var,
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "cruftpacks")) {
+		return EXTENSION_OK;
 	}

--- >8 ---

but nothing more, then a hypothetical `extensions.cruftPacks` could be
used to prevent older writers in a mixed version environment. But if you
don't have or care about older versions of Git, you can avoid setting it
altogether.

The key bit is that we don't have a check along the lines of "only allow
writing a cruft pack when extensions.cruftPacks" is set, so it's opt-in
as far as the new code is concerned.

> > A third approach (and probably my preferred direction) is to indicate
> > clearly via a combination of updates to Documentation/cruft-packs.txt
> > and the release notes that say something along the lines of:
> >
> >     If you use are repacking a repository using both a pre- and
> >     post-cruft packs version of Git, please be aware that you will lose
> >     information about the mtimes of unreachable objects.
>
> I do not quite see how it helps.  After hearing "... will lose
> information about the mtimes ...", what concrete action can a user
> take?  Or a sys-admin?
>
> It's not like use of cruft-pack is mandatory when you upgrade the
> new version of Git, right?  Perhaps use of cruft-pack should be
> guarded behind a configuration variable so that users who might want
> to use mixed versions of Git will be protected against accidental
> use of new version of Git that introduces the forever-renewing
> untracked objects problem?

I don't think we would have much to offer a user in that case; if the
mtimes are gone, then I couldn't think of anything to bring them back
outside of setting them manually.

But cruft packs are already guarded in two places:

  - `git repack` won't write a cruft pack unless given the `--cruft`
    flag (i.e., `git repack -A` doesn't suddenly start generating cruft
    packs upon upgrade).

  - `git gc` won't write cruft packs unless the `gc.cruftPacks`
    configuration is set, or `--cruft` is given as a flag.

I'd be curious what Jonathan and others think of that approach (which,
to be clear, is what this series already implements). We could make it
clear to say:

    If you have mixed versions of Git which both repack a repository
    (either manually or by auto-GC / background maintenance), consider
    leaving `gc.cruftPacks` unset and avoiding passing `--cruft` as a
    command-line argument to `git repack` and `git gc`, since doing so
    can lead to [...]

> Perhaps a configuration variable, repack.cruftPackEnabled, that is
> by default disabled, can be used to protect people who do not want
> to get into the "keep refreshing mtime" loop from using the cruft
> packs by mistake?  repack.cruftPackEnabled can probably be part of
> the "experimental" feature set, if we think it is the direction in
> the future.

I'd probably want to leave `-A` separate from `--cruft`, since something
about setting `repack.cruftPackEnabled` having the effect of causing
`-A` to produce a cruft pack feels strange to me.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux