Re: [PATCH] commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default

Jeff King <peff@xxxxxxxx> · Wed, 6 Dec 2023 14:49:52 -0500

On Wed, Nov 15, 2023 at 02:35:16PM +0100, Patrick Steinhardt wrote:

> > > Just to make sure we do not miscommunicate, I do not think we want
> > > to trigger the paranoia mode only in our tests.  We want to be
> > > paranoid to help real users who used "--missing" for their real use,
> > > so enabling PARANOIA in the test script is a wrong approach.  We
> > > should enable it inside "rev-list --missing" codepath.
> > 
> > Yeah. Just like we auto-enabled GIT_REF_PARANOIA for git-gc, etc, I
> > think we should do the same here.
> 
> I'm honestly still torn on this one. There are two cases that I can
> think of where missing objects would be benign and where one wants to
> use `git rev-list --missing`:
> 
>     - Repositories with promisor remotes, to find the boundary of where
>       we need to fetch new objects.
> 
>     - Quarantine directories where you only intend to list new objects
>       or find the boundary.
> 
> And in neither of those cases I can see a path for how the commit-graph
> would contain such missing commits when using regular tooling to perform
> repository maintenance.

Sorry for being unclear (and for the very slow response!). What I meant
by "here" was not "rev-list --missing" in particular, but rather that
"here" is "GIT_COMMIT_GRAPH_PARANOIA". And like GIT_REF_PARANOIA, we
should make sure it is turned on when checking for repository
corruption.

So specifically I meant that turning it off should:

  1. Not cause us to miss corruption with fsck.

  2. Not cause us to make corruption worse during a destructive repack
     (e.g., "repack -ad").

  3. Not admit corruption into the repository by fooling the rev-list
     invocation for check_connected().

I don't think the third one uses --missing at all, but even if it did,
the interesting thing to me is not "--missing", but rather that the
caller knows it is doing a corruption check. So it would set the
environment variable itself.

So in your loosening patch, I would have expected to see a couple of
those cases overriding the new "default to off" behavior. But it may be
that they are not necessary (e.g., I think fsck may turn off the commit
graph entirely already).

> So I'm still not sure why we think that this case is so much more
> special than others. If a user wants to check for repository corruption
> the tool shouldn't be `git rev-list --missing`, but git-fsck(1). To me,
> the former is only useful in very specific circumstances where the user
> knows what they are doing, and in none of the usecases I can think of
> should we have a stale commit-graph _unless_ we have actual repository
> corruption.
> 
> In reverse, to me this means that `--missing` is no more special than
> any of the other low-level tooling, where our stance seems to be "We
> assume that the repository is not corrupt". In that spirit, I'd argue
> that the same default value should apply here as for all the other
> cases.

Yeah, I think we are on the same page here. The need for paranoia is
really in the eyes of the caller, because only they know how careful
they want the operation to be.

> Oh, well. I'll wait for answers to this reply until tomorrow, and if I
> still haven't been able to convince anybody then I'll assume it's just
> me and adapt accordingly :)

Sorry, better late than never. ;)

-Peff