Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1

Jeff King <peff@xxxxxxxx> · Wed, 22 Nov 2017 17:36:14 -0500

On Wed, Nov 22, 2017 at 10:42:30AM +0900, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > I'm not sure what the right behavior is, but I'm pretty sure that's not
> > it. Probably one of:
> >
> >   - skip updating the ref when we see the breakage
> >
> >   - ditto, but terminate the whole operation, since we might be deleting
> >     other refs and in a broken repo we're probably best to make as few
> >     changes as possible
> >
> >   - behave as if it was a non-ff, which would allow "--force" to
> >     overwrite the broken ref. Maybe convenient for fixing things, but
> >     possibly surprising (and it's not that hard to just delete the
> >     broken refs manually before proceeding).
> 
> Perhaps the last one would be the ideal endgame, but the second one
> may be a good stopping point in the shorter term.

This turns out to be a lot trickier than I expected. The crux of the
matter is that the case we care about is hidden inside
lookup_commit_reference_gently(), which doesn't distinguish between
corruption and "not a commit".

So there are four cases we care about for this call in fetch:

  1. We fed a real sha1 and got a commit (or peeled to one).

  2. We fed a real sha1 which resolved to a non-commit, and we got NULL.

  3. We fed a real sha1 and the object was missing or corrupted, and we
     got NULL.

  4. We fed a null sha1 and got NULL.

Right now we lump cases 2-4 together as "do not do a fast-forward
check". That's fine for 2 and 4, but probably not for 3. We can easily
catch case 4 ourselves (if we care to), but distinguishing case 3 from
the others is hard. How should lookup_commit_reference_gently() signal
it to us?

Or should lookup_commit_reference_gently() die on corruption? That's not
very "gentle", but I think the "gently" here is really about "it might
not be a commit", not "the repo might be corrupted". But I think even
that may be the tip of the iceberg. The next thing we do is feed the
commits to in_merge_bases(), which will happily return "nope" if the old
commit cannot be parsed (because it has only a boolean return value).

So I dunno. Maybe it is a losing battle to try to pass this kind of
corruption information up the stack.  I'm tempted to say that there
should just be a "paranoid" flag to globally die() whenever we see a
corruption (and you could run with it normally, but relax it whenever
you're investigating a broken repo). But I doubt even that works. Not
having the "old_oid" object at all would be a repo corruption here, but
how are the low-level routines supposed to know when a missing object is
a corruption and when it is not?

-Peff