Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 19 Jul 2018 13:49:02 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Tue, 17 Jul 2018, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>> 
>> > I'm OK with having a partial fix, or one that fixes immediate pain
>> > without doing a big cleanup, as long as it doesn't make anything _worse_
>> > in a user-visible way. And that's really my question: is pruning here
>> > going to bite people unexpectedly (not rhetorical -- I really don't
>> > know)?
>> 
>> Yeah, that matches the general guideline I follow when reviewing a
>> patch that claims to make existing things better.  And I do not
>> think I can explain to a third person why pruning here is a good
>> idea and won't cause problems, after seeing these patches and
>> the discussion from the sideline.
>
> It is very easy to explain: `git repack` can drop unreachable commits
> without further warning, making the corresponding entries in
> `.git/shallow` invalid, which causes serious problems when deepening the
> branches.

That explains very clearly why you wrote the patch.

> The solution is easy: drop also the now-invalid entries in `.git/shallow`
> after dropping unreachable commits unceremoniously.

Sorry, but I do not think I can relay that to a third person as an
explanation of why it won't cause problems.

The entries in the shallow file say that the history behind them may
not exist in the repository due to its shallowness, but the history
after them is supposed to be traversable (otherwise we have a
repository corruption).  It is true that an entry that itself no
longer exists in this repository should not be in the shallow file,
as the presence of that entry breaks the promise the file is making,
namely that the commit ought to exist and that it is safe to traverse
down to it, so keeping the entry in the file is absolutely a wrong
thing to do.
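
To make that promise concrete, here is a minimal toy model in Python.
It is only a sketch, not git's implementation: the dict-based object
store, the `shallow` set, and the helper names `walk` and
`stale_entries` are all made up for illustration.

```python
# Toy model (an assumption, not git's code): `objects` maps a commit id
# to the list of its parent ids; `shallow` is the set of ids that would
# be listed in .git/shallow.

def walk(objects, shallow, tip):
    """Traverse history from `tip`, stopping at shallow entries.

    Returns (visited, missing): the commits reached, and the commits
    the traversal wanted to read but could not find.
    """
    visited, missing = set(), set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in visited or commit in missing:
            continue
        if commit not in objects:
            missing.add(commit)
            continue
        visited.add(commit)
        if commit in shallow:
            continue  # the shallow entry cuts off the parents
        stack.extend(objects[commit])
    return visited, missing


def stale_entries(objects, shallow):
    """Shallow entries that name a commit no longer in the object store."""
    return {commit for commit in shallow if commit not in objects}


# History A <- B <- C <- D, cloned so that A was never fetched.
objects = {"D": ["C"], "C": ["B"], "B": ["A"]}

healthy = walk(objects, {"B"}, "D")   # stops at B, never asks for A
broken = walk(objects, set(), "D")    # no entry stops it; it asks for A
stale = stale_entries(objects, {"B", "X"})  # X names a missing commit
```

In this model a healthy shallow repository is one where `walk` reports
nothing missing and `stale_entries` is empty.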

But that does not automatically mean that simply removing it
makes the resulting repository good, does it?  Wouldn't the solution
for that corruption be to set a new entry to stop history traversal
before reaching that (now-missing) commit?  If your solution and
explanation were like that, then I could understand why it won't cause
problems, because the resulting repository would be shallower than
it originally was, as if you cloned with a smaller depth, but it is
not corrupt.
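
As a rough illustration of what "set a new entry before the missing
commit" could mean, here is a sketch in Python over a toy commit graph.
Everything in it is hypothetical: the dict-based object store and the
function name `reshallow` are invented for this example, and whether
such a state can actually arise after `git repack -ad` is exactly the
open question here.

```python
# Toy model (an assumption, not git's code): `objects` maps a commit id
# to the list of its parent ids; `shallow` is the set of ids that would
# be listed in .git/shallow; `tips` are the ref tips to traverse from.

def reshallow(objects, shallow, tips):
    """Return a shallow set that stops traversal before any missing commit."""
    # Drop entries whose commit no longer exists.
    new_shallow = {commit for commit in shallow if commit in objects}
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in objects:
            continue
        seen.add(commit)
        if commit in new_shallow:
            continue  # traversal already stops here
        parents = objects[commit]
        if any(parent not in objects for parent in parents):
            # Conservative choice: cut off all parents of this commit,
            # even those that are still present.
            new_shallow.add(commit)
            continue
        stack.extend(parents)
    return new_shallow


# B used to be the shallow boundary but has since been dropped.
objects = {"D": ["C"], "C": ["B"]}
result = reshallow(objects, {"B"}, ["D"])  # C becomes the new boundary
```

The result behaves like a clone made with a smaller depth: traversal
from D stops at C and never asks for B.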

Perhaps your rationale is that by trading one shape of corrupt
repository (where a commit that does not even exist is in the
shallow file, breaking the early start-up sequence when it tries to
read the entries) for another shape of corrupt repository (where the
shallow file does not completely tell where to stop, and traversing
the history can eventually hit a missing commit because no entry in
the shallow file stops such a traversal), at least a deepening fetch
can start (instead of dying while trying to see how shallow the
repository currently is) and that can be used to recover the corrupt
repository back into a usable state?  If that is the justification,
I can fully buy that, but that is not what I am hearing in your
easy-to-explain answer, so I am still puzzled.