Re: [PATCH] reopen_tempfile(): truncate opened file

Jeff King <peff@xxxxxxxx> · Wed, 5 Sep 2018 11:35:52 -0400

On Wed, Sep 05, 2018 at 05:27:11PM +0200, Duy Nguyen wrote:

> > +test_expect_success PERL 'commit -p with shrinking cache-tree' '
> > +       mkdir -p deep/subdir &&
> > +       echo content >deep/subdir/file &&
> > +       git add deep &&
> > +       git commit -m add &&
> > +       git rm -r deep &&
> 
> OK so I guess at this step, we invalidate some cache-tree blocks, but
> we write the same blocks down (with "invalid" flag), so pretty much
> the same size as before.

I didn't verify exactly what was in the index, but that was my
understanding, too (well, it's a little smaller because we drop the
actual index entries, but keep the invalidated cache-tree). I worry a
little that "rm" might eventually learn to drop those invalidated bits.
But hopefully finding this commit would lead that person to figure out
another way to accomplish the same thing, or to decide that carrying the
test forward isn't worth it.

> > +       after=$(wc -c <.git/index) &&
> > +
> > +       # double check that the index shrank
> > +       test $before -gt $after &&
> > +
> > +       # and that our index was not corrupted
> > +       git fsck
> 
> If the index is not shrunk, we parse the remaining rubbish as
> extensions. If by chance the rubbish extension name is in uppercase,
> then we ignore it (and do not flag it as an error). But then the chance
> of the next 4 bytes being the "right" extension size is so small that
> we would end up flagging it as a bad extension anyway. So it's good.
> But if you want to be even stricter (not necessary in my opinion), make
> sure that stderr is empty.

In this case, the size difference is only a few bytes, so the rubbish
actually ends up in the trailing sha1. The reason I use git-fsck here is
that it actually verifies the whole sha1 (since normal index reads no
longer do). In fact, a normal index read won't show any problem for this
case (since it is _only_ the trailing sha1 which is junk, and we no
longer verify it on every read).
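
To illustrate the kind of check involved, here is a minimal shell sketch
(not git's actual code, and nothing like how fsck is implemented). It
assumes a file whose last 20 bytes are the SHA-1 of everything before
them, and that sha1sum, od, head and tail are available. The function
name check_trailer is made up for this example.

```shell
# Hypothetical helper (not part of git): succeeds only if the last 20 bytes
# of the file equal the SHA-1 of everything that precedes them.
check_trailer () {
	file=$1
	size=$(wc -c <"$file") &&
	test "$size" -ge 20 &&
	body=$((size - 20)) &&
	# Hash of the content, as a 40-character hex string.
	expect=$(head -c "$body" "$file" | sha1sum | cut -d" " -f1) &&
	# The stored trailer, converted from raw bytes to hex.
	actual=$(tail -c 20 "$file" | od -An -tx1 | tr -d " \n") &&
	test "$expect" = "$actual"
}
```

Running something like "check_trailer .git/index" in a SHA-1 repository
should succeed on a healthy index and fail if stale bytes were left at
the end of the file, since the stale bytes then take the place of the
real checksum.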

In the original sparse-dev case, the size of the rubbish is much larger
(because we deleted a lot more entries), and we do interpret it as a
bogus extension. But it also triggers here, because the trailing sha1 is
_also_ wrong.

So AFAIK this fsck catches everything and yields a non-zero exit in the
error case. And it should work for even a single byte of rubbish.
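
The single-byte case can be reproduced outside of git with a small shell
sketch. This is only an illustration under assumptions: the helper name
overwrite_no_trunc and the file contents are made up, and it relies on
the POSIX "<>" redirection, which opens a file read-write without
truncating it (roughly what reopening the tempfile without O_TRUNC did).

```shell
# Hypothetical helper, not git code: write string "$2" at the start of file
# "$1". The "<>" redirection opens read-write and does not truncate.
overwrite_no_trunc () {
	printf '%s' "$2" 1<>"$1"
}

tmp=$(mktemp) &&
printf 'long-content' >"$tmp" &&             # 12 bytes: the old, larger file
overwrite_no_trunc "$tmp" 'short-data!' &&   # 11 bytes: the new, smaller one
cat "$tmp" && echo                           # prints "short-data!t"

# A checksum over the whole file no longer matches the intended content.
intended=$(printf 'short-data!' | sha1sum | cut -d" " -f1)
actual=$(sha1sum <"$tmp" | cut -d" " -f1)
test "$intended" != "$actual" && echo "stale byte detected"
```

The file keeps its old size, and the one leftover byte is enough to
change the hash of the whole file, which is why a full checksum
verification notices even this smallest case.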

-Peff