Re: [PATCH 6/6] hash-object: use fsck for object checks

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 01 Feb 2023 12:41:18 -0800

Jeff King <peff@xxxxxxxx> writes:

>      ... So if that tree is fsck'd, and then
>      checks the blob during fsck_finish(), that should be enough.
>      Assuming that fsck complains when the pointed-to blob cannot be
>      accessed, which I think it should (because really, incremental
>      pushes face the same problem).

Yes.

>   2. We're not checking fsck connectivity here, and that's intentional.
>      So you can "hash-object" a tree that points to blobs that we don't
>      actually have. But if you hash a tree that points a .gitmodules
>      entry at a blob that doesn't exist, then that will fail the fsck
>      (during the finish step). And respecting the fsck_finish() exit
>      code would break that.

That's tricky.  An attack vector to sneak a bad .gitmodules file
into history then becomes (1) hash a tree with a .gitmodules entry
that points at a missing blob and then (2) after that fact is
forgotten, hash a bad blob pointed to by the tree?  We cannot afford
to remember all trees with .gitmodules we didn't find the blob for
forever, so one approach to solve it is to reject trees with missing
blobs.  Legitimate use cases should be able to build up trees bottle
up to hash blobs before their containing trees.

If you hash a commit object, we would want to fsck its tree?  Do we
want to fsck its parent commit and its tree?  Ideally we can stop
when our "traversal" reaches objects that are known to be good, but
how do we decide which objects are "known to be good"?  Being
reachable from our refs, as usual?

> So I dunno. The code above is doing (2), albeit with the inefficiency of
> checking blobs that we might not care about. I kind of think (1) is the
> right thing, though, and anybody who really wants to make trees that
> point to bogus .gitmodules can use --literally.

True.