Jeff King <peff@xxxxxxxx> writes: > ... So if that tree is fsck'd, and then > checks the blob during fsck_finish(), that should be enough. > Assuming that fsck complains when the pointed-to blob cannot be > accessed, which I think it should (because really, incremental > pushes face the same problem). Yes. > 2. We're not checking fsck connectivity here, and that's intentional. > So you can "hash-object" a tree that points to blobs that we don't > actually have. But if you hash a tree that points a .gitmodules > entry at a blob that doesn't exist, then that will fail the fsck > (during the finish step). And respecting the fsck_finish() exit > code would break that. That's tricky. An attack vector to sneak a bad .gitmodules file into history then becomes (1) hash a tree with a .gitmodules entry that points at a missing blob and then (2) after that fact is forgotten, hash a bad blob pointed to by the tree? We cannot afford to remember all trees with .gitmodules we didn't find the blob for forever, so one approach to solve it is to reject trees with missing blobs. Legitimate use cases should be able to build up trees bottle up to hash blobs before their containing trees. If you hash a commit object, we would want to fsck its tree? Do we want to fsck its parent commit and its tree? Ideally we can stop when our "traversal" reaches objects that are known to be good, but how do we decide which objects are "known to be good"? Being reachable from our refs, as usual? > So I dunno. The code above is doing (2), albeit with the inefficiency of > checking blobs that we might not care about. I kind of think (1) is the > right thing, though, and anybody who really wants to make trees that > point to bogus .gitmodules can use --literally. True.