On Tue, Oct 18, 2022 at 09:40:01AM -0700, Junio C Hamano wrote:

> And the patch goes in the right direction. It is a bit sad that it
> now has to do parse_object() but in the normal case, the object
> referenced should be a blob that exists, for which the cost of
> parsing it would be none (just setting .parsed member to true), so
> it should be OK.

This isn't quite true. parse_object() will still inflate the object
contents to check the sha1.

I think has_object_file() is probably the right thing here. We want to
know if the object is missing entirely. We'd not notice corrupted bytes,
of course, but that is OK. Traversal does not open blobs we reach via
trees, either. For pack-objects, we rely on either:

  - for repacking to disk, we check the pack crc for already-packed
    objects (which avoids inflating them). For loose objects, we'll
    inflate them later when we convert them to packed form.

  - for packing to stdout for fetch/push, the receiver is expected to
    check the sha1 via index-pack, etc.

So I think just checking "do we have it? If not, gently skip it" is the
right thing here. And in the long run we'd hopefully remove that code,
as "we don't have it" becomes less "this was probably gc'd with an older
version of git" and more "oops, there is a bug in Git that lost this
object".

I notice that 5a5ea141e7 (revision: mark blobs needed for resolve-undo
as reachable, 2022-06-09) uses parse_object() in the fsck code path.
That _might_ be better as lookup_object(), as earlier stages of fsck
would have checked the bytes of each object and created an in-memory
object struct. Though I guess in that sense, it doesn't matter;
parse_object() will hit lookup_object() first and see that in-memory
struct.

-Peff
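
To illustrate the gap between asking "do we have it?" and re-checking an
object's bytes, here is a rough Python sketch. It is not Git's code: the
helper names (loose_path, write_loose_blob, exists, verify, demo) are
made up for illustration, and it models only loose objects. exists()
plays the role of a has_object_file()-style check, while verify() pays
the inflate-and-hash cost described above for parse_object().

```python
import hashlib
import os
import tempfile
import zlib


def loose_path(objdir, oid):
    # Loose objects live at <objdir>/<first two hex digits>/<remaining 38>.
    return os.path.join(objdir, oid[:2], oid[2:])


def write_loose_blob(objdir, data):
    # Store "blob <size>\0<data>", zlib-deflated, named by its SHA-1.
    store = b"blob %d\0" % len(data) + data
    oid = hashlib.sha1(store).hexdigest()
    path = loose_path(objdir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def exists(objdir, oid):
    # Cheap check (hypothetical stand-in for "do we have it?"):
    # only asks whether the file is present, never reads its contents.
    return os.path.exists(loose_path(objdir, oid))


def verify(objdir, oid):
    # Expensive check: inflate the whole object and re-hash it.
    try:
        with open(loose_path(objdir, oid), "rb") as f:
            store = zlib.decompress(f.read())
    except (OSError, zlib.error):
        return False
    return hashlib.sha1(store).hexdigest() == oid


def demo():
    with tempfile.TemporaryDirectory() as objdir:
        oid = write_loose_blob(objdir, b"hello\n")
        intact = (exists(objdir, oid), verify(objdir, oid))
        # Corrupt the object in place: same name, different contents.
        with open(loose_path(objdir, oid), "wb") as f:
            f.write(zlib.compress(b"blob 6\0jello\n"))
        corrupt = (exists(objdir, oid), verify(objdir, oid))
        # An object id that was never written is simply missing.
        missing = exists(objdir, "0" * 40)
    return intact, corrupt, missing
```

In demo(), the corrupted object still passes exists() but fails
verify(), which is the trade-off accepted above: the cheap check catches
"missing entirely" and leaves byte-level corruption to other stages.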