Re: [PATCH] revision: ignore non-existent objects in resolve-undo list

Jeff King <peff@xxxxxxxx> · Tue, 18 Oct 2022 16:29:23 -0400

On Tue, Oct 18, 2022 at 09:40:01AM -0700, Junio C Hamano wrote:

> And the patch goes in the right direction.  It is a bit sad that it
> now has to do parse_object() but in the normal case, the object
> referenced should be a blob that exists, for which the cost of
> parsing it would be none (just setting .parsed member to true), so
> it should be OK.

This isn't quite true. parse_object() will still inflate the object
contents to check the sha1. I think has_object_file() is probably the
right thing here. We want to know if the object is missing entirely.
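
To make the cost difference concrete, here is a toy Python model (not
Git's code; store_object(), has_object() and parse_and_verify() are
hypothetical stand-ins for writing an object, has_object_file() and the
hash-checking part of parse_object()). It assumes a dict that maps
object ids to zlib-compressed "<type> <size>\0<body>" payloads:

```python
import hashlib
import zlib


def store_object(odb: dict, kind: str, body: bytes) -> str:
    """Store a body under its Git-style sha1 and return the object id."""
    raw = f"{kind} {len(body)}\0".encode() + body
    oid = hashlib.sha1(raw).hexdigest()
    odb[oid] = zlib.compress(raw)
    return oid


def has_object(odb: dict, oid: str) -> bool:
    """Cheap check: is there an entry at all? Nothing is inflated."""
    return oid in odb


def parse_and_verify(odb: dict, oid: str) -> bytes:
    """Expensive check: inflate the payload and re-hash it."""
    raw = zlib.decompress(odb[oid])
    if hashlib.sha1(raw).hexdigest() != oid:
        raise ValueError(f"hash mismatch for {oid}")
    # Drop the "<type> <size>\0" header and return the body.
    return raw.split(b"\0", 1)[1]
```

In this model a missing object fails has_object() immediately, while an
object whose stored bytes were corrupted still passes has_object() and
is only caught by parse_and_verify().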

We'd not notice corrupted bytes, of course, but that is OK. Traversal
does not open blobs we reach via trees, either. For pack-objects, we
rely on either:

  - for repacking to disk, we check the pack crc for already-packed
    objects (which avoids inflating them). For loose objects, we'll
    inflate them later when we convert them to packed form.

  - for packing to stdout for fetch/push, the receiver is expected to
    check the sha1 via index-pack, etc.

So I think just checking "do we have it? If not, gently skip it" is the
right thing here. And in the long run we'd hopefully remove that code,
as "we don't have it" becomes less "this was probably gc'd with an older
version of Git" and more "oops, there is a bug in Git that lost this
object".
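
The "gently skip" behavior could look roughly like the following
sketch. It is again a toy Python model rather than the actual patch:
mark_resolve_undo_blobs() and the have_object callback are hypothetical
names, and each resolve-undo entry is assumed to be a path plus up to
three stage object ids, with None for an unused stage:

```python
def mark_resolve_undo_blobs(entries, have_object):
    """Split resolve-undo blobs into ones to keep and ones to skip.

    entries:     iterable of (path, [oid_or_None, ...]) pairs
    have_object: callable taking an oid, returning True if it exists
    """
    reachable = []
    skipped = []
    for path, stage_oids in entries:
        for oid in stage_oids:
            if oid is None:
                # This stage was not recorded for the path.
                continue
            if have_object(oid):
                reachable.append(oid)
            else:
                # Missing entirely: remember it, but do not abort.
                skipped.append((path, oid))
    return reachable, skipped
```

The point is only that a missing blob is noted and passed over instead
of being treated as a fatal error for the whole traversal.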

I notice that 5a5ea141e7 (revision: mark blobs needed for resolve-undo
as reachable, 2022-06-09) uses parse_object() in the fsck code path.
That _might_ be better as lookup_object(), as earlier stages of fsck
would have checked the bytes of each object and created an in-memory
object struct. Though I guess in that sense, it doesn't matter;
parse_object() will hit lookup_object() first and see that in-memory
struct.
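
A small sketch of why the two calls end up equivalent once the struct
already exists. This is a toy Python model with a hypothetical
ObjectTable class, not Git's object.c; it assumes a cache of
already-parsed objects that parse() consults before reading anything:

```python
class ObjectTable:
    """Toy in-memory object table with a cache-first parse()."""

    def __init__(self, read_from_disk):
        self._read_from_disk = read_from_disk  # callable: oid -> object
        self._parsed = {}                      # oid -> parsed object
        self.disk_reads = 0

    def lookup(self, oid):
        """Return the in-memory object, or None if not seen yet."""
        return self._parsed.get(oid)

    def parse(self, oid):
        """Return the object, reading it only when not yet in memory."""
        obj = self.lookup(oid)
        if obj is not None:
            return obj
        self.disk_reads += 1
        obj = self._read_from_disk(oid)
        self._parsed[oid] = obj
        return obj
```

If an earlier pass has already parsed every object, later parse() calls
return the cached entry and never reach the read path, so they cost
about the same as lookup().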

-Peff