Re: [PATCH v2 00/10] improve reporting of unexpected objects

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 29 Mar 2021 15:34:03 +0200

On Sun, Mar 28 2021, Jeff King wrote:

> On Sun, Mar 28, 2021 at 04:13:30AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Ævar Arnfjörð Bjarmason (10):
>>   object.c: stop supporting len == -1 in type_from_string_gently()
>>   object.c: refactor type_from_string_gently()
>>   object.c: make type_from_string() return "enum object_type"
>>   object-file.c: make oid_object_info() return "enum object_type"
>>   object-name.c: make dependency on object_type order more obvious
>>   tree.c: fix misindentation in parse_tree_gently()
>>   object.c: add a utility function for "expected type X, got Y"
>>   object.c: add and use oid_is_type_or_die_msg() function
>>   object tests: add test for unexpected objects in tags
>>   tag: don't misreport type of tagged objects in errors
>
> I'm somewhat skeptical of the final patch, given my comments (just now)
> in:
>
>   https://lore.kernel.org/git/YGBHH7sAVsPpVKWd@xxxxxxxxxxxxxxxxxxxxxxx/
>
> I'll quote them here:

Picking up where we left off in
http://lore.kernel.org/git/8735wfnv7i.fsf@xxxxxxxxxxxxxxxxxxx ...

>> Because when we call, say, lookup_blob() and find that the object is
>> already in memory as a non-blob, we don't know who the culprit is.
>> Perhaps an earlier part of the code called parse_object(), found that it
>> really is a blob on disk, and used that type. But it may equally have
>> been the case that we saw a reference to the object as a commit, called
>> lookup_commit() on it, and now our lookup_blob() call is unhappy,
>> thinking it is really a commit. In that case, one of those references is
>> wrong, but we don't know which.
>>
>> I think a robust solution would be one of:
>>
>>   - make the message more precise: "saw object X as a commit, but
>>     previously it was referred to as a blob". Or vice versa.
>>
>>   - when we see such a mismatch, go to the object database to say "aha,
>>     on disk it is really a blob". That's expensive, but this is an error
>>     case, so we can afford to be slow. But it does produce unsatisfying
>>     results when it was the earlier lookup_commit() call that was wrong.
>>     Because we have to say "object X is really a blob, but some object
>>     earlier referred to it as a commit. No idea who did that, though!".
>
> Looking at the final patch, I think you side-step the issue to some
> degree because it is only touching the parse_object() code paths, where
> we really have looked at the bytes in the object database. So it
> basically is doing the second thing above (which is "free" because we
> were accessing the odb anyway).
>
> But I think it still has the "oops, somebody made a wrong reference much
> earlier" problem. The actual bug is in some other object entirely, whose
> identity is long forgotten. I think we would be much better off to say
> something like "somebody expected X to be a commit, but now somebody
> else expects it to be a blob", which is all that we can reliably say.
> And the next step really ought to be running "git fsck" to figure out
> what is going on (and we should perhaps even say so via advise()).

Yes I'm totally side-stepping the issue, but I don't see a way around
that that doesn't make the whole object lookup code either much slower,
or more complex.

I.e. the whole thing is an emergent effect of us seeing a tag object,
and noting in-memory that we saw a given OID of type X, but we don't
even know if we can look it up at that point, or that it's not type Y.

I don't think it's guaranteed that we're neatly in one single object
traversal at that point (e.g. if we're looking at N tags, and only later
dereferencing their "object" pointers). So passing a "object A which I
have now says B is a X, assert!" wouldn't work.

We could eagerly get the object from disk when parsing tags (slow?), or
have a void* in the object struct or whatever to say "this is the OID
that claimed you were XYZ" (ew!).

Or, which I think makes sense here, just don't worry about it and error
with the limited info we have at hand. Yes we can't report who the
ultimate culprit is without an fsck, but that's not different than a lot
of other error() and die() messages in the object code now.

So if we're going to emit an advise() that seems generally useful for
many of those error()/die() messages, and not something we should tack
onto this incremental improvement to one error.