Re: [PATCH 4/4] commit: don't lazy-fetch commits

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 1 Dec 2022 11:11:50 -0800

Jeff King <peff@xxxxxxxx> writes:
> OK, so we know we want a commit object because we're in the
> commit-parsing function, so we just ask to disable fetching.
> 
> Two devil's advocate thoughts:
> 
>   1. What if we're wrong that it's a commit? If somebody references a
>      blob in a commit "parent" header, the "right" outcome is for us to
>      say "oops, the type is wrong" when we try to parse it as a commit.
>      But now in a partial clone, we might avoid fetching that supposed
>      commit and say "you don't have this object", even though we could
>      get it.
> 
>      I think I'm OK with that. Either way, the repo is corrupt, and
>      we'll have informed the user of that. The fact that this bizarre
>      and specific sequence of corruptions might not go as far as it can
>      to deduce the root cause is probably fine.
> 
>   2. Are there other places where we'd want to do the same thing? E.g.,
>      in parse_object() we might ask for an object (not knowing its type)
>      only to find out that it is a commit. But we have no idea if we
>      lazy-fetched it or not!
> 
>      I had somehow imagined your series would be hooking in at the level
>      of the lazy-fetch code, and complaining about fetching commits. But
>      that may be tricky to do, because we really don't know the type
>      until after we fetch it, and selectively removing an object from
>      the odb is quite hard. We'd probably have to tell the other side
>      "please, don't send me any commits", which requires a protocol
>      extension, and...yuck.
> 
>      By comparison, your approach is an easy win that may catch problems
>      in practice (and is certainly better than the status quo).
> 
> -Peff

Thanks for taking a look. Let me know if you think that the commit message
could be improved to cover these cases. Right now I think that e.g. "When
parsing an object believed to be a commit in repo_parse_commit_internal()"
instead of "When parsing commits" wouldn't add much value, but I might be
missing something.