Re: "bad revision" fetch error when fetching missing objects from partial clones

On Mon, May 17, 2021 at 12:25:02PM -0400, Derrick Stolee wrote:

> > Here's the same scenario failing without using bitmaps at all:
> > 
> ...
> >   # ...and then we fetch both the object we need _and_ that second
> >   # commit. That causes pack-objects to traverse from base..more.
> >   # The boundary is at "base", so we mark its tree and blob as
> >   # UNINTERESTING, and thus we _don't_ send them.
> >   cd clone
> >   git fetch origin $(git rev-parse HEAD:file) HEAD
> 
> This is the critical reason why this isn't failing in practice: it
> is very rare to ask for a commit and a blob at the same time. Usually,
> a blob request comes when something like 'git checkout' is actually
> navigating to a commit, and then it only asks for missing blobs.

Yes, I think that may be part of it. But more fundamentally, the request
done by "git checkout" does not send any "have" lines at all, so it
could never trigger this bug, even if it did try to ask for a commit.
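
If you want to see that for yourself, something like this should do it
(the setup is illustrative, assuming a "parent" repo like the one in
the reproduction quoted above; file:// just forces a real transport so
we can watch the packets):

  git clone --no-checkout --filter=blob:none "file://$PWD/parent" clone2
  cd clone2

  # checking out faults in the missing blobs via the lazy fetcher; the
  # trace should show "want" lines but no "have" lines at all
  GIT_TRACE_PACKET=1 git checkout HEAD -- . 2>&1 | grep -E "(want|have) "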

> > So I guess the first question is: is this supposed to work? Without
> > bitmaps, it often will. Because we walk commits first, and then only
> > mark trees uninteresting at the boundary; so if there were more commits
> > here, and we were asking to get a blob from one of the middle ones, it
> > would probably work. But fundamentally the client is lying to the server
> > here (as all partial clones must); it is saying "I have that first
> > commit", but of course we don't have all of the reachable objects.
> 
> It _should_ work. We should be specifying the blob:none filter, so when
> we say "we have these commits" it should apply that filter to those
> commits for the "haves".

I guess my "should it work" was more about: are filtered requests that
feed arbitrary combinations of wants and "haves" to git-fetch something
we want to support?

I think the world is a better place if we do. But what I'm wondering is
if there was some intentional choice to avoid triggering this
(especially in the way that the on-demand fetcher uses the "noop"
negotiation algorithm).
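
For reference, I believe that on-demand fetch is roughly equivalent to
doing something like this by hand (options from memory, so treat it as
a sketch); the noop negotiator means we never send any "have" lines at
all, which is why it dodges the boundary problem entirely:

  git -c fetch.negotiationAlgorithm=noop \
      fetch origin $(git rev-parse HEAD:file)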

And yes, we do specify the blob:none filter. BTW, I saw an interesting
side-behavior here.  If the server _stops_ supporting filters and then
we try to fetch using our filter, we will happily say "oh well, the other side
doesn't support filters, so I won't bother sending my filter spec". And
then all hell breaks loose, as the other side has no clue that we are a
partial clone.
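
To trigger it by hand, something like this should do it ("parent"
again being the illustrative repo we partial-cloned from):

  git -C parent config uploadpack.allowFilter false
  cd clone
  # the client sees no "filter" capability, quietly skips sending its
  # blob:none spec, and the server negotiates as if we were a full clone
  git fetch origin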

In practice I think it's an unlikely failure mode for a server you
partial-cloned from to turn off filters, so it's probably not that
important. I hit it because a test script used test_config to enable
them, and then the follow-on test I added to run git-fetch got quite
confused. A more likely scenario is that you might see it with a
misconfigured load-balanced pool of servers.

I do wonder how hitting a third-party server should work, though. E.g.,
I partial clone from A, and then ask B to fetch some related history
built on top. Do I tell B that I'm a partial clone and might be missing
some objects? Or do I behave as normal, and expect to fault in objects
that it assumes I have (e.g., a delta base)? And if the latter, does
that work (if it does, then why doesn't the same logic kick in for this
fetch?).
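
Spelled out (names and URLs purely illustrative), I mean something
like:

  # partial clone from A...
  git clone --filter=blob:none https://a.example.com/repo.git repo
  cd repo
  # ...then fetch related history from a third-party B, which knows
  # nothing about our filter or our missing objects
  git remote add B https://b.example.com/repo.git
  git fetch B topic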

Anyway, that's maybe orthogonal to the bug at hand (and my questions
above are all sincere; it might well work just fine, but I haven't dug
into it further).

> > If this is supposed to work, I think we need to teach the traversal code
> > to "add back" all of the objects that were explicitly given when a
> > filter is in use (either explicitly, or perhaps just clearing or
> > avoiding the UNINTERESTING flag on user-given objects in the first
> > place). And my earlier patch does that for the bitmap side, but not the
> > regular traversal.
> 
> This gets more complicated if we were in a treeless clone, for example.
> We could explicitly ask for a tree and really need all of its reachable
> trees and blobs. It's not enough to just isolate that single object.

Good point. That really requires walking any objects that were explicitly
listed and "adding back" only the ones that would have been filtered. In the
treeless clone case, that's easier. But if I say "don't give me any
trees deeper than X", how do I even know which ones those are?
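
To make that concrete (commands illustrative, same "parent" repo as
above):

  # a treeless clone omits all trees and blobs up front...
  git clone --filter=tree:0 "file://$PWD/parent" clone-treeless
  cd clone-treeless
  # ...so explicitly fetching one tree is not enough; we really need
  # everything reachable from it, which "adding back" just the named
  # object would not give us
  git fetch origin $(git rev-parse "HEAD^{tree}")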

> I wouldn't spend too much time optimizing for the treeless clone case,
> as I believe the client will self-recover by asking for those reachable
> trees when it walks to them.

Yes, I think as long as the on-demand fetch kicks in, then it becomes an
optimization problem, not a correctness one. So perhaps the first fix
should focus on that, even for the blob case being discussed. Then it
_works_, just with an extra round-trip for the on-demand fetch.
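
I.e., the acceptable interim behavior would look something like this
(illustrative, reusing the setup from the reproduction above):

  cd clone
  # the explicit fetch succeeds, but still omits the filtered-out blob
  # at the boundary...
  git fetch origin $(git rev-parse HEAD:file) HEAD
  # ...so actually reading it costs one more on-demand round-trip
  git cat-file blob HEAD:file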

-Peff


