Re: [PATCH v2 3/3] upload-pack: allow configuring a missing-action

Christian Couder <christian.couder@xxxxxxxxx> · Tue, 28 May 2024 12:10:31 +0200

On Fri, May 24, 2024 at 11:51 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> >> Repository S borrows from its "promisor" X, and repository C which
> >> initially cloned from S borrows from its "promisor" S.  Even if C
> >> wants an object in order to fill in the gap in its object graph, S
> >> may not have it (and S itself may have no need for that object), and
> >> in such a case, bypassing S and let C go directly to X would make
> >> sense.
> > ...
> >>
> >> It feels utterly irresponsible to give an option to set up a server
> >> that essentially declares: I'll serve objects you ask me as best
> >> efforts basis, the pack stream I'll give you may not have all
> >> objects you asked for and missing some objects, and when I do so, I
> >> am not telling you which objects I omitted.
> >
> > I don't think it's irresponsible. The client anyways checks that it
> > got something usable in the same way as it does when it performs a
> > partial fetch or clone. The fetch or clone fails if that's not the
> > case. For example if the checkout part of a clone needs some objects
> > but cannot get them, the whole clone fails.
>
> But then what can the repository C do after seeing such a failure?

It's basically the same as when a regular clone, a partial clone, a
clone using bundle-uri, or the use of a regular bundle fails. If it
failed because the remote was not properly configured, then that
config can be fixed. If it failed because the remote doesn't have some
objects, then maybe the missing objects can be transferred to the
remote. And so on.

The feature doesn't create any new kind of failure. In particular,
when you use a partial clone, even a very simple one with a single
remote, there is always the risk of not being able to get some missing
objects as there is the risk of the remote being unreachable for some
reason (like if you take a plane and don't have an internet
connection, or if there is an outage on the server side). There are
some added risks because the feature requires added configuration,
which can be wrong like any configuration, and because there are two
remotes instead of just one. But these are not new kinds of risks.
They already exist if one uses multiple promisor remotes.

> With the design, S does not even consult C to see if C knows about
> X.

If S is managed by a company like GitLab or GitHub, then S will
certainly advertise, for example by showing a command that can easily
be copy-pasted from the web page of the project onto the user's
command line, some way for C to use X.

In the cover letter I give the example of the following command that
can be used (and advertised by S):

  GIT_NO_LAZY_FETCH=0 git clone \
      -c remote.my_promisor.promisor=true \
      -c remote.my_promisor.fetch="+refs/heads/*:refs/remotes/my_promisor/*" \
      -c remote.my_promisor.url=<MY_PROMISOR_URL> \
      --filter="blob:limit=5k" server

I also agree in the cover letter that this is not the most user
friendly clone command and I suggest that I could work on improving on
that by saying:

"it would be nice if there was a capability for the client to say
that it would like the server to give it information about the
promisor that it could use, so that the user doesn't have to pass all
the "remote.my_promisor.XXX" config options on the command line."

and by saying that this could be added later.

If you think that such a capability should definitely be part of this
work, for example because it wouldn't be sane to require users to use
such a long and complex command and it could avoid difficult-to-debug
failures, then I would be willing to work on this and add it to this
series.

> Without knowing that, it cannot safely decide that it does not
> have to send objects that can be obtained from X to C.

In the above command C is asking for a partial clone, as it uses a
--filter option. This means that C knows very well that it might not
get from S all the objects needed for a complete object graph. So why
can't S safely decide not to send some objects to C? Why would it be
Ok if C wanted a partial clone but didn't want to get some objects
from X at the same time, but would not be Ok if C wants the same
partial clone but also with the possibility to get some of the objects
from X right away? To me it seems less risky to ask for some objects
from X right away.

>  Instead, S
> simply say "if C requests an object that I do not have, just ignore
> it and let C grab it from somewhere else".  How would it not be an
> irresponsible design?

Again when using a regular partial clone omitting the same set of
objects, C also requests some objects that S doesn't have. And this is
not considered an issue or something irresponsible. It already works
like this. And then C still has the possibility to configure X as a
promisor remote and get missing objects from there. So why is it Ok
when it's done in several steps but not in one?
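
To make the "several steps" case concrete, here is a rough sketch of
what it could look like on C's side. It is only an illustration under
stated assumptions: the throwaway repositories under a temporary
directory, the `demo`, `SERVER_URL` and `PROMISOR_URL` variable names
and the file contents are all made up so that the script can run
locally, with S.git standing in for S and X.git standing in for
<MY_PROMISOR_URL>. Unlike in the real scenario, both stand-ins hold
all the objects here.

```shell
#!/bin/sh
set -e

# Hypothetical stand-ins for the URL of S and for <MY_PROMISOR_URL>.
demo=$(mktemp -d)
SERVER_URL="file://$demo/S.git"
PROMISOR_URL="file://$demo/X.git"

# Build throwaway S and X repositories so that the sketch runs locally.
git init -q "$demo/src"
printf 'small\n' > "$demo/src/small.txt"
head -c 10000 /dev/zero | tr '\0' 'x' > "$demo/src/big.txt"
git -C "$demo/src" add .
git -C "$demo/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m initial
for r in S X; do
    git clone -q --bare "$demo/src" "$demo/$r.git"
    # Let the stand-in servers accept --filter and lazy fetches.
    git -C "$demo/$r.git" config uploadpack.allowFilter true
    git -C "$demo/$r.git" config uploadpack.allowAnySHA1InWant true
done

# Step 1: a regular partial clone from S only.
git clone -q --no-checkout --filter="blob:limit=5k" "$SERVER_URL" "$demo/C"

# Step 2: afterwards, configure X as an additional promisor remote,
# using the same settings as the -c options of the one-step command.
git -C "$demo/C" config remote.my_promisor.promisor true
git -C "$demo/C" config remote.my_promisor.fetch \
    "+refs/heads/*:refs/remotes/my_promisor/*"
git -C "$demo/C" config remote.my_promisor.url "$PROMISOR_URL"
```

After step 2, C ends up with two promisor remotes (origin and
my_promisor), which is the same configuration that the single clone
command quoted earlier sets up in one go.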