Re: [PATCH 3/4] Add 'promisor-remote' capability to protocol v2

Christian Couder <christian.couder@xxxxxxxxx> · Tue, 20 Aug 2024 14:21:28 +0200

On Wed, Jul 31, 2024 at 8:25 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > When a server repository S borrows some objects from a promisor remote X,
> > then a client repository C which would like to clone or fetch from S might,
> > or might not, want to also borrow objects from X. Also S might, or might
> > not, want to advertise X as a good way for C to directly get objects from,
> > instead of C getting everything through S.
>
> If S is a clone that is keeping up to date with X, even if it does
> not borrow anything from X, as long as X is known to be much better
> connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY
> datacenter with petabit/s backbone connections) than S is (e.g., it
> is my deskside box on a cable modem), it may be beneficial if S can
> omit objects from its "git fetch" response to C, if C is willing to
> fill the gap using X.
>
> So it is of dubious value to limit the feature only to cases where S
> "borrows" from X, is it?

I agree. I just blindly copied the word from something you said in a
previous thread, but I should have thought more about the use case you
suggest.

> > To allow S and C to agree on C using X or not, let's introduce a new
> > "promisor-remote" capability in the protocol v2, as well as a few new
> > configuration variables:
> >
> >   - "promisor.advertise" on the server side, and:
> >   - "promisor.acceptFromServer" on the client side.
> >
> > By default, or if "promisor.advertise" is set to 'false', a server S will
> > advertise only the "promisor-remote" capability without passing any
> > argument through this capability. This means that S supports the new
> > capability but doesn't wish any client C to directly access any promisor
> > remote X S might use.
>
> I would find it more natural if .advertise is turned off by setting
> it explicitly to "false", we would pretend as if we have never even
> heard of such a capability.

In a reply to Taylor I just sent, I tried to explain why I think it's
a good thing that S can know if C has the capability even if neither S
nor C actually use it.

If, in the future, many servers and repos are transitioned to setups
where promisor remotes and this capability are used, then I think it
could help a lot if S can help C take advantage of X, and better
diagnose issues. And perhaps the other way around (C knowing that S
has the capability or not) could help a bit too.

> > If "promisor.advertise" is set to 'true', S will advertise its promisor
> > remotes with a string like:
> >
> >   promisor-remote=<pm-info>[;<pm-info>]...
> >
> > where each <pm-info> element contains information about a single
> > promisor remote in the form:
> >
> >   name=<pm-name>[,url=<pm-url>]
> > where <pm-name> is the name of a promisor remote and <pm-url> is the
> > urlencoded url of the promisor remote named <pm-name>.
>
> OK, so pm-name cannot have ";," in it (which is sensible, or define
> pm-name more tightly, like "only lowercase alnum").

Not sure what our config allows here. Ideally I think the capability
should support everything our config supports.

>  URL cannot have
> ';' or ',' in it that is an OK limitation as URL encoding can hide
> them.

Yeah, right.

> > For now, the URL is passed in addition to the name. In the future, it
> > might be possible to pass other information like a filter-spec that the
> > client should use when cloning from S, or a token that the client should
> > use when retrieving objects from X.
>
> OK.  And obviously they cannot have ';," in them without encoding
> similarly.

Yeah, or maybe using a different encoding if it's better for some reason.

> > It might also be possible in the future for "promisor.advertise" to have
> > other values like "onlyName", so that no URL is advertised.
>
> Saying "<pm-info> is expected to be extended" should be sufficient,
> without inviting discussions like "what good does it do to give only
> names" that is irrelevant at least at this moment.

Yeah, but in this case Taylor had comments related to advertising
remotes that have a name but no URL, so I think this example could
help people having related questions understand that it could be a
good thing to advertise only a URL name and no URL.

> > By default or if "promisor.acceptFromServer" is set to "None", the
> > client will not accept to use the promisor remotes that might have been
> > advertised by the server. In this case, the client will advertise only
> > "promisor-remote" in its reply to the server. This means that the client
> > has the "promisor-remote" capability but decided not to use any of the
> > promisor remotes that the server might have advertised.
>
> OK, that is a signal to the server side that it is not allowed to
> omit any objects from its response to "git fetch" request, even
> though they might be available via a better connected remotes.

Yeah, right.

> > If "promisor.acceptFromServer" is set to "All", on the contrary, the
> > client will accept to use all the promisor remotes that the server
> > advertised and it will reply with a string like:
> >
> >   promisor-remote=<pm-name>[;<pm-name>]...
> >
> > where the <pm-name> elements are the names of all the promisor remotes
> > the server advertised.
>
> So, this is why we need "name" for each "pm-info"---to give a short
> name associated with the URL of the remote repository.

Yeah, I think it can be valuable to use names to agree on
promisor-remotes that should or shouldn't be used, like we commonly
use remote names with `git clone`, `git fetch`, etc.

> Presumably, C has an option to see if each of the remote suggested
> is reachable and omit remotes that are not available to C from its
> response, so even when .accept is set to "all", the response may not
> list all the names of remotes that S advertised, in general.

I didn't think about reachability or connectivity specifically, but I
agree it might be useful for C to be able to filter in some ways the
promisor remotes that S advertised.

> > If the server advertised no promisor remote
> > though, the client will reply with just "promisor-remote".
>
> In other words, at the protocol level:
>
>  - S uses promisor-remote capability to tell C what are potentially
>    useful alternate remotes to obtain objects that C may want to
>    fetch from S
>
>  - C uses promisor-remote capability to tell S that among the
>    remotes advertised by S, it is willing to use the named remotes
>    as its promisor, which permits S from omitting objects from its
>    response to "git fetch" request from C as long as they are known
>    to be available from these remotes.

Yeah, right.

> I think that makes sense, but I do not see the point of sending an
> empty promisor-remote capability at all.

I hope I gave good enough reasons above and in my replies to Taylor.

> What practical difference would it make to S and C, if S chooses not
> to advertise the capability at all, instead of advertising an empty
> remote list with the capability?  Both tells C that it is useless to
> request promistor-remote capability to S in its response.
>
> What practical difference would it make to S and C, if C chooses not
> to advertise the capability at all, instead of advertising an empty
> remote list with the capability?  Both tells S that S is not allowed
> to omit objects that are obtainable from elsewhere.

I agree that when looking at how Git related things technically work,
it doesn't change anything, but I think we should look at the big
picture too. For GitLab, for example (but I suppose it will be similar
for GitHub and other hosting sites), it will be important to help
users take advantage, as seamlessly as possible, of the feature, and
the more we know about the client they use and its capabilities, the
better job we can do to help them.

> > In a following commit, other values for "promisor.acceptFromServer" will
> > be implemented so that the client will be able to decide the promisor
> > remotes it accepts depending on the name and URL it received from the
> > server. So even if that name and URL information is not used much right
> > now, it will be needed soon.
>
> OK.
>
> > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> > index 98c5cb2ec2..e3939d83a9 100644
> > --- a/Documentation/config/promisor.txt
> > +++ b/Documentation/config/promisor.txt
> > @@ -1,3 +1,16 @@
> >  promisor.quiet::
> >       If set to "true" assume `--quiet` when fetching additional
> >       objects for a partial clone.
> > +
> > +promisor.advertise::
> > +     If set to "true", a server will use the "promisor-remote"
> > +     capability, see linkgit:gitprotocol-v2[5], to advertise the
> > +     promisor remotes it is using if any. Default is "false", which
> > +     means no promisor remote is advertised.
>
> Even though I said that there logically is not much reason to tie
> this advertisement to the use of promistor remote by the serving
> side, I am OK if the initial implementation is limited to that
> arrangement.  It would be an easy change to allow this variable
> to take a list of remote repositories that may (or may not) be a
> promisor remote of this repository (in other words, "they are clones
> that are better connected than me") in the future, but that does not
> have to happen in the initial iteration.

Yeah, I will mention something like this in the next iteration.

> It would be less confusing to first-time readers if you described
> the intent a bit better.  Why would the server want to advertise and
> how would the client take advantage of the information?

Yeah, I will try to better answer that question in the doc, cover
letter and commit messages.

> I see that
> the update in this patch to protocol document is skimpy on this point,
> and end-user facing documentation has better exposure anyway, so
> let's see what we can do here.
>
>     The "promisor-remote" protocol capability can be used by the
>     responder to "git fetch" to advertise better-connected remotes
>     that the requester can use as promisor remotes, instead of this
>     repository, so that "git fetch" requestor can lazily fetch
>     objects from these other better-connected remotes.  If this
>     configuration variable is set to "true",...
>
> or something, perhaps?

Thanks for the suggestion. I will base the changes in version 2 on it.

> "no promisor remote is advertised" -> "no promisor-remote capability
> is advertised".

Right.

> > +promisor.acceptFromServer::
> > +     If set to "all", a client will accept all the promisor remotes
> > +     a server might advertise using the "promisor-remote"
> > +     capability, see linkgit:gitprotocol-v2[5]. Default is "none",
> > +     which means no promisor remote advertised by a server will be
> > +     accepted.
>
> Similarly, readers would want to know what the implication is to
> "accept" promisor remotes.
>
>         accept ..., and adds them to its promisor remotes, allowing
>         the server to omit objects from its response to "fetch"
>         requests that are lazily fetchable from these promisor
>         remotes, see linkgit:gitprotocol-v2[5].
>
> or something?

Thanks for the suggestion. I will improve the wording based on it.

> > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> > index 414bc625d5..4d8d3839c4 100644
> > --- a/Documentation/gitprotocol-v2.txt
> > +++ b/Documentation/gitprotocol-v2.txt
> > @@ -781,6 +781,43 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >
> > +promisor-remote=<pr-infos>
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The server may advertise some promisor remotes it is using, if it's OK
> > +for the server that a client uses them too. In this case <pr-infos>
> > +should be of the form:
>
> As this is the protocol documentation, we should describe what goes
> over the wire and what they mean, regardless of how limited the
> initial implementation on either end is.  Advertising the promisor
> remotes the server side relies on is probably not what we want to
> see this capability limited to forever (remember the previous "X is
> much better connected than S" example).
>
>     "it is using, if it's OK ..." -> "the other side may want to use
>     as its promisot remotes, instead of this repository"

Right, thanks.

> > +     pr-infos = pr-info | pr-infos ";" pr-info
> > +
> > +     pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +
> > +where `pr-name` is the name of a promisor remote, and `pr-url` the
> > +urlencoded URL of that promisor remote.
>
> Clarify what the syntax for a valid name here.  Also stress the fact
> that ';' and ',' MUST be encoded if it appears in 'pr-url'.

Yeah, I will do that.

> > +In this case a client wanting to use one or more promisor remotes the
> > +server advertised should reply with "promisor-remote=<pr-names>" where
> > +<pr-names> should be of the form:
> > +
> > +     pr-names = pr-name | pr-names ";" pr-name
> > +
> > +where `pr-name` is the name of a promisor remote the server
> > +advertised.
>
> After seeing advertisement, client can use some it picked but it
> does not have to tell the server about it.  Why would it respond
> with the promisor remotes, and what effect does it have to give the
> list of promisor remotes it uses?
>
>     If the "git fetch" side decides to use one or more promisor
>     remotes advertised, it can reply with ...
>     ...
>     where ... the server advertised.  Doing so allows the server to
>     make its response smaller by omitting objects that are known to
>     be lazily fetchable from these other promisor remotes
>     repositories.
>
> perhaps?

Yeah, right.

> > +If the server prefers a client not to use any promisor remote the
> > +server uses, or if the server doesn't use any promisor remote, it
> > +should only advertise "promisor-remote" without any value or "=" sign
> > +after it.
>
> It should not advertise "promisor-remote" capability at all.

Let's discuss this above or in a thread started by Taylor's reviews.

> > +In this case, or if the client doesn't want to use any promisor remote
> > +the server advertised, the client should reply only "promisor-remote"
> > +without any value or "=" sign after it.
>
> Likewise.  It should not advertise "promisor-remote" capability at
> all.
>
> > +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> > +options can be used on the server and client side respectively to
> > +control what they advertise or accept respectively. See the
> > +documentation of these configuration options for more information.
>
> OK.

Thanks for the review and suggestions.