Re: [PATCH v2 3/4] Add 'promisor-remote' capability to protocol v2

Patrick Steinhardt <ps@xxxxxx> · Mon, 30 Sep 2024 09:56:59 +0200

On Tue, Sep 10, 2024 at 06:29:59PM +0200, Christian Couder wrote:
> diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> index 98c5cb2ec2..9cbfe3e59e 100644
> --- a/Documentation/config/promisor.txt
> +++ b/Documentation/config/promisor.txt
> @@ -1,3 +1,20 @@
>  promisor.quiet::
>  	If set to "true" assume `--quiet` when fetching additional
>  	objects for a partial clone.
> +
> +promisor.advertise::
> +	If set to "true", a server will use the "promisor-remote"
> +	capability, see linkgit:gitprotocol-v2[5], to advertise the
> +	promisor remotes it is using, if it uses some. Default is
> +	"false", which means the "promisor-remote" capability is not
> +	advertised.
> +
> +promisor.acceptFromServer::
> +	If set to "all", a client will accept all the promisor remotes
> +	a server might advertise using the "promisor-remote"
> +	capability. Default is "none", which means no promisor remote
> +	advertised by a server will be accepted. By accepting a
> +	promisor remote, the client agrees that the server might omit
> +	objects that are lazily fetchable from this promisor remote
> +	from its responses to "fetch" and "clone" requests from the
> +	client. See linkgit:gitprotocol-v2[5].

I wonder a bit about whether making this an option is all that sensible,
because that would of course apply globally to every server that you
might want to clone from. Wouldn't it be more sensible to make this
configurabe per server?

Another question: servers may advertise bogus addresses to us, and as
far as I can see there are currently no precautions in place against
malicious cases. The server might for example use this to redirect us to
a remote that uses no encryption, the Git protocol or even the "file://"
protocol. I guess the sane thing here would be to default to allow
clones via "https://"; only, but make the set of accepted protocols
configurable.

> diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> index 414bc625d5..65d5256baf 100644
> --- a/Documentation/gitprotocol-v2.txt
> +++ b/Documentation/gitprotocol-v2.txt
> @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> +promisor-remote=<pr-infos>
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The server may advertise some promisor remotes it is using or knows
> +about to a client which may want to use them as its promisor remotes,
> +instead of this repository. In this case <pr-infos> should be of the
> +form:
> +
> +	pr-infos = pr-info | pr-infos ";" pr-info

Wouldn't it be preferable to make this multiple lines so that we cannot
ever burst through the pktline limits?

> +	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> +
> +where `pr-name` is the urlencoded name of a promisor remote, and
> +`pr-url` the urlencoded URL of that promisor remote.
> +In this case, if the client decides to use one or more promisor
> +remotes the server advertised, it can reply with
> +"promisor-remote=<pr-names>" where <pr-names> should be of the form:

One of the things that LFS provides is custom transfer types. It is for
example possible to use NFS or some other arbitrary protocol to fetch or
upload data. It should be possible to provide similar functionality on
the Git side via custom transport helpers, too, and if we make the
accepted set of helpers configurable as proposed further up this could
be made safe, too.

But one thing I'm missing here is any documentation around how the
client would know which promisor-remote to pick when the remote
advertises multiple of them. The easiest schema would of course be to
pick the first one whose transport helper the client understands and
considers to be safe. But given that we're talking about offloading of
large blobs, would we have usecases for advertising e.g. region-scoped
remotes that require more information on the client-side?

Also, are the promisor remotes promising to each contain all objects? Or
would the client have to ask each promisor remote until it finds a
desired object?

> +	pr-names = pr-name | pr-names ";" pr-name
> +
> +where `pr-name` is the urlencoded name of a promisor remote the server
> +advertised and the client accepts.
> +
> +Note that, everywhere in this document, `pr-name` MUST be a valid
> +remote name, and the ';' and ',' characters MUST be encoded if they
> +appear in `pr-name` or `pr-url`.

So I assume the intent here is to let the client add that promisor
remote with that exact, server-provided name? That makes me wonder about
two different scenarios:

  - We must keep the remote from announcing "origin".

  - What if we eventually decide to allow users to provide their own
    names for remotes during git-clone(1)?

Overall, I don't think that it's a good idea to let the remote dictate
which name a client's remotes have.

> +If the server doesn't know any promisor remote that could be good for
> +a client to use, or prefers a client not to use any promisor remote it
> +uses or knows about, it shouldn't advertise the "promisor-remote"
> +capability at all.
> +
> +In this case, or if the client doesn't want to use any promisor remote
> +the server advertised, the client shouldn't advertise the
> +"promisor-remote" capability at all in its reply.
> +
> +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> +options can be used on the server and client side respectively to
> +control what they advertise or accept respectively. See the
> +documentation of these configuration options for more information.
> +
> +Note that in the future it would be nice if the "promisor-remote"
> +protocol capability could be used by the server, when responding to
> +`git fetch` or `git clone`, to advertise better-connected remotes that
> +the client can use as promisor remotes, instead of this repository, so
> +that the client can lazily fetch objects from these other
> +better-connected remotes. This would require the server to omit in its
> +response the objects available on the better-connected remotes that
> +the client has accepted. This hasn't been implemented yet though. So
> +for now this "promisor-remote" capability is useful only when the
> +server advertises some promisor remotes it already uses to borrow
> +objects from.

In the cover letter you mention that the server may not even have some
objects at all in the future. I wonder how that is supposed to interact
with clients that do not know about the "promisor-remote" capability at
all though.