Re: [PATCH 3/4] Add 'promisor-remote' capability to protocol v2

Christian Couder <christian.couder@xxxxxxxxx> · Tue, 20 Aug 2024 13:32:48 +0200

On Wed, Jul 31, 2024 at 5:40 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Wed, Jul 31, 2024 at 03:40:13PM +0200, Christian Couder wrote:
> > By default, or if "promisor.advertise" is set to 'false', a server S will
> > advertise only the "promisor-remote" capability without passing any
> > argument through this capability. This means that S supports the new
> > capability but doesn't wish any client C to directly access any promisor
> > remote X S might use.
>
> Even if the server supports this new capability, is there a reason to
> advertise it to the client if the server knows ahead of time that it has
> no promisor remotes to advertise?

I think it could be useful at least in some cases for C to know that S
has the capability to advertise promisor remotes but decided not to
advertise any. For example, if C knows that the repo has a lot of very
large files, it might realize that S is likely not a good mirror of
the repo if it doesn't have the 'promisor-remote' capability.

I agree that it's more useful the other way though. That is for a
server to know that the client has the capability but might not want
to use it.

For example, when C clones without using X directly, it can be a
burden for S to have to fetch large objects from X (as it would use
precious disk space on S, and unnecessarily duplicate large objects).
So S might want to say "please use a newer or different client that
has the 'promisor-remote' capability" if it knows that the client
doesn't have this capability. If S knows that C has the capability but
didn't configure it or doesn't want to use it, it could instead say
something like "please consider activating the 'promisor-remote'
capability by doing this and that to avoid burdening this server and
get a faster clone".

Note that the client might not be 'git'. It might be a "compatible"
implementation (libgit2, gix, JGit, etc), so using the version passed
in the "agent" protocol capability is not a good way to detect if the
client has the capability or not.

In the end, as it looks very useful for S to know if C has the
capability or not, and as it seems natural that S and C behave the
same regarding advertising the capability, I think the choice of
always advertising the capability, even when not using it, is the
right one.

> I am not sure what action the client would take if it knows the server
> supports this capability, but does not actually have any promisor
> remotes to advertise. I would suggest that setting promisor.advertise to
> false indeed prevents advertising it as a capability in the first place.

It could, in some cases, help C realize that S is likely using old or
unoptimized server software for the repo, and C could decide based on
this to use a different mirror repo. For example if C wants to clone
some well known open source AI repo that has a lot of very large files
and is mirrored on many common repo hosting platforms (GitHub, GitLab,
etc), C might be happy to get a clue of how likely the different
mirrors are to be optimized to serve that repo.

I agree that it might not be a very good reason right now, but I think
it might be in the future. Anyway the main reason for such a behavior
is (as I said above) that it is very useful for S to know if C has the
'promisor-remote' capability or not.

> Selfishly, it prevents some issues that I have when rolling out new Git
> versions within GitHub's infrastructure, since our push proxy layer
> picks a single replica to replay the capabilities from, but obviously
> replays the client's response to all replicas. So if only some replicas
> understand the new 'promisor-remote' capability, we can run into issues.

I understand the problem, but I think it might be worked around by
first deploying on a single replica with the new 'promisor-remote'
capability disabled in the config, which is the default. Yeah, that
replica might behave a bit differently than the others, but the client
behavior shouldn't change much. And when things work well with a
single replica with the capability disabled, then more replicas with
that capability disabled can be rolled out until they all have it.

More issues are likely to happen when actually enabling the
capability, but this is independent of the fact that the
'promisor-remote' capability is advertised even if it is not enabled.

> I'm not sure if the client even bothers to send back promisor-remote if
> the server did not send any such remotes to begin with,

If S sends 'promisor-remote' even without sending any remote
information then C should reply using 'promisor-remote' too. I think
it can help S decide if setting up promisor remotes is worth it or
not, if S can easily know if many of its clients could use them or
not.

> but between that
> and what I wrote in the second paragraph here, I don't see a reason to
> advertise the capability when promisor.advertise is false.