Re: [PATCH v2 3/4] Add 'promisor-remote' capability to protocol v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 30, 2024 at 9:57 AM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> On Tue, Sep 10, 2024 at 06:29:59PM +0200, Christian Couder wrote:
> > diff --git a/Documentation/config/promisor.txt b/Documentation/config/promisor.txt
> > index 98c5cb2ec2..9cbfe3e59e 100644
> > --- a/Documentation/config/promisor.txt
> > +++ b/Documentation/config/promisor.txt
> > @@ -1,3 +1,20 @@
> >  promisor.quiet::
> >       If set to "true" assume `--quiet` when fetching additional
> >       objects for a partial clone.
> > +
> > +promisor.advertise::
> > +     If set to "true", a server will use the "promisor-remote"
> > +     capability, see linkgit:gitprotocol-v2[5], to advertise the
> > +     promisor remotes it is using, if it uses some. Default is
> > +     "false", which means the "promisor-remote" capability is not
> > +     advertised.
> > +
> > +promisor.acceptFromServer::
> > +     If set to "all", a client will accept all the promisor remotes
> > +     a server might advertise using the "promisor-remote"
> > +     capability. Default is "none", which means no promisor remote
> > +     advertised by a server will be accepted. By accepting a
> > +     promisor remote, the client agrees that the server might omit
> > +     objects that are lazily fetchable from this promisor remote
> > +     from its responses to "fetch" and "clone" requests from the
> > +     client. See linkgit:gitprotocol-v2[5].
>
> I wonder a bit about whether making this an option is all that sensible,
> because that would of course apply globally to every server that you
> might want to clone from. Wouldn't it be more sensible to make this
> configurabe per server?

It depends. If, for example, you are in a corporate environment where
you will interact only with trusted servers, then it might be easier
to have only one option and configure it once for all the servers you
are going to interact with. I am Ok to also have an option
configurable per server in the future though.

> Another question: servers may advertise bogus addresses to us, and as
> far as I can see there are currently no precautions in place against
> malicious cases.

The commit message says:

"In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S."

and indeed as you noticed in your review of patch 4/4, this concern is
addressed by patch 4/4.

> The server might for example use this to redirect us to
> a remote that uses no encryption, the Git protocol or even the "file://"
> protocol. I guess the sane thing here would be to default to allow
> clones via "https://"; only, but make the set of accepted protocols
> configurable.

Yeah, it is another potential config option that could be added. At
this stage I don't want to send a lot of patches with a large number
of possibly useful configuration options as it might appear later that
very few are actually used and useful.

> > diff --git a/Documentation/gitprotocol-v2.txt b/Documentation/gitprotocol-v2.txt
> > index 414bc625d5..65d5256baf 100644
> > --- a/Documentation/gitprotocol-v2.txt
> > +++ b/Documentation/gitprotocol-v2.txt
> > @@ -781,6 +781,60 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >
> > +promisor-remote=<pr-infos>
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The server may advertise some promisor remotes it is using or knows
> > +about to a client which may want to use them as its promisor remotes,
> > +instead of this repository. In this case <pr-infos> should be of the
> > +form:
> > +
> > +     pr-infos = pr-info | pr-infos ";" pr-info
>
> Wouldn't it be preferable to make this multiple lines so that we cannot
> ever burst through the pktline limits?

LARGE_PACKET_MAX is 65520 which looks more than enough to me. So
having the pktline limit could actually prevent malicious servers from
sending too much junk.

I wouldn't be against multiple lines if there were reasonable cases
where the current pktline limit might not be enough (or if such cases
appeared in the future) though. It's just that I can't think of any
such reasonable case.

> > +     pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +
> > +where `pr-name` is the urlencoded name of a promisor remote, and
> > +`pr-url` the urlencoded URL of that promisor remote.
> > +In this case, if the client decides to use one or more promisor
> > +remotes the server advertised, it can reply with
> > +"promisor-remote=<pr-names>" where <pr-names> should be of the form:
>
> One of the things that LFS provides is custom transfer types. It is for
> example possible to use NFS or some other arbitrary protocol to fetch or
> upload data. It should be possible to provide similar functionality on
> the Git side via custom transport helpers, too, and if we make the
> accepted set of helpers configurable as proposed further up this could
> be made safe, too.

It's already possible to use remote helpers using different protocols
with promisor remotes and the URL security issues are addressed by
patch 4/4.

> But one thing I'm missing here is any documentation around how the
> client would know which promisor-remote to pick when the remote
> advertises multiple of them.

In most cases for now, I think the server should advertise only one,
and the client should configure that promisor remote on its own and
set "promisor.acceptFromServer" to "KnownUrl", or maybe "KnownName" in
a corporate setting, (see patch 4/4).

If a server advertises more than one, it should have some docs to
explain why it does that and which one(s) should be picked by which
client. For example, it could say something like "Users in this part
of the world might want to pick only promisor remote A as it is likely
to be better connected to them, while users in other parts of the
world should pick only promisor remote B for the same reason."

> The easiest schema would of course be to
> pick the first one whose transport helper the client understands and
> considers to be safe. But given that we're talking about offloading of
> large blobs, would we have usecases for advertising e.g. region-scoped
> remotes that require more information on the client-side?

If region-scoped means something like the example I talk about above,
then yeah, as also discussed with Junio, this could be an interesting
use case.

> Also, are the promisor remotes promising to each contain all objects? Or
> would the client have to ask each promisor remote until it finds a
> desired object?

I think both use cases could be interesting.

> > +     pr-names = pr-name | pr-names ";" pr-name
> > +
> > +where `pr-name` is the urlencoded name of a promisor remote the server
> > +advertised and the client accepts.
> > +
> > +Note that, everywhere in this document, `pr-name` MUST be a valid
> > +remote name, and the ';' and ',' characters MUST be encoded if they
> > +appear in `pr-name` or `pr-url`.
>
> So I assume the intent here is to let the client add that promisor
> remote with that exact, server-provided name? That makes me wonder about
> two different scenarios:
>
>   - We must keep the remote from announcing "origin".

I agree that it might not be a good idea to have something else than
the main remote named origin. I am not sure it's necessary to
explicitly disallow it though.

>   - What if we eventually decide to allow users to provide their own
>     names for remotes during git-clone(1)?

I think it could be confusing, so I would say that we should wait
until a concrete case where it could be useful appear before allowing
this.

> Overall, I don't think that it's a good idea to let the remote dictate
> which name a client's remotes have.

Maybe a new mode like "KnownURL" but where only the URL and not the
name should match could be interesting in some cases then? If that's
the case it's very simple to add it. I just prefer not to do it for
now as I am not yet convinced there is a very relevant use case. I
think that if a client doesn't want to trust and cooperate with the
server at all, it might just be better in most cases for it to just
leave the server alone and not access it at all, independently of
using promisor remote or not.

> > +If the server doesn't know any promisor remote that could be good for
> > +a client to use, or prefers a client not to use any promisor remote it
> > +uses or knows about, it shouldn't advertise the "promisor-remote"
> > +capability at all.
> > +
> > +In this case, or if the client doesn't want to use any promisor remote
> > +the server advertised, the client shouldn't advertise the
> > +"promisor-remote" capability at all in its reply.
> > +
> > +The "promisor.advertise" and "promisor.acceptFromServer" configuration
> > +options can be used on the server and client side respectively to
> > +control what they advertise or accept respectively. See the
> > +documentation of these configuration options for more information.
> > +
> > +Note that in the future it would be nice if the "promisor-remote"
> > +protocol capability could be used by the server, when responding to
> > +`git fetch` or `git clone`, to advertise better-connected remotes that
> > +the client can use as promisor remotes, instead of this repository, so
> > +that the client can lazily fetch objects from these other
> > +better-connected remotes. This would require the server to omit in its
> > +response the objects available on the better-connected remotes that
> > +the client has accepted. This hasn't been implemented yet though. So
> > +for now this "promisor-remote" capability is useful only when the
> > +server advertises some promisor remotes it already uses to borrow
> > +objects from.
>
> In the cover letter you mention that the server may not even have some
> objects at all in the future.

I am not sure which part of the cover letter this refers to. If S uses
X as a promisor remote, then yeah, it might not have some objects that
are on X. But perhaps there is some wrong wording or a
misunderstanding here.

> I wonder how that is supposed to interact
> with clients that do not know about the "promisor-remote" capability at
> all though.

When that happens, S can fetch from X the objects it doesn't have, and
then proceed as usual to respond to the client. This has the drawback
of duplicating these objects on S, but perhaps there could be some
kind of garbage collection process that would regularly remove those
duplicated objects from S.

Another possibility that could be added in the future would be for S
to warn the client that it should be upgraded to have the
"promisor-remote" capability. Or S could just refuse to serve the
client in that case. I don't think we should implement these
possibilities right now, but it could be useful to do it in the
future.

> From my point of view the server should be able tot handle that just
> fine and provide a full packfile to the client. That would of course
> require the server to fetch missing objects from its own promisor
> remotes.

It's what already happens.

> Do we want to state explicitly that this is a MUST for servers
> so that we don't end up in a future where clients wouldn't be able to
> fetch from some forges anymore?

I don't think we should enforce anything like this. For example in
corporate setups, it might be easy to install the latest version of
Git and it might be a good thing to make sure the server doesn't get
overloaded with large files when they are supposed to only be stored
on a promisor remote.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux