On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
> On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
> > On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia
> > <olga.kornievskaia@xxxxxxxxx> wrote:
> > > On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
> > > <olga.kornievskaia@xxxxxxxxx> wrote:
> > > > On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> > > > > On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
> > > > > > On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> > > > > > > On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> > > > > > > > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > > > > >
> > > > > > > > An rpc client uses a transport switch and one or more
> > > > > > > > transports associated with that switch. Since transports
> > > > > > > > are shared among rpc clients, create a symlink into the
> > > > > > > > xprt_switch directory instead of duplicating entries under
> > > > > > > > each rpc client.
> > > > > > > >
> > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > > > > >
> > > > > > > > ..
> > > > > > > > @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
> > > > > > > >  	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
> > > > > > > >
> > > > > > > >  	if (rpc_client) {
> > > > > > > > +		char name[23];
> > > > > > > > +
> > > > > > > > +		snprintf(name, sizeof(name), "switch-%d",
> > > > > > > > +			 rpc_client->xprt_switch->xps_id);
> > > > > > > > +		sysfs_remove_link(&rpc_client->kobject, name);
> > > > > > >
> > > > > > > Hi Olga,
> > > > > > >
> > > > > > > If a client can use a single switch, shouldn't the name of
> > > > > > > the symlink be just "switch"?
> > > > > > > This is to be consistent with other symlinks in `sysfs`,
> > > > > > > such as the ones in the block layer, for example in my
> > > > > > > `/sys/block/sda`:
> > > > > > >
> > > > > > >   bdi -> ../../../../../../../../../../../virtual/bdi/8:0
> > > > > > >   device -> ../../../5:0:0:0
> > > > > >
> > > > > > I think the client is written so that in the future it might
> > > > > > have more than one switch?
> > > > >
> > > > > I wonder what would be the use for that, as a switch is already a
> > > > > collection of xprts. What would determine the switch to use per a
> > > > > new task IO?
> > > >
> > > > I thought the switch is a collection of xprts of the same type. And
> > > > if you wanted to have an RDMA connection and a TCP connection to
> > > > the same server, then they would be stored under different
> > > > switches? For instance, we round-robin through the transports, but
> > > > I don't see why we would be doing so between a TCP and an RDMA
> > > > transport. But I can see how a client could switch entirely from a
> > > > TCP-based transport to an RDMA one (or to a set of transports,
> > > > round-robining among that set). But perhaps I'm wrong in how I'm
> > > > thinking about xprt_switch and multipathing.
> > >
> > > <looks like my reply bounced so trying to resend>
> >
> > And more to answer your question: we don't have a method to switch
> > between different xprt switches. Nor do we have a way to specify how
> > to mount with multiple types of transports. Perhaps sysfs could be a
> > way to switch between the two. Perhaps during session-trunking
> > discovery, the server could return a list of IPs and the types of
> > transports.
> > Say all IPs were available via both TCP and RDMA; the client could
> > then create one switch with RDMA transports and another with TCP
> > transports, and perhaps a policy engine would decide which one to use
> > to begin with. The sysfs interface would then be a way to switch
> > between the two if there are problems.
>
> You raise a good point, also relevant to the ability to dynamically add
> new transports on the fly with the sysfs interface - their protocol
> type may be different.
>
> Returning to the topic of multiple switches per client, I recall it was
> hinted a few times that there is an intention to make the
> implementation details of xprtswitch loadable and pluggable with custom
> algorithms. If we are going in that direction, I'd expect that advanced
> transport management and request routing could live below the RPC
> client level, where we have example uses:
>
> 1) Optimizations in request routing that I've previously written about
> (based on request data memory).
>
> 2) If we lift the restriction against multiple protocol types on one
> xprtswitch, then we can also allow for an 'RDMA-by-default with TCP
> failover on standby' implementation similar to what you suggest, but
> with one switch maintaining two lists behind the scenes.

I'm not that interested in supporting multiple switches per client, or
in any setup that is asymmetric w.r.t. transport characteristics, at
this time. Any such setup is going to need a policy engine in order to
decide which RPC calls can be placed on which set of transports. That
again will end up adding a lot of complexity in the kernel itself. I've
yet to see any compelling justification for doing so.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx