Re: [PATCH v3 09/13] sunrpc: add a symlink from rpc-client directory to the xprt_switch

On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote:
> 
> > On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
> > <olga.kornievskaia@xxxxxxxxx> wrote:
> > > On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> > >> On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
> > >> > On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> > >> > > On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> > >> > > > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >> > > >
> > >> > > > An rpc client uses a transport switch and one or more transports
> > >> > > > associated with that switch. Since transports are shared among
> > >> > > > rpc clients, create a symlink into the xprt_switch directory
> > >> > > > instead of duplicating entries under each rpc client.
> > >> > > >
> > >> > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >> > > >
> > >> > > >..
> > >> > > > @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
> > >> > > >       struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
> > >> > > >
> > >> > > >       if (rpc_client) {
> > >> > > > +             char name[23];
> > >> > > > +
> > >> > > > +             snprintf(name, sizeof(name), "switch-%d",
> > >> > > > +                      rpc_client->xprt_switch->xps_id);
> > >> > > > +             sysfs_remove_link(&rpc_client->kobject, name);
> > >> > >
> > >> > > Hi Olga,
> > >> > >
> > >> > > If a client can use a single switch, shouldn't the name of the symlink
> > >> > > be just "switch"? This is to be consistent with other symlinks in
> > >> > > `sysfs` such as the ones in block layer, for example in my
> > >> > > `/sys/block/sda`:
> > >> > >
> > >> > >     bdi -> ../../../../../../../../../../../virtual/bdi/8:0
> > >> > >     device -> ../../../5:0:0:0
> > >> >
> > >> > I think the client is written so that in the future it might have more
> > >> > than one switch?
> > >>
> > >> I wonder what would be the use for that, as a switch is already a
> > >> collection of xprts. What would determine which switch to use for a
> > >> new task's IO?
> > >
> > >
> > > I thought the switch is a collection of xprts of the same type. And if
> > > you wanted to have an RDMA connection and a TCP connection to the same
> > > server, then they would be stored under different switches? For instance,
> > > we round-robin through the transports, but I don't see why we would do so
> > > between a TCP and an RDMA transport. I can see, though, how a client might
> > > switch entirely from a TCP-based transport to an RDMA one (or from one set
> > > of transports to another, round-robining within that set). But perhaps I'm
> > > wrong in how I'm thinking about xprt_switch and multipathing.
> >
> > <looks like my reply bounced so trying to resend>
> >
> 
> To answer your question further: we don't have a method to switch between
> different xprt switches, and we also don't have a way to specify, at mount
> time, multiple types of transports. Perhaps sysfs could be a way to switch
> between the two. Perhaps during session trunking discovery the server can
> return a list of IPs and transport types. Say all IPs are available via
> both TCP and RDMA; the client could then create one switch with RDMA
> transports and another with TCP transports, and perhaps a policy engine
> would decide which one to use to begin with. The sysfs interface would
> then be a way to switch between the two if there are problems.

You raise a good point, which is also relevant to the ability to
dynamically add new transports on the fly via the sysfs interface - their
protocol type may differ from that of the existing transports.
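
As an illustration only, here is a rough sketch of how such an "add
transport" store handler might sit on top of the existing multipath
helpers. The attribute name and the struct rpc_sysfs_xprt_switch fields
used below are my assumptions, not code from your series, and trailing
newline handling is ignored for brevity:

/* Sketch only: a write of a destination address to a per-switch sysfs
 * file creates a transport and attaches it to the switch through the
 * existing multipath helper.
 */
static ssize_t rpc_sysfs_xprt_switch_add_xprt_store(struct kobject *kobj,
                                                    struct kobj_attribute *attr,
                                                    const char *buf, size_t count)
{
        struct rpc_sysfs_xprt_switch *s =
                container_of(kobj, struct rpc_sysfs_xprt_switch, kobject);
        struct sockaddr_storage dstaddr;
        struct xprt_create args = {
                .ident   = XPRT_TRANSPORT_TCP, /* could also be parsed from buf */
                .net     = s->net,             /* assumes the netns is saved at setup */
                .dstaddr = (struct sockaddr *)&dstaddr,
        };
        struct rpc_xprt *xprt;

        /* Parse the written address into a sockaddr. */
        args.addrlen = rpc_pton(args.net, buf, count,
                                (struct sockaddr *)&dstaddr, sizeof(dstaddr));
        if (!args.addrlen)
                return -EINVAL;

        /* Create the transport and hand it to the switch. */
        xprt = xprt_create_transport(&args);
        if (IS_ERR(xprt))
                return PTR_ERR(xprt);

        rpc_xprt_switch_add_xprt(s->xprt_switch, xprt);
        return count;
}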

Returning to the topic of multiple switches per client: I recall it being
hinted a few times that there is an intention to make the implementation
details of xprtswitch loadable and pluggable with custom algorithms.  If
we go in that direction, I'd expect advanced transport management and
request routing to live below the RPC client level, where we already have
example uses:

1) Optimizations in request routing that I've previously written about
(based on request data memory).

2) If we lift the restriction against multiple protocol types on the same
xprtswitch, we could also implement 'RDMA by default, with TCP failover
on standby', similar to what you suggest, but with one switch maintaining
two lists behind the scenes (rough sketch below).

-- 
Dan Aloni


