On Thu, May 13, 2021 at 11:18 AM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On May 13, 2021, at 11:13 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
> >> On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
> >>> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia
> >>> <olga.kornievskaia@xxxxxxxxx> wrote:
> >>>
> >>>> On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
> >>>> <olga.kornievskaia@xxxxxxxxx> wrote:
> >>>>> On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> >>>>>> On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
> >>>>>>> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> >>>>>>>> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
> >>>>>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>> An rpc client uses a transport switch and one or more transports
> >>>>>>>>> associated with that switch. Since transports are shared among
> >>>>>>>>> rpc clients, create a symlink into the xprt_switch directory
> >>>>>>>>> instead of duplicating entries under each rpc client.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>> ..
> >>>>>>>>> @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
> >>>>>>>>>  	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
> >>>>>>>>>
> >>>>>>>>>  	if (rpc_client) {
> >>>>>>>>> +		char name[23];
> >>>>>>>>> +
> >>>>>>>>> +		snprintf(name, sizeof(name), "switch-%d",
> >>>>>>>>> +			 rpc_client->xprt_switch->xps_id);
> >>>>>>>>> +		sysfs_remove_link(&rpc_client->kobject, name);
> >>>>>>>>
> >>>>>>>> Hi Olga,
> >>>>>>>>
> >>>>>>>> If a client can use a single switch, shouldn't the name of the
> >>>>>>>> symlink be just "switch"?
> >>>>>>>> This is to be consistent with other symlinks in `sysfs`, such as
> >>>>>>>> the ones in the block layer; for example, in my `/sys/block/sda`:
> >>>>>>>>
> >>>>>>>> bdi -> ../../../../../../../../../../../virtual/bdi/8:0
> >>>>>>>> device -> ../../../5:0:0:0
> >>>>>>>
> >>>>>>> I think the client is written so that in the future it might have
> >>>>>>> more than one switch?
> >>>>>>
> >>>>>> I wonder what the use for that would be, as a switch is already a
> >>>>>> collection of xprts. What would determine the switch to use for a new
> >>>>>> task IO?
> >>>>>
> >>>>> I thought the switch is a collection of xprts of the same type. And if
> >>>> you wanted to have an RDMA connection and a TCP connection to the same
> >>>> server, then they would be stored under different switches? For instance,
> >>>> we round-robin through the transports, but I don't see why we would do so
> >>>> between a TCP and an RDMA transport. But I can see how a client could
> >>>> switch entirely from a TCP-based transport to an RDMA one (or to a set of
> >>>> transports, round-robining among that set). But perhaps I'm wrong in how
> >>>> I'm thinking about xprt_switch and multipathing.
> >>>>
> >>>> <looks like my reply bounced so trying to resend>
> >>>>
> >>>
> >>> And more to answer your question: we don't have a method to switch
> >>> between different xprt switches, but we also don't have a way to specify
> >>> a mount with multiple types of transports. Perhaps sysfs could be a way
> >>> to switch between the two. Perhaps during session trunking discovery,
> >>> the server can return a list of IPs and types of transports.
> >>> Say all IPs
> >>> would be available via TCP and RDMA; then the client can create a switch
> >>> with RDMA transports and another with TCP transports. Then perhaps there
> >>> will be a policy engine that decides which one to use to begin with, and
> >>> the sysfs interface would be a way to switch between the two if there
> >>> are problems.
> >>
> >> You raise a good point, also relevant to the ability to dynamically add
> >> new transports on the fly with the sysfs interface - their protocol type
> >> may be different.
> >>
> >> Returning to the topic of multiple switches per client, I recall it was
> >> hinted a few times that there is an intention to make the implementation
> >> details of xprtswitch loadable and pluggable with custom algorithms. If
> >> we are going in that direction, I'd expect that advanced transport
> >> management and request routing can live below the RPC client level,
> >> where we have example uses:
> >>
> >> 1) Optimizations in request routing that I've previously written about
> >> (based on request data memory).
> >>
> >> 2) If we lift the restriction against multiple protocol types on the
> >> same xprtswitch, then we can also implement 'RDMA by default with TCP
> >> failover on standby', similar to what you suggest, but with one switch
> >> maintaining two lists behind the scenes.
> >>
> >
> > I'm not that interested in supporting multiple switches per client, or
> > any setup that is asymmetric w.r.t. transport characteristics at this
> > time.
> >
> > Any such setup is going to need a policy engine in order to decide
> > which RPC calls can be placed on which set of transports. That again
> > will end up adding a lot of complexity in the kernel itself. I've yet
> > to see any compelling justification for doing so.
>
> I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
> tough to decide how to distribute work across connections and NICs
> that have such vastly different performance characteristics.
>
> I would like to see us crawling and walking before trying to run.

Sounds good, folks. I'll remove the multiple switches from the sysfs
infrastructure. v7 coming up.

> --
> Chuck Lever