> On May 13, 2021, at 11:13 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
>> On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
>>> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia
>>> <olga.kornievskaia@xxxxxxxxx> wrote:
>>>
>>>> On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
>>>> <olga.kornievskaia@xxxxxxxxx> wrote:
>>>>> On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>>>> On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
>>>>>>> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>>>>>> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
>>>>>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> An rpc client uses a transport switch and one or more transports
>>>>>>>>> associated with that switch. Since transports are shared among
>>>>>>>>> rpc clients, create a symlink into the xprt_switch directory
>>>>>>>>> instead of duplicating entries under each rpc client.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> ..
>>>>>>>>> @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
>>>>>>>>>  	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
>>>>>>>>>
>>>>>>>>>  	if (rpc_client) {
>>>>>>>>> +		char name[23];
>>>>>>>>> +
>>>>>>>>> +		snprintf(name, sizeof(name), "switch-%d",
>>>>>>>>> +			 rpc_client->xprt_switch->xps_id);
>>>>>>>>> +		sysfs_remove_link(&rpc_client->kobject, name);
>>>>>>>>
>>>>>>>> Hi Olga,
>>>>>>>>
>>>>>>>> If a client can use a single switch, shouldn't the name of the
>>>>>>>> symlink be just "switch"? This is to be consistent with other
>>>>>>>> symlinks in `sysfs` such as the ones in the block layer, for
>>>>>>>> example in my `/sys/block/sda`:
>>>>>>>>
>>>>>>>>    bdi -> ../../../../../../../../../../../virtual/bdi/8:0
>>>>>>>>    device -> ../../../5:0:0:0
>>>>>>>
>>>>>>> I think the client is written so that in the future it might have
>>>>>>> more than one switch?
>>>>>>
>>>>>> I wonder what the use for that would be, as a switch is already a
>>>>>> collection of xprts. What would determine the switch to use for a
>>>>>> new task's I/O?
>>>>>
>>>> I thought the switch is a collection of xprts of the same type. And if
>>>> you wanted to have an RDMA connection and a TCP connection to the same
>>>> server, then it would be stored under different switches? For instance,
>>>> we round-robin through the transports, but I don't see why we would be
>>>> doing so between a TCP and an RDMA transport. But I see how a client
>>>> can totally switch from a TCP-based transport to an RDMA one (or a set
>>>> of transports and round-robin among that set). But perhaps I'm wrong in
>>>> how I'm thinking about xprt_switch and multipathing.
>>>>
>>>> <looks like my reply bounced so trying to resend>
>>>>
>>>
>>> And more to answer your question, we don't have a method to switch
>>> between different xprt switches. But we don't have a way to specify how
>>> to mount with multiple types of transports. Perhaps sysfs could be/would
>>> be a way to switch between the two. Perhaps during session trunking
>>> discovery, the server can return a list of IPs and transport types.
>>> Say all IPs would be available via TCP and RDMA; then the client can
>>> create a switch with RDMA transports and another with TCP transports,
>>> and then perhaps there will be a policy engine that would decide which
>>> one to use to begin with. And then the sysfs interface would be a way
>>> to switch between the two if there are problems.
>>
>> You raise a good point, also relevant to the ability to dynamically add
>> new transports on the fly with the sysfs interface - their protocol type
>> may be different.
>>
>> Returning to the topic of multiple switches per client, I recall it was
>> hinted a few times that there is an intention to make the implementation
>> details of xprtswitch loadable and pluggable with custom algorithms. And
>> if we are going in that direction, I'd expect advanced transport
>> management and request routing to live below the RPC client level, where
>> we have example uses:
>>
>> 1) Optimizations in request routing that I've previously written about
>> (based on request data memory).
>>
>> 2) If we lift the restriction on multiple protocol types on the same
>> xprtswitch, then we can also allow for implementing 'RDMA-by-default
>> with TCP failover on standby', similar to what you suggest, but with one
>> switch maintaining two lists behind the scenes.
>>
>
> I'm not that interested in supporting multiple switches per client, or
> any setup that is asymmetric w.r.t. transport characteristics at this
> time.
>
> Any such setup is going to need a policy engine in order to decide which
> RPC calls can be placed on which set of transports. That again will end
> up adding a lot of complexity in the kernel itself. I've yet to see any
> compelling justification for doing so.

I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
tough to decide how to distribute work across connections and NICs that
have such vastly different performance characteristics.

I would like to see us crawling and walking before trying to run.

--
Chuck Lever
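
For reference, a minimal sketch of the symlink pairing discussed in the
quoted hunk: the client's sysfs directory gets a link named "switch-<id>"
pointing at the shared xprt_switch kobject, and the same name has to be
rebuilt at teardown for sysfs_remove_link(). The demo_* structures below
are hypothetical stand-ins for rpc_sysfs_client and rpc_sysfs_xprt_switch
(only the fields the hunk references are modeled), and the create side is
assumed to use sysfs_create_link(); only the "switch-%d" naming and the
remove call mirror the patch itself.

/* Hypothetical structures; not the actual sunrpc sysfs types. */
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/kernel.h>

struct demo_xprt_switch {
	struct kobject kobject;
	int xps_id;
};

struct demo_client {
	struct kobject kobject;
	struct demo_xprt_switch *xprt_switch;
};

/* Create side (assumed): link "<client dir>/switch-<id>" to the switch. */
static int demo_client_link_switch(struct demo_client *clnt)
{
	char name[23];	/* "switch-" + sign + 10 digits + NUL fits in 23 */

	snprintf(name, sizeof(name), "switch-%d", clnt->xprt_switch->xps_id);
	return sysfs_create_link(&clnt->kobject,
				 &clnt->xprt_switch->kobject, name);
}

/* Destroy side: regenerate the identical name, as the quoted
 * rpc_sysfs_client_destroy() hunk does, then remove the link.
 */
static void demo_client_unlink_switch(struct demo_client *clnt)
{
	char name[23];

	snprintf(name, sizeof(name), "switch-%d", clnt->xprt_switch->xps_id);
	sysfs_remove_link(&clnt->kobject, name);
}

With Dan's suggestion of a fixed "switch" name, both helpers would reduce
to a single sysfs_create_link()/sysfs_remove_link() call on a string
literal, at the cost of ruling out more than one switch per client.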