> On May 13, 2021, at 11:13 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 2021-05-12 at 17:16 +0300, Dan Aloni wrote:
>> On Wed, May 12, 2021 at 09:49:01AM -0400, Olga Kornievskaia wrote:
>>> On Wed, May 12, 2021 at 9:40 AM Olga Kornievskaia
>>> <olga.kornievskaia@xxxxxxxxx> wrote:
>>>
>>>> On Wed, May 12, 2021 at 9:37 AM Olga Kornievskaia
>>>> <olga.kornievskaia@xxxxxxxxx> wrote:
>>>>> On Wed, May 12, 2021 at 6:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>>>> On Tue, Apr 27, 2021 at 08:12:53AM -0400, Olga Kornievskaia wrote:
>>>>>>> On Tue, Apr 27, 2021 at 12:42 AM Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>>>>>> On Mon, Apr 26, 2021 at 01:19:43PM -0400, Olga Kornievskaia wrote:
>>>>>>>>> From: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> An rpc client uses a transport switch and one or more transports
>>>>>>>>> associated with that switch. Since transports are shared among
>>>>>>>>> rpc clients, create a symlink into the xprt_switch directory
>>>>>>>>> instead of duplicating entries under each rpc client.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> ..
>>>>>>>>> @@ -188,6 +204,11 @@ void rpc_sysfs_client_destroy(struct rpc_clnt *clnt)
>>>>>>>>>  	struct rpc_sysfs_client *rpc_client = clnt->cl_sysfs;
>>>>>>>>>
>>>>>>>>>  	if (rpc_client) {
>>>>>>>>> +		char name[23];
>>>>>>>>> +
>>>>>>>>> +		snprintf(name, sizeof(name), "switch-%d",
>>>>>>>>> +			 rpc_client->xprt_switch->xps_id);
>>>>>>>>> +		sysfs_remove_link(&rpc_client->kobject, name);
>>>>>>>>
>>>>>>>> Hi Olga,
>>>>>>>>
>>>>>>>> If a client can use a single switch, shouldn't the name of the
>>>>>>>> symlink be just "switch"? This is to be consistent with other
>>>>>>>> symlinks in `sysfs` such as the ones in the block layer, for
>>>>>>>> example in my `/sys/block/sda`:
>>>>>>>>
>>>>>>>>    bdi -> ../../../../../../../../../../../virtual/bdi/8:0
>>>>>>>>    device -> ../../../5:0:0:0
>>>>>>>
>>>>>>> I think the client is written so that in the future it might have
>>>>>>> more than one switch?
>>>>>>
>>>>>> I wonder what the use for that would be, as a switch is already a
>>>>>> collection of xprts. What would determine the switch to use for a
>>>>>> new task's I/O?
>>>>>
>>>> I thought the switch is a collection of xprts of the same type. And if
>>>> you wanted to have an RDMA connection and a TCP connection to the same
>>>> server, then it would be stored under different switches? For instance,
>>>> we round-robin through the transports, but I don't see why we would be
>>>> doing so between a TCP and an RDMA transport. But I see how a client
>>>> can totally switch from a TCP-based transport to an RDMA one (or a set
>>>> of transports and round-robin among that set). But perhaps I'm wrong in
>>>> how I'm thinking about xprt_switch and multipathing.
>>>>
>>>> <looks like my reply bounced so trying to resend>
>>>>
>>>
>>> And more to answer your question, we don't have a method to switch
>>> between different xprt switches. But we don't have a way to specify how
>>> to mount with multiple types of transports. Perhaps sysfs could be/would
>>> be a way to switch between the two. Perhaps during session trunking
>>> discovery, the server can return a list of IPs and transport types.
>>> Say all IPs would be available via TCP and RDMA; then the client can
>>> create a switch with RDMA transports and another with TCP transports,
>>> and then perhaps there will be a policy engine that would decide which
>>> one to use to begin with. And then the sysfs interface would be a way
>>> to switch between the two if there are problems.
>>
>> You raise a good point, also relevant to the ability to dynamically add
>> new transports on the fly with the sysfs interface - their protocol type
>> may be different.
>>
>> Returning to the topic of multiple switches per client, I recall it was
>> hinted a few times that there is an intention to make the implementation
>> details of xprtswitch loadable and pluggable with custom algorithms. And
>> if we are going in that direction, I'd expect advanced transport
>> management and request routing to live below the RPC client level, where
>> we have example uses:
>>
>> 1) Optimizations in request routing that I've previously written about
>> (based on request data memory).
>>
>> 2) If we lift the restriction on multiple protocol types on the same
>> xprtswitch, then we can also allow for implementing 'RDMA-by-default
>> with TCP failover on standby', similar to what you suggest, but with one
>> switch maintaining two lists behind the scenes.
>>
>
> I'm not that interested in supporting multiple switches per client, or
> any setup that is asymmetric w.r.t. transport characteristics at this
> time.
>
> Any such setup is going to need a policy engine in order to decide which
> RPC calls can be placed on which set of transports. That again will end
> up adding a lot of complexity in the kernel itself. I've yet to see any
> compelling justification for doing so.

I agree -- SMB multi-channel allows TCP+RDMA configurations, and it's
tough to decide how to distribute work across connections and NICs that
have such vastly different performance characteristics.

I would like to see us crawling and walking before trying to run.

--
Chuck Lever
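
For reference, a minimal sketch of the symlink pairing discussed in the
quoted hunk: the client's sysfs directory gets a link named "switch-<id>"
pointing at the shared xprt_switch kobject, and the same name has to be
rebuilt at teardown for sysfs_remove_link(). The demo_* structures below
are hypothetical stand-ins for rpc_sysfs_client and rpc_sysfs_xprt_switch
(only the fields the hunk references are modeled), and the create side is
assumed to use sysfs_create_link(); only the "switch-%d" naming and the
remove call mirror the patch itself.

/* Hypothetical structures; not the actual sunrpc sysfs types. */
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/kernel.h>

struct demo_xprt_switch {
	struct kobject kobject;
	int xps_id;
};

struct demo_client {
	struct kobject kobject;
	struct demo_xprt_switch *xprt_switch;
};

/* Create side (assumed): link "<client dir>/switch-<id>" to the switch. */
static int demo_client_link_switch(struct demo_client *clnt)
{
	char name[23];	/* "switch-" + sign + 10 digits + NUL fits in 23 */

	snprintf(name, sizeof(name), "switch-%d", clnt->xprt_switch->xps_id);
	return sysfs_create_link(&clnt->kobject,
				 &clnt->xprt_switch->kobject, name);
}

/* Destroy side: regenerate the identical name, as the quoted
 * rpc_sysfs_client_destroy() hunk does, then remove the link.
 */
static void demo_client_unlink_switch(struct demo_client *clnt)
{
	char name[23];

	snprintf(name, sizeof(name), "switch-%d", clnt->xprt_switch->xps_id);
	sysfs_remove_link(&clnt->kobject, name);
}

With Dan's suggestion of a fixed "switch" name, both helpers would reduce
to a single sysfs_create_link()/sysfs_remove_link() call on a string
literal, at the cost of ruling out more than one switch per client.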