Re: [PATCH v1 13/16] NFS: Add sidecar RPC client support

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 22 Oct 2014 13:20:03 -0400

> On Oct 22, 2014, at 4:39 AM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> 
>> On Tue, Oct 21, 2014 at 8:11 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>>> On Oct 21, 2014, at 3:45 AM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>> 
>>>> On Tue, Oct 21, 2014 at 4:06 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>> 
>>>> There is no show-stopper (see Section 5.1, after all). It’s
>>>> simply a matter of development effort: a side-car is much
>>>> less work than implementing full RDMA backchannel support for
>>>> both a client and server, especially since TCP backchannel
>>>> already works and can be used immediately.
>>>> 
>>>> Also, no problem with eventually implementing RDMA backchannel
>>>> if the complexity, and any performance overhead it introduces in
>>>> the forward channel, can be justified. The client can use the
>>>> CREATE_SESSION flags to detect what a server supports.
>>> 
>>> What complexity and performance overhead does it introduce in the
>>> forward channel?
>> 
>> The benefit of RDMA is that there are opportunities to
>> reduce host CPU interaction with incoming data.
>> Bi-direction requires that the transport look at the RPC
>> header to determine the direction of the message. That
>> could have an impact on the forward channel, but it’s
>> never been measured, to my knowledge.
>> 
>> The reason this is more of an issue for RPC/RDMA is that
>> a copy of the XID appears in the RPC/RDMA header to avoid
>> the need to look at the RPC header. That’s typically what
>> implementations use to steer RPC reply processing.
>> 
>> Often the RPC/RDMA header and RPC header land in
>> disparate buffers. The RPC/RDMA reply handler looks
>> strictly at the RPC/RDMA header, and runs in a tasklet
>> usually on a different CPU. Adding bi-direction would mean
>> the transport would have to peek into the upper layer
>> headers, possibly resulting in cache line bouncing.
> 
> Under what circumstances would you expect to receive a valid NFSv4.1
> callback with an RDMA header that spans multiple cache lines?

The RPC header and RPC/RDMA header are separate entities, but
together can span multiple cache lines if the server has returned a
chunk list containing multiple entries.

For example, RDMA_NOMSG would send the RPC/RDMA header
via RDMA SEND with a chunk list that represents the RPC and NFS
payload. That list could make the header larger than 32 bytes.
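
To put rough numbers on that, here is a back-of-the-envelope sketch
of the header size as a function of the number of read chunks. The
field sizes follow my reading of the RPC/RDMA version 1 XDR in
RFC 5666; the function name and the 64-byte cache line are
assumptions made for illustration, not anything taken from xprtrdma.

```python
# Sizes in bytes, following the RPC/RDMA v1 XDR in RFC 5666.
XDR_WORD = 4
FIXED_FIELDS = 4 * XDR_WORD   # xid, vers, credit, proc
READ_CHUNK = 6 * XDR_WORD     # "present" flag, position, handle, length, 64-bit offset
EMPTY_LIST = XDR_WORD         # list terminator, or the marker for an absent list

CACHE_LINE = 64               # assumption: common x86 cache line size


def rpcrdma_header_len(read_chunks: int) -> int:
    """Length of an RDMA_MSG/RDMA_NOMSG header that carries only read chunks.

    Hypothetical helper for illustration; write list and reply chunk are
    assumed to be empty.
    """
    read_list = read_chunks * READ_CHUNK + EMPTY_LIST
    write_list = EMPTY_LIST
    reply_chunk = EMPTY_LIST
    return FIXED_FIELDS + read_list + write_list + reply_chunk


for n in range(4):
    size = rpcrdma_header_len(n)
    # Assumes the header starts on a cache line boundary.
    verdict = "exceeds one cache line" if size > CACHE_LINE else "fits in one cache line"
    print(f"{n} read chunk(s): {size} bytes, {verdict}")
```

Under these assumptions a header with no chunks is 28 bytes, a single
read chunk brings it to 52 bytes, and a second one brings it to 76.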

I expect that any callback that involves more than 1024 bytes of
RPC payload will need to use RDMA_NOMSG. A long device
info list might fit that category?

>> The complexity would be the addition of over a hundred
>> new lines of code on the client, and possibly a similar
>> amount of new code on the server. Small, perhaps, but
>> not insignificant.
> 
> Until there are RDMA users, I care a lot less about code changes to
> xprtrdma than to NFS.
> 
>>>>> 2) Why do we instead have to solve the whole backchannel problem in
>>>>> the NFSv4.1 layer, and where is the discussion of the merits for and
>>>>> against that particular solution? As far as I can tell, it imposes at
>>>>> least 2 extra requirements:
>>>>> a) NFSv4.1 client+server must have support either for session
>>>>> trunking or for clientid trunking
>>>> 
>>>> Very minimal trunking support. The only operation allowed on
>>>> the TCP side-car's forward channel is BIND_CONN_TO_SESSION.
>>>> 
>>>> Bruce told me that associating multiple transports to a
>>>> clientid/session should not be an issue for his server (his
>>>> words were “if that doesn’t work, it’s a bug”).
>>>> 
>>>> Would this restrictive form of trunking present a problem?
>>>> 
>>>>> b) NFSv4.1 client must be able to set up a TCP connection to the
>>>>> server (that can be session/clientid trunked with the existing RDMA
>>>>> channel)
>>>> 
>>>> Also very minimal changes. The changes are already done,
>>>> posted in v1 of this patch series.
>>> 
>>> I'm not asking for details on the size of the changesets, but for a
>>> justification of the design itself.
>> 
>> The size of the changeset _is_ the justification. It’s
>> a much less invasive change to add a TCP side-car than
>> it is to implement RDMA backchannel on both server and
>> client.
> 
> Please define your use of the word "invasive" in the above context. To
> me "invasive" means "will affect code that is in use by others".

The server side, then, is non-invasive. The client side makes minor
changes to state management.

> 
>> Most servers would require almost no change. Linux needs
>> only a bug fix or two. Effectively zero-impact for
>> servers that already support NFSv4.0 on RDMA to get
>> NFSv4.1 and pNFS on RDMA, with working callbacks.
>> 
>> That’s really all there is to it. It’s almost entirely a
>> practical consideration: we have the infrastructure and
>> can make it work in just a few lines of code.
>> 
>>> If it is possible to confine all
>>> the changes to the RPC/RDMA layer, then why consider patches that
>>> change the NFSv4.1 layer at all?
>> 
>> The fast new transport bring-up benefit is probably the
>> biggest win. A TCP side-car makes bringing up any new
>> transport implementation simpler.
> 
> That's an assertion that assumes:
> - we actually want to implement more transports aside from RDMA

So you no longer consider RPC/SCTP a possibility?

> - implementing bi-directional transports in the RPC layer is non-simple

I don't care to generalize about that. In the RPC/RDMA case, there
are some complications that make it non-simple, but not impossible.
So we have an example of a non-simple case, IMO.
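
For reference, the direction check itself is small. The cost
discussed earlier in the thread is where and when the transport has
to touch the RPC header, not the logic. A minimal sketch, assuming
the caller already has the first eight bytes of the RPC header in
hand (the function name is hypothetical; the msg_type values are the
ones defined in RFC 5531):

```python
import struct

# msg_type values from the ONC RPC specification (RFC 5531).
RPC_CALL = 0
RPC_REPLY = 1


def rpc_direction(rpc_header: bytes) -> str:
    """Classify an RPC message by peeking at its first two XDR words.

    Hypothetical helper for illustration. The RPC header starts with a
    big-endian 32-bit XID followed by a 32-bit msg_type.
    """
    if len(rpc_header) < 8:
        raise ValueError("need at least the xid and msg_type words")
    _xid, msg_type = struct.unpack_from(">II", rpc_header, 0)
    if msg_type == RPC_CALL:
        return "call"    # on a client, this would be a backchannel request
    if msg_type == RPC_REPLY:
        return "reply"   # matches an outstanding forward-channel XID
    raise ValueError(f"unknown msg_type {msg_type}")


# Usage: a reply and a call that happen to share the same XID.
print(rpc_direction(struct.pack(">II", 0x1234, RPC_REPLY)))
print(rpc_direction(struct.pack(">II", 0x1234, RPC_CALL)))
```

A uni-directional receiver can skip this step and steer on the XID
copy in the RPC/RDMA header alone.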

> Right now, the benefit is only to RDMA users. Nobody else is asking
> for such a change.
> 
>> And, RPC/RDMA offers zero performance benefit for
>> backchannel traffic, especially since CB traffic would
>> never move via RDMA READ/WRITE (as per RFC 5667 section
>> 5.1).
>> 
>> The primary benefit to doing an RPC/RDMA-only solution
>> is that there is no upper layer impact. Is that a design
>> requirement?

Based on your objections, it appears that "no upper layer
impact" is a hard design requirement. I will take this as a
NACK for the side-car approach.