Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers


> On Nov 23, 2015, at 8:22 PM, Tom Talpey <tom@xxxxxxxxxx> wrote:
> 
> On 11/23/2015 8:16 PM, Chuck Lever wrote:
>> 
>>> On Nov 23, 2015, at 7:55 PM, Tom Talpey <tom@xxxxxxxxxx> wrote:
>>> 
>>> On 11/23/2015 5:13 PM, Chuck Lever wrote:
>>>> Rarely, senders post a Send that is larger than the client's inline
>>>> threshold. That can be due to a bug, or the client and server may
>>>> not have communicated about their inline limits. RPC-over-RDMA
>>>> currently doesn't specify any particular limit on inline size, so
>>>> peers have to guess what it is.
>>>> 
>>>> It is fatal to the connection if the size of a Send is larger than
>>>> the client's receive buffer. The sender is likely to retry with the
>>>> same message size, so the workload is stuck at that point.
>>>> 
>>>> Follow Postel's robustness principle: Be conservative in what you
>>>> do, be liberal in what you accept from others. Increase the size of
>>>> client receive buffers by a safety margin, and add a warning when
>>>> the inline threshold is exceeded during receive.
>>> 
>>> Safety is good, but how do know the chosen value is enough?
>>> Isn't it better to fail the badly-composed request and be done
>>> with it? Even if the stupid sender loops, which it will do
>>> anyway.
>> 
>> It’s good enough to compensate for the most common sender bug,
>> which is that the sender did not account for the 28 bytes of
>> the RPC-over-RDMA header when it built the send buffer. The
>> additional 100-byte margin is gravy.
> 
> I think it's good to have sympathy and resilience to differing
> designs on the other end of the wire, but I fail to have it for
> stupid bugs. Unless this can take down the receiver, fail it fast.
> 
> MHO.

See above: the client can’t tell why the connection failed.

Again, the Send on the server side fails with LOCAL_LEN_ERR
and the connection is terminated. The client sees only the
connection loss. There’s no distinction between this and a
cable pull or a server crash, where you do want the client
to retransmit.

I agree it’s a dumb server bug. But the idea is to preserve
the connection, since NFS retransmits are a hazard.

Just floating this idea here; this is v1. This patch can be
dropped.


>> The loop occurs because the server gets a completion error.
>> The client just sees a connection loss. There’s no way for it
>> to know it should fail the RPC, so it keeps trying.
>> 
>> Perhaps the server could remember that the reply failed, and
>> when the client retransmits, it can simply return that XID
>> with an RDMA_ERROR.
>> 
>> 
>>>> Note the Linux server's receive buffers are already page-sized.
>>>> 
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>>  net/sunrpc/xprtrdma/rpc_rdma.c  |    7 +++++++
>>>>  net/sunrpc/xprtrdma/verbs.c     |    6 +++++-
>>>>  net/sunrpc/xprtrdma/xprt_rdma.h |    5 +++++
>>>>  3 files changed, 17 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index c10d969..a169252 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>>  	int rdmalen, status;
>>>>  	unsigned long cwnd;
>>>>  	u32 credits;
>>>> +	RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata);
>>>> 
>>>>  	dprintk("RPC:       %s: incoming rep %p\n", __func__, rep);
>>>> 
>>>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>>  		goto out_badstatus;
>>>>  	if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
>>>>  		goto out_shortreply;
>>>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>>>> +	cdata = &r_xprt->rx_data;
>>>> +	if (rep->rr_len > cdata->inline_rsize)
>>>> +		pr_warn("RPC: %u byte reply exceeds inline threshold\n",
>>>> +			rep->rr_len);
>>>> +#endif
>>>> 
>>>>  	headerp = rdmab_to_msg(rep->rr_rdmabuf);
>>>>  	if (headerp->rm_vers != rpcrdma_version)
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index eadd1655..e3f12e2 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>>>  	if (rep == NULL)
>>>>  		goto out;
>>>> 
>>>> -	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
>>>> +	/* The actual size of our receive buffers is increased slightly
>>>> +	 * to prevent small receive overruns from killing our connection.
>>>> +	 */
>>>> +	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize +
>>>> +					       RPCRDMA_RECV_MARGIN,
>>>>  					       GFP_KERNEL);
>>>>  	if (IS_ERR(rep->rr_rdmabuf)) {
>>>>  		rc = PTR_ERR(rep->rr_rdmabuf);
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index ac7f8d4..1b72ab1 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal {
>>>>  #define RPCRDMA_INLINE_PAD_VALUE(rq)\
>>>>  	rpcx_to_rdmad(rq->rq_xprt).padding
>>>> 
>>>> +/* To help prevent spurious connection shutdown, allow senders to
>>>> + * overrun our receive inline threshold by a small bit.
>>>> + */
>>>> +#define RPCRDMA_RECV_MARGIN	(128)
>>>> +
>>>>  /*
>>>>   * Statistics for RPCRDMA
>>>>   */
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
>> 
>> 

--
Chuck Lever



