Re: [PATCH v3] SUNRPC: Fix a race in the receive code path

> On Dec 14, 2017, at 4:33 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote:
>>> On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>> 
>>> On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
>>>>> On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>> 
>>>>> Does the RDMA code update the connect cookie when the connection
>>>>> breaks? It looks to me as if it only does that when the connection
>>>>> is re-established. We really want both.
>>>>> 
>>>>>> I imagine this issue could also impact write buffer exhaustion
>>>>>> on TCP.
>>>>> 
>>>>> See net/sunrpc/xprtsock.c:xs_tcp_state_change()
>>>> 
>>>> xprtrdma manipulates the connect_cookie in its connect worker,
>>>> see rpcrdma_connect_worker. This was added by:
>>>> 
>>>> commit 575448bd36208f99fe0dd554a43518d798966740
>>>> Author:     Tom Talpey <talpey@xxxxxxxxxx>
>>>> AuthorDate: Thu Oct 9 15:00:40 2008 -0400
>>>> Commit:     Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>>>> CommitDate: Fri Oct 10 15:10:36 2008 -0400
>>>> 
>>>>   RPC/RDMA: suppress retransmit on RPC/RDMA clients.
>>>> 
>>>> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
>>>> which is the equivalent of xs_tcp_state_change? (If so, why, so
>>>> that I can compose a reasonable patch description.)
>>>> 
>>>> It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
>>>> RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
>>>> RDMA provides a distinction between "server disconnected"
>>>> and "client disconnected" although that probably does not
>>>> matter for this purpose.
>>>> 
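(Aside, for illustration: the kind of change being discussed might look
roughly like the sketch below. The RDMA CM event names are real, but the
placement inside rpcrdma_conn_upcall and the surrounding details are
assumptions, not the actual xprtrdma code.)

	static int
	rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
	{
		struct rpcrdma_xprt *r_xprt = id->context;	/* assumed */
		struct rpc_xprt *xprt = &r_xprt->rx_xprt;

		switch (event->event) {
		case RDMA_CM_EVENT_ESTABLISHED:
			/* New connection instance: replies from the old
			 * connection can no longer match new requests.
			 * (The real xprtrdma also skips the value 0.) */
			++xprt->connect_cookie;
			break;
		case RDMA_CM_EVENT_DISCONNECTED:
			/* Connection is gone: everything sent so far
			 * belongs to the dead connection. */
			++xprt->connect_cookie;
			break;
		default:
			break;
		}
		/* ... remainder of the upcall elided ... */
		return 0;
	}
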
>>>> But, why would the additional cookie update help? The transport
>>>> is not disconnecting before the deadlock.
>>>> 
>>> 
>>> The connection cookie's purpose is twofold:
>>> 
>>> 1) It tracks whether or not a request has been transmitted on the
>>> current connection.
>> 
>> That's broken by setting the cookie unconditionally outside
>> the transport_lock, isn't it?
>> 
>> 
>>> 2) It ensures that when several requests with the same connection
>>> cookie all call xprt_conditional_disconnect(), the result is a
>>> single disconnection event. To do so, it assumes that
>>> xprt_autoclose() will change the cookie if the disconnection
>>> attempt is successful.
>>> 
>>> In TCP we do so in xs_tcp_state_change(). If the RDMA transport can
>>> guarantee that the call to xprt->ops->close(xprt) is always
>>> successful, then you could do so there.
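(For reference, the serialization described in 2) is the cookie check at
the top of xprt_conditional_disconnect(). Roughly, from memory of the
code in this era, the logic is:)

	void xprt_conditional_disconnect(struct rpc_xprt *xprt,
					 unsigned int cookie)
	{
		spin_lock_bh(&xprt->transport_lock);
		/* A stale cookie means another request already forced a
		 * disconnect on this connection; make this call a no-op. */
		if (cookie != xprt->connect_cookie)
			goto out;
		if (test_bit(XPRT_CLOSING, &xprt->state))
			goto out;
		set_bit(XPRT_CLOSE_WAIT, &xprt->state);
		/* Try to schedule an autoclose RPC call. */
		if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
			queue_work(xprtiod_workqueue, &xprt->task_cleanup);
		xprt_wake_pending_tasks(xprt, -EAGAIN);
	out:
		spin_unlock_bh(&xprt->transport_lock);
	}
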
>> 
>> I don't mind moving the cookie bump to rpcrdma_conn_upcall,
>> but I'm not sure I understand the locking requirements.
>> 
>> Currently, xprt_transmit sets the connect_cookie while holding
>> the transport_lock.
>> 
>> xprt_conditional_disconnect compares the cookie while holding
>> the transport_lock.
>> 
>> For TCP, the transport_lock is held when bumping the cookie
>> in the ESTABLISHED case, but _not_ in the two CLOSE cases?
> 
> That should be OK. The networking layer should provide sufficient
> serialisation that we don't have to worry about collisions.
> 
>> 
>> xprtrdma holds the transport_lock when bumping the cookie,
>> which it does in its connect worker. It has to hold the lock
>> because it skips the value 0. xprtrdma needs to guarantee
>> that an RPC is never transmitted on the same connection
>> twice (and maybe it could use rq_connect_cookie instead of
>> its own cookie).
>> 
>> xprt_reserve_init is holding the reserve_lock but not the
>> transport_lock when it grabs the cookie. Maybe it should
>> not be initializing the rqst's cookie there?
>> 
>> Seems to me that xprt_transmit needs to update the rqst's
>> cookie while holding the transport_lock, especially if
>> xprtrdma needs to skip a cookie value? I'm sure I'm missing
>> something.
>> 
> 
> It should be OK, given that the connection is a state machine.
> However, I missed something that you said earlier about
> xprt_prepare_transmit().
> 
> OK. How about the following fixup patch instead of the earlier one?
> 
> 8<---------------------------------------------------
> From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Thu, 14 Dec 2017 07:05:27 -0500
> Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path
> 
> ---
> net/sunrpc/xprt.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 5e4278e9ce37..33b74fd84051 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task)
> {
> 	struct rpc_rqst	*req = task->tk_rqstp;
> 	struct rpc_xprt	*xprt = req->rq_xprt;
> +	unsigned int connect_cookie;
> 	int status, numreqs;
> 
> 	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
> @@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task)
> 	} else if (!req->rq_bytes_sent)
> 		return;
> 
> -	req->rq_connect_cookie = xprt->connect_cookie;
> +	connect_cookie = xprt->connect_cookie;
> 	req->rq_xtime = ktime_get();
> 	status = xprt->ops->send_request(task);
> 	trace_xprt_transmit(xprt, req->rq_xid, status);
> @@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task)
> 	xprt->stat.pending_u += xprt->pending.qlen;
> 	spin_unlock_bh(&xprt->transport_lock);
> 
> +	req->rq_connect_cookie = connect_cookie;
> 	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
> 		/*
> 		 * Sleep on the pending queue if we're expecting a reply.
> -- 
> 2.14.3

No problems here; this passed basic testing with NFSv4.0 on a
client with extra send_request fault injection.
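
(To summarize the fixup: the cookie is sampled before the send, and
published to req->rq_connect_cookie only after ->send_request() has
succeeded and the stats section under the transport_lock is done. A
request that never made it onto the wire keeps its old cookie, so it
is not treated as having been transmitted on this connection.
Condensed view, with error handling and stats elided:)

	connect_cookie = xprt->connect_cookie;	/* snapshot */
	req->rq_xtime = ktime_get();
	status = xprt->ops->send_request(task);
	if (status != 0) {
		/* Send failed: rq_connect_cookie is left unchanged. */
		task->tk_status = status;
		return;
	}
	/* ... pending-queue and stats updates under transport_lock ... */
	req->rq_connect_cookie = connect_cookie;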

I hope we can get the recv race fix (as updated here) and
the queue-work-on patch [1] into v4.15-rc.


--
Chuck Lever

[1] https://marc.info/?l=linux-nfs&m=151241427912572&w=2