> On Dec 3, 2017, at 3:24 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Sun, 2017-12-03 at 15:19 -0500, Chuck Lever wrote:
>>> On Dec 3, 2017, at 3:12 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxx
>>> m> wrote:
>>> 
>>> On Sun, 2017-12-03 at 13:54 -0500, Chuck Lever wrote:
>>>>> On Dec 3, 2017, at 1:50 PM, Trond Myklebust
>>>>> <trond.myklebust@primarydata.com> wrote:
>>>>> 
>>>>> We must ensure that the call to rpc_sleep_on() in xprt_transmit()
>>>>> cannot race with the call to xprt_complete_rqst().
>>>> 
>>>> :-( this will kill scalability, we might as well just go back
>>>> to the old locking scheme.
>>> 
>>> It shouldn't make a huge difference, but I agree that we do want
>>> to get rid of that transport lock.
>>> 
>>> How about the following, then?
>> 
>> This looks better, I'll give it a try!

After testing for several hours, I was not able to reproduce the lost
RPC completion symptom.

I've been concerned about the performance impact. 8-thread dbench
throughput regresses 2-3% on my 56Gb/s network. There will probably
not be any noticeable degradation on a slower fabric.

Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

>> But touching the recv_lock in the transmit path is what I'd like
>> to avoid completely, if possible. I'm not clear on why the pending
>> waitqueue is unprotected. Doesn't it have a lock of its own?
> 
> The recv_lock ensures that the check of req->rq_reply_bytes_recvd is
> atomic with the call to rpc_sleep_on().

Ah, makes sense. I agree it is an oversight in ce7c252a8c7 not to have
made this change at the same time.

> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Sun, 3 Dec 2017 13:37:27 -0500
> Subject: [PATCH v2] SUNRPC: Fix a race in the receive code path
> 
> We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
> race with the call to xprt_complete_rqst().

IMHO the patch description could mention rq_reply_bytes_recvd, which
is the data that is protected by the recv_lock. More bread crumbs for
our future selves...

> @@ -1048,19 +1049,22 @@ void xprt_transmit(struct rpc_task *task)
>  	xprt->stat.sending_u += xprt->sending.qlen;
>  	xprt->stat.pending_u += xprt->pending.qlen;
> 
> -	/* Don't race with disconnect */
> -	if (!xprt_connected(xprt))
> -		task->tk_status = -ENOTCONN;
> -	else {
> +	spin_unlock_bh(&xprt->transport_lock);
> +
> +	if (!READ_ONCE(req->rq_reply_bytes_recvd) && rpc_reply_expected(task)) {

I was wondering about this optimization. It seems to provide about a
1% benefit in dbench throughput results in my setup, though it is
certainly not the common case that the reply has already arrived at
this point.

> +		spin_lock(&xprt->recv_lock);
>  		/*
>  		 * Sleep on the pending queue since
>  		 * we're expecting a reply.
>  		 */
> -		if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task))
> +		if (!req->rq_reply_bytes_recvd) {
>  			rpc_sleep_on(&xprt->pending, task, xprt_timer);
> -		req->rq_connect_cookie = xprt->connect_cookie;
> +			/* Deal with disconnect races */
> +			if (!xprt_connected(xprt))
> +				xprt_wake_pending_tasks(xprt, -ENOTCONN);

Here, the !xprt_connected test is no longer done unconditionally, but
only when the task has to be put to sleep. Is that a 100% safe change?

I'm also wondering if this check can be omitted. Won't disconnect
wait until xprt_release_xprt? Otherwise, if this check is still
needed, the new comment could be a little more explanatory :-)

> +		}
> +		spin_unlock(&xprt->recv_lock);
>  	}
> -	spin_unlock_bh(&xprt->transport_lock);
>  }
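For the archives: the failure mode being fixed is a classic lost
wakeup, and the remedy is the classic one, making the test of the flag
atomic with the sleep under the same lock the waker takes. Here is a
rough user-space sketch of the pattern (my own pthreads illustration,
not the kernel code; complete_rqst() and wait_for_reply() are just
stand-ins for xprt_complete_rqst() and the fixed xprt_transmit()):

/*
 * Lost-wakeup sketch (user-space analogy, not kernel code). If the
 * sleeper tests the flag, then the waker stores the flag and signals,
 * and only then does the sleeper go to sleep, the wakeup is lost and
 * the sleeper waits forever. Taking the waker's lock around both the
 * test and the sleep closes that window.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t recv_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pending = PTHREAD_COND_INITIALIZER;
static atomic_bool reply_received;	/* stands in for rq_reply_bytes_recvd */

/* Stand-in for xprt_complete_rqst(): record the reply, then wake. */
static void *complete_rqst(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&recv_lock);
	reply_received = true;
	pthread_cond_signal(&pending);	/* seen only by waiters already
					 * queued under recv_lock */
	pthread_mutex_unlock(&recv_lock);
	return NULL;
}

/* Stand-in for the fixed xprt_transmit(): cheap unlocked check first,
 * then re-check under recv_lock so the test and the sleep cannot race
 * with the waker above. */
static void wait_for_reply(void)
{
	if (!reply_received) {		/* fast path, cf. READ_ONCE() */
		pthread_mutex_lock(&recv_lock);
		while (!reply_received)
			pthread_cond_wait(&pending, &recv_lock);
		pthread_mutex_unlock(&recv_lock);
	}
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, complete_rqst, NULL);
	wait_for_reply();
	pthread_join(&t, NULL);
	printf("reply received, no wakeup lost\n");
	return 0;
}

The unlocked check at the top of wait_for_reply() plays the same role
as the READ_ONCE() fast path above: skip the lock entirely when the
reply has already arrived, and let the locked re-check catch the race
when it has not.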
Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

And thanks for your quick response!

--
Chuck Lever