> On Dec 3, 2017, at 3:24 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Sun, 2017-12-03 at 15:19 -0500, Chuck Lever wrote:
>>> On Dec 3, 2017, at 3:12 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxx
>>> m> wrote:
>>> 
>>> On Sun, 2017-12-03 at 13:54 -0500, Chuck Lever wrote:
>>>>> On Dec 3, 2017, at 1:50 PM, Trond Myklebust
>>>>> <trond.myklebust@primarydata.com> wrote:
>>>>> 
>>>>> We must ensure that the call to rpc_sleep_on() in xprt_transmit()
>>>>> cannot race with the call to xprt_complete_rqst().
>>>> 
>>>> :-( this will kill scalability, we might as well just go back
>>>> to the old locking scheme.
>>> 
>>> It shouldn't make a huge difference, but I agree that we do want
>>> to get rid of that transport lock.
>>> 
>>> How about the following, then?
>> 
>> This looks better, I'll give it a try!

After testing for several hours, I was not able to reproduce the lost
RPC completion symptom.

I've been concerned about the performance impact. 8-thread dbench
throughput regresses 2-3% on my 56Gb/s network. There will probably
not be any noticeable degradation on a slower fabric.

Tested-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

>> But touching the recv_lock in the transmit path is what I'd like
>> to avoid completely, if possible. I'm not clear on why the pending
>> waitqueue is unprotected. Doesn't it have a lock of its own?
> 
> The recv_lock ensures that the check of req->rq_reply_bytes_recvd is
> atomic with the call to rpc_sleep_on().

Ah, makes sense. I agree it is an oversight in ce7c252a8c7 not to have
made this change at the same time.

> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Sun, 3 Dec 2017 13:37:27 -0500
> Subject: [PATCH v2] SUNRPC: Fix a race in the receive code path
> 
> We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
> race with the call to xprt_complete_rqst().

IMHO the patch description could mention rq_reply_bytes_recvd, which
is the data that is protected by the recv_lock. More bread crumbs for
our future selves...

> @@ -1048,19 +1049,22 @@ void xprt_transmit(struct rpc_task *task)
>  	xprt->stat.sending_u += xprt->sending.qlen;
>  	xprt->stat.pending_u += xprt->pending.qlen;
> 
> -	/* Don't race with disconnect */
> -	if (!xprt_connected(xprt))
> -		task->tk_status = -ENOTCONN;
> -	else {
> +	spin_unlock_bh(&xprt->transport_lock);
> +
> +	if (!READ_ONCE(req->rq_reply_bytes_recvd) && rpc_reply_expected(task)) {

I was wondering about this optimization. It seems to provide about a
1% benefit in dbench throughput results in my setup, though it is
certainly not the common case that the reply has already arrived at
this point.

> +		spin_lock(&xprt->recv_lock);
>  		/*
>  		 * Sleep on the pending queue since
>  		 * we're expecting a reply.
>  		 */
> -		if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task))
> +		if (!req->rq_reply_bytes_recvd) {
>  			rpc_sleep_on(&xprt->pending, task, xprt_timer);
> -		req->rq_connect_cookie = xprt->connect_cookie;
> +			/* Deal with disconnect races */
> +			if (!xprt_connected(xprt))
> +				xprt_wake_pending_tasks(xprt, -ENOTCONN);

Here, the !xprt_connected test is no longer done unconditionally, but
only when the task has to be put to sleep. Is that a 100% safe change?

I'm also wondering if this check can be omitted. Won't disconnect
wait until xprt_release_xprt? Otherwise, if this check is still
needed, the new comment could be a little more explanatory :-)

> +		}
> +		spin_unlock(&xprt->recv_lock);
>  	}
> -	spin_unlock_bh(&xprt->transport_lock);
>  }
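For the archives: the failure mode being fixed is a classic lost
wakeup, and the remedy is the classic one, making the test of the flag
atomic with the sleep under the same lock the waker takes. Here is a
rough user-space sketch of the pattern (my own pthreads illustration,
not the kernel code; complete_rqst() and wait_for_reply() are just
stand-ins for xprt_complete_rqst() and the fixed xprt_transmit()):

/*
 * Lost-wakeup sketch (user-space analogy, not kernel code). If the
 * sleeper tests the flag, then the waker stores the flag and signals,
 * and only then does the sleeper go to sleep, the wakeup is lost and
 * the sleeper waits forever. Taking the waker's lock around both the
 * test and the sleep closes that window.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t recv_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pending = PTHREAD_COND_INITIALIZER;
static atomic_bool reply_received;	/* stands in for rq_reply_bytes_recvd */

/* Stand-in for xprt_complete_rqst(): record the reply, then wake. */
static void *complete_rqst(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&recv_lock);
	reply_received = true;
	pthread_cond_signal(&pending);	/* seen only by waiters already
					 * queued under recv_lock */
	pthread_mutex_unlock(&recv_lock);
	return NULL;
}

/* Stand-in for the fixed xprt_transmit(): cheap unlocked check first,
 * then re-check under recv_lock so the test and the sleep cannot race
 * with the waker above. */
static void wait_for_reply(void)
{
	if (!reply_received) {		/* fast path, cf. READ_ONCE() */
		pthread_mutex_lock(&recv_lock);
		while (!reply_received)
			pthread_cond_wait(&pending, &recv_lock);
		pthread_mutex_unlock(&recv_lock);
	}
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, complete_rqst, NULL);
	wait_for_reply();
	pthread_join(&t, NULL);
	printf("reply received, no wakeup lost\n");
	return 0;
}

The unlocked check at the top of wait_for_reply() plays the same role
as the READ_ONCE() fast path above: skip the lock entirely when the
reply has already arrived, and let the locked re-check catch the race
when it has not.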
Reviewed-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

And thanks for your quick response!

--
Chuck Lever