> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>
> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
>> Test report.
>>
>> I applied this on top of the three v4 patches from Friday.
>>
>> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
>> However, at some point during my test (sec=krb5p,vers=3) the
>> mount stops responding and the client becomes idle. ^C leaves
>> the iozone processes in D state.
>
> Does /proc/*/stack show any tasks with any interesting stack behaviour?
>
>>
>> With RDMA, I can see a few issues:
>> - A lot of -74 and server disconnects. I suspect this is due
>>   to GSS sequence window overruns
>> - xprt_connect puts the calling task to sleep if the credit
>>   limit has been reached, and that can result in a deadlock
>> - The reconnect logic appears to busy-wait until something
>>   else (eg, an RTO) forces a reconnect
>> - A connect or disconnect wake-up that occurs while an RPC
>>   reply is being handled (ie, is still on xprt->pending)
>>   results in that RPC being retried unnecessarily.
>>
>> I'm not sure how to address that last one. Sometimes RDMA has
>> to invalidate MRs, and that involves a wait/context switch which
>> opens the race window significantly.
>
> In principle the latter issue is supposed to be handled by the connect
> cookies. Are the updates perhaps being ordered incorrectly w.r.t. the
> xprt->pending wakeup? If so, then that could cause races.
>
> I'd expect the cookie would need to be updated before the call to wake
> up xprt->pending, both on connect and disconnect.

xprtrdma does this in rpcrdma_cm_event_handler. I'm at a loss to
understand how connect cookies can help here. During a disconnect,
rpc_wake_pending_tasks sets tk_status to -EAGAIN, but a subsequent
xprt_complete_rqst does not set tk_status. When xprt_complete_rqst
wakes up the waiting RPC task, that task sees -EAGAIN rather than a
successful RPC completion.

>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>>>
>>> If an asynchronous connection attempt completes while another task
>>> is in xprt_connect(), then the call to rpc_sleep_on() could end up
>>> racing with the call to xprt_wake_pending_tasks().
>>> So add a second test of the connection state after we've put the
>>> task to sleep and set the XPRT_CONNECTING flag, when we know that
>>> there can be no asynchronous connection attempts still in progress.
>>>
>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> ---
>>> net/sunrpc/xprt.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 122c91c28e7c..ce927002862a 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
>>>                          return;
>>>                  if (xprt_test_and_set_connecting(xprt))
>>>                          return;
>>> -                xprt->stat.connect_start = jiffies;
>>> -                xprt->ops->connect(xprt, task);
>>> +                /* Race breaker */
>>> +                if (!xprt_connected(xprt)) {
>>> +                        xprt->stat.connect_start = jiffies;
>>> +                        xprt->ops->connect(xprt, task);
>>> +                } else {
>>> +                        xprt_clear_connecting(xprt);
>>> +                        task->tk_status = 0;
>>> +                        rpc_wake_up_queued_task(&xprt->pending, task);
>>> +                }
>>>          }
>>>          xprt_release_write(xprt, task);
>>> }
>>> --
>>> 2.19.2
>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever
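
P.S. In case it helps, here is a toy, single-threaded C sketch of the
interleaving I'm describing. It is not the real net/sunrpc code: the
names toy_task, toy_disconnect_wakeup and toy_complete_rqst are invented
purely for illustration. It only shows how a -EAGAIN left behind by a
disconnect wake-up can still be the status the task sees after the
reply has actually been handled.

/*
 * Toy model of the wake-up ordering above; none of these names are the
 * real SUNRPC interfaces.
 */
#include <errno.h>
#include <stdio.h>

struct toy_task {
        int tk_status;    /* last status written by whoever woke the task */
        int reply_ready;  /* set once the RPC reply has been processed */
};

/* Models the disconnect path: pending tasks are woken with -EAGAIN. */
static void toy_disconnect_wakeup(struct toy_task *t)
{
        t->tk_status = -EAGAIN;
}

/* Models reply completion: the request is marked complete, but the
 * status left behind by an earlier disconnect wake-up is not replaced. */
static void toy_complete_rqst(struct toy_task *t)
{
        t->reply_ready = 1;
}

int main(void)
{
        struct toy_task task = { .tk_status = 0, .reply_ready = 0 };

        toy_disconnect_wakeup(&task);  /* disconnect races in first */
        toy_complete_rqst(&task);      /* then the reply is handled */

        /* The task looks only at tk_status when it wakes, so it will
         * retransmit even though the reply already arrived. */
        if (task.tk_status == -EAGAIN && task.reply_ready)
                printf("stale -EAGAIN: task retries a completed RPC\n");
        return 0;
}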