> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>
> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
>> Test report.
>>
>> I applied this on top of the three v4 patches from Friday.
>>
>> With TCP, I no longer see the soft IRQ warnings or bdev leaks.
>> However, at some point during my test (sec=krb5p,vers=3) the
>> mount stops responding and the client becomes idle. ^C leaves
>> the iozone processes in D state.
>
> Does /proc/*/stack show any tasks with any interesting stack behaviour?
>
>>
>> With RDMA, I can see a few issues:
>> - A lot of -74 and server disconnects. I suspect this is due
>>   to GSS sequence window overruns
>> - xprt_connect puts the calling task to sleep if the credit
>>   limit has been reached, and that can result in a deadlock
>> - The reconnect logic appears to busy-wait until something
>>   else (eg, an RTO) forces a reconnect
>> - A connect or disconnect wake-up that occurs while an RPC
>>   reply is being handled (ie, is still on xprt->pending)
>>   results in that RPC being retried unnecessarily.
>>
>> I'm not sure how to address that last one. Sometimes RDMA has
>> to invalidate MRs, and that involves a wait/context switch which
>> opens the race window significantly.
>
> In principle the latter issue is supposed to be handled by the connect
> cookies. Are the updates perhaps being ordered incorrectly w.r.t. the
> xprt->pending wakeup? If so, then that could cause races.
>
> I'd expect the cookie would need to be updated before the call to wake
> up xprt->pending, both on connect and disconnect.

xprtrdma does this in rpcrdma_cm_event_handler. I'm at a loss to
understand how connect cookies can help here. During a disconnect,
rpc_wake_pending_tasks sets tk_status to -EAGAIN, but a subsequent
xprt_complete_rqst does not set tk_status. When xprt_complete_rqst
wakes up the waiting RPC task, that task sees -EAGAIN rather than a
successful RPC completion.

>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>>>
>>> If an asynchronous connection attempt completes while another task
>>> is in xprt_connect(), then the call to rpc_sleep_on() could end up
>>> racing with the call to xprt_wake_pending_tasks().
>>> So add a second test of the connection state after we've put the
>>> task to sleep and set the XPRT_CONNECTING flag, when we know that
>>> there can be no asynchronous connection attempts still in progress.
>>>
>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...")
>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>> ---
>>> net/sunrpc/xprt.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>>> index 122c91c28e7c..ce927002862a 100644
>>> --- a/net/sunrpc/xprt.c
>>> +++ b/net/sunrpc/xprt.c
>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
>>>                          return;
>>>                  if (xprt_test_and_set_connecting(xprt))
>>>                          return;
>>> -                xprt->stat.connect_start = jiffies;
>>> -                xprt->ops->connect(xprt, task);
>>> +                /* Race breaker */
>>> +                if (!xprt_connected(xprt)) {
>>> +                        xprt->stat.connect_start = jiffies;
>>> +                        xprt->ops->connect(xprt, task);
>>> +                } else {
>>> +                        xprt_clear_connecting(xprt);
>>> +                        task->tk_status = 0;
>>> +                        rpc_wake_up_queued_task(&xprt->pending, task);
>>> +                }
>>>          }
>>>          xprt_release_write(xprt, task);
>>> }
>>> --
>>> 2.19.2
>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

--
Chuck Lever
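
P.S. In case it helps, here is a toy, single-threaded C sketch of the
interleaving I'm describing. It is not the real net/sunrpc code: the
names toy_task, toy_disconnect_wakeup and toy_complete_rqst are invented
purely for illustration. It only shows how a -EAGAIN left behind by a
disconnect wake-up can still be the status the task sees after the
reply has actually been handled.

/*
 * Toy model of the wake-up ordering above; none of these names are the
 * real SUNRPC interfaces.
 */
#include <errno.h>
#include <stdio.h>

struct toy_task {
        int tk_status;    /* last status written by whoever woke the task */
        int reply_ready;  /* set once the RPC reply has been processed */
};

/* Models the disconnect path: pending tasks are woken with -EAGAIN. */
static void toy_disconnect_wakeup(struct toy_task *t)
{
        t->tk_status = -EAGAIN;
}

/* Models reply completion: the request is marked complete, but the
 * status left behind by an earlier disconnect wake-up is not replaced. */
static void toy_complete_rqst(struct toy_task *t)
{
        t->reply_ready = 1;
}

int main(void)
{
        struct toy_task task = { .tk_status = 0, .reply_ready = 0 };

        toy_disconnect_wakeup(&task);  /* disconnect races in first */
        toy_complete_rqst(&task);      /* then the reply is handled */

        /* The task looks only at tk_status when it wakes, so it will
         * retransmit even though the reply already arrived. */
        if (task.tk_status == -EAGAIN && task.reply_ready)
                printf("stale -EAGAIN: task retries a completed RPC\n");
        return 0;
}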