Test report. I applied this on top of the three v4 patches from Friday. With TCP, I no longer see the soft IRQ warnings or bdev leaks. However, at some point during my test (sec=krb5p,vers=3) the mount stops responding and the client becomes idle. ^C leaves the iozone processes in D state. With RDMA, I can see a few issues: - A lot of -74 and server disconnects. I suspect this is due to GSS sequence window overruns - xprt_connect puts the calling task to sleep if the credit limit has been reached, and that can result in a deadlock - The reconnect logic appears to busy-wait until something else (eg, an RTO) forces a reconnect - A connect or disconnect wake-up that occurs while an RPC reply is being handled (ie, is still on xprt->pending) results in that RPC being retried unnecessarily. I'm not sure how to address that last one. Sometimes RDMA has to invalidate MRs, and that involves a wait/context swith which opens the race window significantly. > On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@xxxxxxxxx> wrote: > > If an asynchronous connection attempt completes while another task is > in xprt_connect(), then the call to rpc_sleep_on() could end up > racing with the call to xprt_wake_pending_tasks(). > So add a second test of the connection state after we've put the > task to sleep and set the XPRT_CONNECTING flag, when we know that there > can be no asynchronous connection attempts still in progress. > > Fixes: 0b9e79431377d ("SUNRPC: Move the test for XPRT_CONNECTING into...") > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> > --- > net/sunrpc/xprt.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 122c91c28e7c..ce927002862a 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task) > return; > if (xprt_test_and_set_connecting(xprt)) > return; > - xprt->stat.connect_start = jiffies; > - xprt->ops->connect(xprt, task); > + /* Race breaker */ > + if (!xprt_connected(xprt)) { > + xprt->stat.connect_start = jiffies; > + xprt->ops->connect(xprt, task); > + } else { > + xprt_clear_connecting(xprt); > + task->tk_status = 0; > + rpc_wake_up_queued_task(&xprt->pending, task); > + } > } > xprt_release_write(xprt, task); > } > -- > 2.19.2 > -- Chuck Lever