> On Dec 3, 2018, at 12:15 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Mon, 2018-12-03 at 10:56 -0500, Chuck Lever wrote: >>> On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@xxxxxxxxx> >>> wrote: >>> >>> On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote: >>>> Test report. >>>> >>>> I applied this on top of the three v4 patches from Friday. >>>> >>>> With TCP, I no longer see the soft IRQ warnings or bdev leaks. >>>> However, at some point during my test (sec=krb5p,vers=3) the >>>> mount stops responding and the client becomes idle. ^C leaves >>>> the iozone processes in D state. >>> >>> Does /proc/*/stack show any tasks with any interesting stack >>> behaviour? >>> >>>> With RDMA, I can see a few issues: >>>> - A lot of -74 and server disconnects. I suspect this is due >>>> to GSS sequence window overruns >>>> - xprt_connect puts the calling task to sleep if the credit >>>> limit has been reached, and that can result in a deadlock >>>> - The reconnect logic appears to busy-wait until something >>>> else (eg, an RTO) forces a reconnect >>>> - A connect or disconnect wake-up that occurs while an RPC >>>> reply is being handled (ie, is still on xprt->pending) >>>> results in that RPC being retried unnecessarily. >>>> >>>> I'm not sure how to address that last one. Sometimes RDMA has >>>> to invalidate MRs, and that involves a wait/context swith which >>>> opens the race window significantly. >>> >>> In principle the latter issue is supposed to be handled by the >>> connect >>> cookies. Are the updates perhaps being ordered incorrectly w.r.t. >>> the >>> xprt->pending wakeup? If so, then that could cause races. >>> >>> I'd expect the cookie would need to be updated before the call to >>> xprt- >>> pending both on connect and disconnect. >> >> I've found one peculiar cookie-related behavior. >> >> At some point, xprt_request_retransmit_after_disconnect() starts >> to claim that all requests have to be retransmitted and that the >> connection is closed. Closer inspection shows that the cookie >> check here is failing because it seems to be testing a request >> that hasn't been transmitted yet. > > Right, but as far as the transport layer is concerned, there should be > no difference between 'need transmit' and 'need retransmit'. It is up > to the higher RPC client layer to decide whether or not there is a > policy difference in behaviour, and AFAICS rpc_task_need_encode() > should be doing the right thing. OK, thanks for clearing that up. A couple of observations: - The sleep followed immediately by a wake-up tells me you don't expect this to happen frequently. But with sec=krb5 flavors, I do see this path taken quite a bit. Sounds like I need to figure out why that is. - returning EAGAIN would make a whole lot more sense in these cases since the connection is frequently still whole. >>>>> On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@xxxxxxxxx> >>>>> wrote: >>>>> >>>>> If an asynchronous connection attempt completes while another >>>>> task >>>>> is >>>>> in xprt_connect(), then the call to rpc_sleep_on() could end up >>>>> racing with the call to xprt_wake_pending_tasks(). >>>>> So add a second test of the connection state after we've put >>>>> the >>>>> task to sleep and set the XPRT_CONNECTING flag, when we know >>>>> that >>>>> there >>>>> can be no asynchronous connection attempts still in progress. >>>>> >>>>> Fixes: 0b9e79431377d ("SUNRPC: Move the test for >>>>> XPRT_CONNECTING >>>>> into...") >>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx >>>>>> >>>>> --- >>>>> net/sunrpc/xprt.c | 11 +++++++++-- >>>>> 1 file changed, 9 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >>>>> index 122c91c28e7c..ce927002862a 100644 >>>>> --- a/net/sunrpc/xprt.c >>>>> +++ b/net/sunrpc/xprt.c >>>>> @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task) >>>>> return; >>>>> if (xprt_test_and_set_connecting(xprt)) >>>>> return; >>>>> - xprt->stat.connect_start = jiffies; >>>>> - xprt->ops->connect(xprt, task); >>>>> + /* Race breaker */ >>>>> + if (!xprt_connected(xprt)) { >>>>> + xprt->stat.connect_start = jiffies; >>>>> + xprt->ops->connect(xprt, task); >>>>> + } else { >>>>> + xprt_clear_connecting(xprt); >>>>> + task->tk_status = 0; >>>>> + rpc_wake_up_queued_task(&xprt->pending, >>>>> task); >>>>> + } >>>>> } >>>>> xprt_release_write(xprt, task); >>>>> } >>>>> -- >>>>> 2.19.2 >>>>> > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@xxxxxxxxxxxxxxx -- Chuck Lever