Hi Chuck, On 08/06/2018 10:00 AM, Chuck Lever wrote: > > >> On Jul 28, 2018, at 10:46 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >> >> I found that injecting disconnects with v4.18-rc resulted in >> random failures of the multi-threaded git regression test. >> >> The root cause appears to be that, after a reconnect, the >> RPC/RDMA transport is waking pending RPCs before the transport has >> posted enough Receive buffers to receive the Replies. If a Reply >> arrives before enough Receive buffers are posted, the connection >> is dropped. A few connection drops happen in quick succession as >> the client and server struggle to regain credit synchronization. >> >> This regression was introduced with commit 7c8d9e7c8863 ("xprtrdma: >> Move Receive posting to Receive handler"). The client is supposed to >> post a single Receive when a connection is established because >> it's not supposed to send more than one RPC Call before it gets >> a fresh credit grant in the first RPC Reply [RFC 8166, Section >> 3.3.3]. >> >> Unfortunately there appears to be a longstanding bug in the Linux >> client's credit accounting mechanism. On connect, it simply dumps >> all pending RPC Calls onto the new connection. It's possible it has >> done this ever since the RPC/RDMA transport was added to the kernel >> ten years ago. >> >> Servers have so far been tolerant of this bad behavior. Currently no >> server implementation ever changes its credit grant over reconnects, >> and servers always repost enough Receives before connections are >> fully established. >> >> The Linux client implementation used to post a Receive before each >> of these Calls. This has covered up the flooding send behavior. >> >> I could try to correct this old bug so that the client sends exactly >> one RPC Call and waits for a Reply. Since we are so close to the >> next merge window, I'm going to instead provide a simple patch to >> post enough Receives before a reconnect completes (based on the >> number of credits granted to the previous connection). >> >> The spurious disconnects will be gone, but the client will still >> send multiple RPC Calls immediately after a reconnect. >> >> Addressing the latter problem will wait for a merge window because >> a) I expect it to be a large change requiring lots of testing, and >> b) obviously the Linux client has interoperated successfully since >> day zero while still being broken. >> >> Fixes: 7c8d9e7c8863 ("xprtrdma: Move Receive posting to ... ") >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >> --- >> net/sunrpc/xprtrdma/verbs.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> Hi Anna- >> >> Would it be possible to get this into linux-next and then v4.18-rc ? >> I know this is very late, but I found this issue only recently and >> it took a few days to nail down a simple fix. > > Hi Anna, what's the status of this fix? I have it queued up for 4.19 right now, with a cc:stable so it'll be backported once it's added. Sorry that I didn't see your note about getting it into v4.18 earlier :( Anna > > >> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c >> index 16161a3..e8d1024 100644 >> --- a/net/sunrpc/xprtrdma/verbs.c >> +++ b/net/sunrpc/xprtrdma/verbs.c >> @@ -280,7 +280,6 @@ >> ++xprt->rx_xprt.connect_cookie; >> connstate = -ECONNABORTED; >> connected: >> - xprt->rx_buf.rb_credits = 1; >> ep->rep_connected = connstate; >> rpcrdma_conn_func(ep); >> wake_up_all(&ep->rep_connect_wait); >> @@ -755,6 +754,7 @@ >> } >> >> ep->rep_connected = 0; >> + rpcrdma_post_recvs(r_xprt, true); >> >> rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma); >> if (rc) { >> @@ -773,8 +773,6 @@ >> >> dprintk("RPC: %s: connected\n", __func__); >> >> - rpcrdma_post_recvs(r_xprt, true); >> - >> out: >> if (rc) >> ep->rep_connected = rc; >> @@ -1171,6 +1169,7 @@ struct rpcrdma_req * >> list_add(&req->rl_list, &buf->rb_send_bufs); >> } >> >> + buf->rb_credits = 1; >> buf->rb_posted_receives = 0; >> INIT_LIST_HEAD(&buf->rb_recv_bufs); >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html