On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote: > > On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.c > > om> wrote: > > > > On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote: > > > > On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primaryda > > > > ta.c > > > > om> wrote: > > > > > > > > Does the RDMA code update the connect cookie when the > > > > connection > > > > breaks? It looks to me as if it only does that when the > > > > connection > > > > is > > > > re-established. We really want both. > > > > > > > > > I imagine this issue could also impact write buffer > > > > > exhaustion > > > > > on TCP. > > > > > > > > See net/sunrpc/xprtsock.c:xs_tcp_state_change() > > > > > > xprtrdma manipulates the connect_cookie in its connect worker, > > > see rpcrdma_connect_worker. This was added by: > > > > > > commit 575448bd36208f99fe0dd554a43518d798966740 > > > Author: Tom Talpey <talpey@xxxxxxxxxx> > > > AuthorDate: Thu Oct 9 15:00:40 2008 -0400 > > > Commit: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> > > > CommitDate: Fri Oct 10 15:10:36 2008 -0400 > > > > > > RPC/RDMA: suppress retransmit on RPC/RDMA clients. > > > > > > Would it be more correct to bump the cookie in > > > rpcrdma_conn_upcall, > > > which is the equivalent to xs_tcp_state_change? (if so, why, so > > > I can compose a reasonable patch description) > > > > > > It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the > > > RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure > > > RDMA provides a distinction between "server disconnected" > > > and "client disconnected" although that probably does not > > > matter for this purpose. > > > > > > But, why would the additional cookie update help? The transport > > > is not disconnecting before the deadlock. > > > > > > > The connection cookie's purpose is twofold: > > > > 1) It tracks whether or not a request has been transmitted on the > > current connection or not. > > That's broken by setting the cookie unconditionally outside > the transport_lock, isn't it? > > > > 2) It ensures that when several requests with the same connection > > cookie all call xprt_conditional_disconnect(), then that results in > > a > > single disconnection event. To do so, it assumes that > > xprt_autoclose() > > will change the cookie if the disconnection attempt is successful. > > > > In TCP we do so in the xs_tcp_state_change(). If the RDMA transport > > can > > guarantee that the call to xprt->ops->close(xprt) is always > > successful, > > then you could do so there. > > I don't mind moving the cookie bump to rpcrdma_conn_upcall, > but I'm not sure I understand the locking requirements. > > Currently, xprt_transmit sets the connect_cookie while holding > the transport_lock. > > xprt_conditional_disconnect compares the cookie while holding > the transport_lock. > > For TCP, the transport_lock is held when bumping the cookie > in the ESTABLISHED case, but _not_ in the two CLOSE cases? That should be OK. The networking layer should provide sufficient serialisation that we don't have to worry about collisions. > > xprtrdma holds the transport_lock when bumping the cookie, > which it does in its connect worker. It has to hold the lock > because it skips the value 0. xprtrdma needs to guarantee > that an RPC is never transmitted on the same connection > twice (and maybe it could use rq_connect_cookie instead of > its own cookie). > > xprt_reserve_init is holding the reserve_lock but not the > transport_lock when it grabs the cookie. Maybe it should > not be initializing the rqst's cookie there? > > Seems to me that xprt_transmit needs to update the rqst's > cookie while holding the transport_lock, especially if > xprtrdma needs to skip a cookie value? I'm sure I'm missing > something. > It should be OK, given that the connection is a state machine. However, I missed something that you said earlier about xprt_prepare_transmit(). OK. How about the following fixup patch instead of the earlier one? 8<--------------------------------------------------- >From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001 From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Date: Thu, 14 Dec 2017 07:05:27 -0500 Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path --- net/sunrpc/xprt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 5e4278e9ce37..33b74fd84051 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; + unsigned int connect_cookie; int status, numreqs; dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen); @@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task) } else if (!req->rq_bytes_sent) return; - req->rq_connect_cookie = xprt->connect_cookie; + connect_cookie = xprt->connect_cookie; req->rq_xtime = ktime_get(); status = xprt->ops->send_request(task); trace_xprt_transmit(xprt, req->rq_xid, status); @@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task) xprt->stat.pending_u += xprt->pending.qlen; spin_unlock_bh(&xprt->transport_lock); + req->rq_connect_cookie = connect_cookie; if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) { /* * Sleep on the pending queue if we're expecting a reply. -- 2.14.3 -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥