> On Dec 14, 2017, at 4:33 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote:
>>> On Dec 14, 2017, at 3:37 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>
>>> On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
>>>>> On Dec 14, 2017, at 2:03 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>>>>
>>>>> Does the RDMA code update the connect cookie when the connection
>>>>> breaks? It looks to me as if it only does that when the connection
>>>>> is re-established. We really want both.
>>>>>
>>>>>> I imagine this issue could also impact write buffer exhaustion
>>>>>> on TCP.
>>>>>
>>>>> See net/sunrpc/xprtsock.c:xs_tcp_state_change()
>>>>
>>>> xprtrdma manipulates the connect_cookie in its connect worker,
>>>> see rpcrdma_connect_worker. This was added by:
>>>>
>>>> commit 575448bd36208f99fe0dd554a43518d798966740
>>>> Author:     Tom Talpey <talpey@xxxxxxxxxx>
>>>> AuthorDate: Thu Oct 9 15:00:40 2008 -0400
>>>> Commit:     Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>>>> CommitDate: Fri Oct 10 15:10:36 2008 -0400
>>>>
>>>>     RPC/RDMA: suppress retransmit on RPC/RDMA clients.
>>>>
>>>> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
>>>> which is the equivalent to xs_tcp_state_change? (if so, why, so
>>>> I can compose a reasonable patch description)
>>>>
>>>> It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
>>>> RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
>>>> RDMA provides a distinction between "server disconnected"
>>>> and "client disconnected" although that probably does not
>>>> matter for this purpose.
>>>>
>>>> But, why would the additional cookie update help? The transport
>>>> is not disconnecting before the deadlock.
>>>>
>>>
>>> The connection cookie's purpose is twofold:
>>>
>>> 1) It tracks whether or not a request has been transmitted on the
>>> current connection or not.
>>
>> That's broken by setting the cookie unconditionally outside
>> the transport_lock, isn't it?
>>
>>
>>> 2) It ensures that when several requests with the same connection
>>> cookie all call xprt_conditional_disconnect(), then that results in a
>>> single disconnection event. To do so, it assumes that xprt_autoclose()
>>> will change the cookie if the disconnection attempt is successful.
>>>
>>> In TCP we do so in the xs_tcp_state_change(). If the RDMA transport
>>> can guarantee that the call to xprt->ops->close(xprt) is always
>>> successful, then you could do so there.
>>
>> I don't mind moving the cookie bump to rpcrdma_conn_upcall,
>> but I'm not sure I understand the locking requirements.
>>
>> Currently, xprt_transmit sets the connect_cookie while holding
>> the transport_lock.
>>
>> xprt_conditional_disconnect compares the cookie while holding
>> the transport_lock.
>>
>> For TCP, the transport_lock is held when bumping the cookie
>> in the ESTABLISHED case, but _not_ in the two CLOSE cases?
>
> That should be OK. The networking layer should provide sufficient
> serialisation that we don't have to worry about collisions.
>
>>
>> xprtrdma holds the transport_lock when bumping the cookie,
>> which it does in its connect worker. It has to hold the lock
>> because it skips the value 0. xprtrdma needs to guarantee
>> that an RPC is never transmitted on the same connection
>> twice (and maybe it could use rq_connect_cookie instead of
>> its own cookie).
>>
>> xprt_reserve_init is holding the reserve_lock but not the
>> transport_lock when it grabs the cookie. Maybe it should
>> not be initializing the rqst's cookie there?
>>
>> Seems to me that xprt_transmit needs to update the rqst's
>> cookie while holding the transport_lock, especially if
>> xprtrdma needs to skip a cookie value? I'm sure I'm missing
>> something.
>>
>
> It should be OK, given that the connection is a state machine.
> However, I missed something that you said earlier about
> xprt_prepare_transmit().
>
> OK. How about the following fixup patch instead of the earlier one?
>
> 8<---------------------------------------------------
> From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Thu, 14 Dec 2017 07:05:27 -0500
> Subject: [PATCH] fixup! SUNRPC: Fix a race in the receive code path
>
> ---
>  net/sunrpc/xprt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 5e4278e9ce37..33b74fd84051 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task)
>  {
>  	struct rpc_rqst *req = task->tk_rqstp;
>  	struct rpc_xprt *xprt = req->rq_xprt;
> +	unsigned int connect_cookie;
>  	int status, numreqs;
>  
>  	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
> @@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task)
>  	} else if (!req->rq_bytes_sent)
>  		return;
>  
> -	req->rq_connect_cookie = xprt->connect_cookie;
> +	connect_cookie = xprt->connect_cookie;
>  	req->rq_xtime = ktime_get();
>  	status = xprt->ops->send_request(task);
>  	trace_xprt_transmit(xprt, req->rq_xid, status);
> @@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task)
>  	xprt->stat.pending_u += xprt->pending.qlen;
>  	spin_unlock_bh(&xprt->transport_lock);
>  
> +	req->rq_connect_cookie = connect_cookie;
> +	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
>  		/*
>  		 * Sleep on the pending queue if we're expecting a reply.
> -- 
> 2.14.3

No problems here, passed basic testing with NFSv4.0 on a client
with extra send_request fault injection.

I hope we can get the recv race fix (as updated here) and the
queue-work-on patch [1] into v4.15-rc.

--
Chuck Lever

[1] https://marc.info/?l=linux-nfs&m=151241427912572&w=2
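[Editorial note: as a footnote to the cookie discussion above, below is a minimal sketch of the kind of change Chuck asks about, bumping the transport's connect cookie on every RDMA connection state change (both RDMA_CM_EVENT_ESTABLISHED and RDMA_CM_EVENT_DISCONNECTED) rather than only on re-establishment. This is illustrative only, not the patch under discussion; the helper name is invented, and it assumes only the rpc_xprt fields named in the thread (connect_cookie, transport_lock) and xprtrdma's rule that cookie value 0 is reserved.]

#include <linux/spinlock.h>
#include <linux/sunrpc/xprt.h>

/*
 * Sketch only: invalidate the old connection's cookie on both connect
 * and disconnect, so a request sent on the old connection can never be
 * mistaken for one sent on the new one.  The value 0 is skipped because
 * xprtrdma reserves it, which is why the bump is done under the
 * transport_lock here; per Trond above, the connection state machine
 * may already provide enough serialisation to drop the lock.
 */
static void rpcrdma_bump_connect_cookie(struct rpc_xprt *xprt)
{
	spin_lock_bh(&xprt->transport_lock);
	if (++xprt->connect_cookie == 0)
		++xprt->connect_cookie;
	spin_unlock_bh(&xprt->transport_lock);
}

[Calling such a helper from both the ESTABLISHED and DISCONNECTED arms of rpcrdma_conn_upcall would give xprtrdma behaviour analogous to what the thread describes for xs_tcp_state_change on TCP.]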