On Mon, 2018-12-17 at 13:37 -0500, Chuck Lever wrote: > > On Dec 17, 2018, at 12:28 PM, Trond Myklebust < > > trondmy@xxxxxxxxxxxxxxx> wrote: > > > > On Mon, 2018-12-17 at 11:39 -0500, Chuck Lever wrote: > > > Transport disconnect processing does a "wake pending tasks" at > > > various points. > > > > > > Suppose an RPC Reply is being processed. The RPC task that Reply > > > goes with is waiting on the pending queue. If a disconnect wake- > > > up > > > happens before reply processing is done, that reply, even if it > > > is > > > good, is thrown away, and the RPC has to be sent again. > > > > > > This window apparently does not exist for socket transports > > > because > > > there is a lock held while a reply is being received which > > > prevents > > > the wake-up call until after reply processing is done. > > > > > > To resolve this, all RPC replies being processed on an RPC-over- > > > RDMA > > > transport have to complete before pending tasks are awoken due to > > > a > > > transport disconnect. > > > > > > Callers that already hold the transport write lock may invoke > > > ->ops->close directly. Others use a generic helper that schedules > > > a close when the write lock can be taken safely. > > > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > --- > > > include/linux/sunrpc/xprt.h | 1 + > > > net/sunrpc/xprt.c | 19 > > > +++++++++++++++++++ > > > net/sunrpc/xprtrdma/backchannel.c | 13 +++++++------ > > > net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 8 +++++--- > > > net/sunrpc/xprtrdma/transport.c | 16 ++++++++++----- > > > - > > > net/sunrpc/xprtrdma/verbs.c | 5 ++--- > > > 6 files changed, 44 insertions(+), 18 deletions(-) > > > > > > diff --git a/include/linux/sunrpc/xprt.h > > > b/include/linux/sunrpc/xprt.h > > > index a4ab4f8..ee94ed0 100644 > > > --- a/include/linux/sunrpc/xprt.h > > > +++ b/include/linux/sunrpc/xprt.h > > > @@ -401,6 +401,7 @@ static inline __be32 > > > *xprt_skip_transport_header(struct rpc_xprt *xprt, __be32 * > > > bool xprt_request_get_cong(struct rpc_xprt > > > *xprt, > > > struct rpc_rqst *req); > > > void xprt_disconnect_done(struct rpc_xprt > > > *xprt); > > > void xprt_force_disconnect(struct rpc_xprt > > > *xprt); > > > +void xprt_disconnect_nowake(struct rpc_xprt > > > *xprt); > > > void xprt_conditional_disconnect(struct > > > rpc_xprt > > > *xprt, unsigned int cookie); > > > > > > bool xprt_lock_connect(struct rpc_xprt *, > > > struct > > > rpc_task *, void *); > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > > > index ce92700..afe412e 100644 > > > --- a/net/sunrpc/xprt.c > > > +++ b/net/sunrpc/xprt.c > > > @@ -685,6 +685,25 @@ void xprt_force_disconnect(struct rpc_xprt > > > *xprt) > > > } > > > EXPORT_SYMBOL_GPL(xprt_force_disconnect); > > > > > > +/** > > > + * xprt_disconnect_nowake - force a call to xprt->ops->close > > > + * @xprt: transport to disconnect > > > + * > > > + * The caller must ensure that xprt_wake_pending_tasks() is > > > + * called later. > > > + */ > > > +void xprt_disconnect_nowake(struct rpc_xprt *xprt) > > > +{ > > > + /* Don't race with the test_bit() in xprt_clear_locked() > > > */ > > > + spin_lock_bh(&xprt->transport_lock); > > > + set_bit(XPRT_CLOSE_WAIT, &xprt->state); > > > + /* Try to schedule an autoclose RPC call */ > > > + if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) > > > + queue_work(xprtiod_workqueue, &xprt- > > > >task_cleanup); > > > + spin_unlock_bh(&xprt->transport_lock); > > > +} > > > +EXPORT_SYMBOL_GPL(xprt_disconnect_nowake); > > > + > > > > We shouldn't need both xprt_disconnect_nowake() and > > xprt_force_disconnect() to be exported given that you can build the > > latter from the former + xprt_wake_pending_tasks() (which is also > > already exported). > > Thanks for your review! > > I can get rid of xprt_disconnect_nowake. There are some variations, > depending on why wake_pending_tasks is protected by xprt- > >transport_lock. I'm having some second thoughts about the patch that Scott sent out last week to fix the issue that Dave and he were seeing. I think that what we really need to do to fix his issue is to call xprt_disconnect_done() after we've released the TCP socket. Given that realisation, I think that we can fix up xprt_force_disconnect() to only wake up the task that holds the XPRT_LOCKED instead of doing a thundering herd wakeup like we do today. That should (I think) obviate the need for a separate xprt_disconnect_nowake(). A patch is forthcoming later today. I'll make sure you are Cced so you can comment. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx