Howdy.

I've had reports (and personal experience) where the Linux NFS/RDMA
client waits for a very long time after a disruption of the network
or NFS server.

There is a disconnect time wait in the Connection Manager which
blocks the RPC/RDMA transport from tearing down a connection for a
few minutes when the remote cannot respond to DREQ messages.

An RPC/RDMA transport has only one slot for connection state, so the
transport is prevented from establishing a fresh connection until
the time wait completes.

This patch series refactors the connection end point data structures
to enable one active and multiple zombie connections. Now, while a
defunct connection is waiting to die, it is separated from the
transport, clearing the way for the immediate creation of a new
connection. Clean-up of the old connection's data structures and
resources then completes in the background.

Well, that's the idea, anyway. Review and comments welcome. Hoping
this can be merged in v5.7.

---

Chuck Lever (11):
      xprtrdma: Invoke rpcrdma_ep_create() in the connect worker
      xprtrdma: Refactor frwr_init_mr()
      xprtrdma: Clean up the post_send path
      xprtrdma: Refactor rpcrdma_ep_connect() and rpcrdma_ep_disconnect()
      xprtrdma: Allocate Protection Domain in rpcrdma_ep_create()
      xprtrdma: Invoke rpcrdma_ia_open in the connect worker
      xprtrdma: Remove rpcrdma_ia::ri_flags
      xprtrdma: Disconnect on flushed completion
      xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep
      xprtrdma: Extract sockaddr from struct rdma_cm_id
      xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt


 include/trace/events/rpcrdma.h    |   97 ++---
 net/sunrpc/xprtrdma/backchannel.c |    8 
 net/sunrpc/xprtrdma/frwr_ops.c    |  152 ++++----
 net/sunrpc/xprtrdma/rpc_rdma.c    |   32 +-
 net/sunrpc/xprtrdma/transport.c   |   72 +---
 net/sunrpc/xprtrdma/verbs.c       |  681 ++++++++++++++-----------------------
 net/sunrpc/xprtrdma/xprt_rdma.h   |   89 ++---
 7 files changed, 445 insertions(+), 686 deletions(-)

--
Chuck Lever