This is a series of fixes and architectural changes that should
improve robustness and result in better scalability of NFS/RDMA.
I'm sure one or two of these could be broken down a little more,
comments welcome.

The fundamental observation is that the RPC work queues are BOUND,
thus rescheduling work in the Receive completion handler to one of
these work queues just forces it to run later on the same CPU. So
try to do more work right in the Receive completion handler to
reduce context switch overhead.

A secondary concern is that the average amount of wall-clock time
it takes to handle a single Receive completion caps the IOPS rate
(both per-xprt and per-NIC). In this patch series I've taken a few
steps to reduce that latency, and I'm looking into a few others.

This series can be fetched from:

  git://git.linux-nfs.org/projects/cel/cel-2.6.git

in topic branch "nfs-for-5.3".

---

Chuck Lever (12):
      xprtrdma: Fix use-after-free in rpcrdma_post_recvs
      xprtrdma: Replace use of xdr_stream_pos in rpcrdma_marshal_req
      xprtrdma: Fix occasional transport deadlock
      xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flag
      xprtrdma: Remove fr_state
      xprtrdma: Add mechanism to place MRs back on the free list
      xprtrdma: Reduce context switching due to Local Invalidation
      xprtrdma: Wake RPCs directly in rpcrdma_wc_send path
      xprtrdma: Simplify rpcrdma_rep_create
      xprtrdma: Streamline rpcrdma_post_recvs
      xprtrdma: Refactor chunk encoding
      xprtrdma: Remove rpcrdma_req::rl_buffer


 include/trace/events/rpcrdma.h  |   47 ++++--
 net/sunrpc/xprtrdma/frwr_ops.c  |  330 ++++++++++++++++++++++++++-------------
 net/sunrpc/xprtrdma/rpc_rdma.c  |  146 +++++++----------
 net/sunrpc/xprtrdma/transport.c |   16 +-
 net/sunrpc/xprtrdma/verbs.c     |  115 ++++++--------
 net/sunrpc/xprtrdma/xprt_rdma.h |   43 +----
 6 files changed, 384 insertions(+), 313 deletions(-)

--
Chuck Lever