> On Sep 17, 2018, at 9:03 AM, Trond Myklebust <trondmy@xxxxxxxxx> wrote:
>
> One of the intentions with the priority queues was to ensure that no
> single process can hog the transport. The field task->tk_owner therefore
> identifies the RPC call's origin, and is intended to allow the RPC layer
> to organise queues for fairness.
> This commit therefore modifies the transmit queue to group requests
> by task->tk_owner, and ensures that we round robin among those groups.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> ---
>  include/linux/sunrpc/xprt.h |  1 +
>  net/sunrpc/xprt.c           | 27 ++++++++++++++++++++++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index 8c2bb078f00c..e377620b9744 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -89,6 +89,7 @@ struct rpc_rqst {
>         };
>
>         struct list_head        rq_xmit;        /* Send queue */
> +       struct list_head        rq_xmit2;       /* Send queue */
>
>         void                    *rq_buffer;     /* Call XDR encode buffer */
>         size_t                  rq_callsize;
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 35f5df367591..3e68f35f71f6 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1052,12 +1052,21 @@ xprt_request_need_enqueue_transmit(struct rpc_task *task, struct rpc_rqst *req)
>  void
>  xprt_request_enqueue_transmit(struct rpc_task *task)
>  {
> -       struct rpc_rqst *req = task->tk_rqstp;
> +       struct rpc_rqst *pos, *req = task->tk_rqstp;
>         struct rpc_xprt *xprt = req->rq_xprt;
>
>         if (xprt_request_need_enqueue_transmit(task, req)) {
>                 spin_lock(&xprt->queue_lock);
> +               list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
> +                       if (pos->rq_task->tk_owner != task->tk_owner)
> +                               continue;
> +                       list_add_tail(&req->rq_xmit2, &pos->rq_xmit2);
> +                       INIT_LIST_HEAD(&req->rq_xmit);
> +                       goto out;
> +               }
>                 list_add_tail(&req->rq_xmit, &xprt->xmit_queue);
> +               INIT_LIST_HEAD(&req->rq_xmit2);
> +out:
>                 set_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate);
>                 spin_unlock(&xprt->queue_lock);
>         }
> @@ -1073,8 +1082,20 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
>  static void
>  xprt_request_dequeue_transmit_locked(struct rpc_task *task)
>  {
> -       if (test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> -               list_del(&task->tk_rqstp->rq_xmit);
> +       struct rpc_rqst *req = task->tk_rqstp;
> +
> +       if (!test_and_clear_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate))
> +               return;
> +       if (!list_empty(&req->rq_xmit)) {
> +               list_del(&req->rq_xmit);
> +               if (!list_empty(&req->rq_xmit2)) {
> +                       struct rpc_rqst *next = list_first_entry(&req->rq_xmit2,
> +                                       struct rpc_rqst, rq_xmit2);
> +                       list_del(&req->rq_xmit2);
> +                       list_add_tail(&next->rq_xmit, &next->rq_xprt->xmit_queue);
> +               }
> +       } else
> +               list_del(&req->rq_xmit2);
>  }
>
>  /**
> --
> 2.17.1

Hi Trond-

I've chased down a couple of remaining regressions with the v4.20 NFS
client, and they seem to be rooted in this commit.

When using sec=krb5, krb5i, or krb5p, I found that multi-threaded
workloads trigger a lot of server-side disconnects. This is with both
TCP and RDMA transports. An instrumented server shows that the client
is under-running the GSS sequence number window. I monitored the order
in which GSS sequence numbers appear on the wire, and after this
commit the sequence numbers are wildly misordered. If I revert the
hunk in xprt_request_enqueue_transmit, the problem goes away.
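To convince myself of the mechanism, I put together the small userspace
program below. To be clear, this is not the kernel code and none of these
names exist in the tree: it is a rough sketch of the two-level queue this
patch introduces, with plain next pointers standing in for rq_xmit and
rq_xmit2, and it stamps each request with a sequence number in enqueue
order purely to make the reordering visible. Enqueue three requests from
one owner followed by one from a second owner, and the transmit order
comes out 1, 4, 2, 3: the second owner's request jumps ahead of the first
owner's remaining requests, which matches the kind of misordering I see
on the wire.

/*
 * Toy model (userspace, not kernel code) of the per-owner transmit
 * queue: xmit_queue holds one request per owner group via queue_next
 * (cf. rq_xmit), and same-owner requests are chained through
 * group_next (cf. rq_xmit2).  Dequeuing the head of a group moves the
 * next group member to the tail of xmit_queue, i.e. round robin.
 */
#include <stdio.h>

struct rqst {
	int owner;			/* stands in for task->tk_owner */
	int seqno;			/* assigned in enqueue order, for illustration */
	struct rqst *group_next;	/* same-owner chain (cf. rq_xmit2) */
	struct rqst *queue_next;	/* per-owner group list (cf. rq_xmit) */
};

static struct rqst *xmit_queue;

/* Append to an existing owner group, or start a new group at the tail. */
static void enqueue(struct rqst *req)
{
	struct rqst **pp;

	for (pp = &xmit_queue; *pp; pp = &(*pp)->queue_next) {
		struct rqst *last = *pp;

		if (last->owner != req->owner)
			continue;
		while (last->group_next)
			last = last->group_next;
		last->group_next = req;
		return;
	}
	*pp = req;
}

/* Take the head group's first request; rotate its successor (if any)
 * to the tail of xmit_queue. */
static struct rqst *dequeue(void)
{
	struct rqst *req = xmit_queue, **pp;

	if (!req)
		return NULL;
	xmit_queue = req->queue_next;
	if (req->group_next) {
		for (pp = &xmit_queue; *pp; pp = &(*pp)->queue_next)
			;
		*pp = req->group_next;
	}
	return req;
}

int main(void)
{
	/* Three requests from owner 1, then one from owner 2, enqueued
	 * (and numbered) in that order. */
	struct rqst reqs[] = {
		{ .owner = 1, .seqno = 1 }, { .owner = 1, .seqno = 2 },
		{ .owner = 1, .seqno = 3 }, { .owner = 2, .seqno = 4 },
	};
	struct rqst *req;
	int i;

	for (i = 0; i < 4; i++)
		enqueue(&reqs[i]);

	/* Prints seqnos in the order 1, 4, 2, 3: transmission order no
	 * longer matches the order in which the numbers were assigned. */
	while ((req = dequeue()) != NULL)
		printf("owner %d seqno %d\n", req->owner, req->seqno);
	return 0;
}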
I also found that reverting that hunk results in a 3-4% improvement in
fio IOPS rates, as well as an improvement in average and maximum
latency as reported by fio.

--
Chuck Lever