On Thu, 2017-02-09 at 14:42 -0500, Chuck Lever wrote: > > On Feb 9, 2017, at 10:37 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> > > wrote: > > > > > > > > On Feb 8, 2017, at 7:48 PM, Trond Myklebust <trondmy@primarydata. > > > com> wrote: > > > > > > On Wed, 2017-02-08 at 19:19 -0500, Chuck Lever wrote: > > > > > On Feb 8, 2017, at 7:05 PM, Trond Myklebust <trondmy@primaryd > > > > > ata.co > > > > > m> wrote: > > > > > > > > > > On Wed, 2017-02-08 at 17:01 -0500, Chuck Lever wrote: > > > > > > Allow RPC-over-RDMA to send NULL pings even when the > > > > > > transport > > > > > > has > > > > > > hit its credit limit. One RPC-over-RDMA credit is reserved > > > > > > for > > > > > > operations like keepalive. > > > > > > > > > > > > For transports that convey NFSv4, it seems like lease > > > > > > renewal > > > > > > would > > > > > > also be a candidate for using a priority transport slot. > > > > > > I'd like > > > > > > to > > > > > > see a mechanism better than RPCRDMA_PRIORITY that can > > > > > > ensure only > > > > > > one priority operation is in use at a time. > > > > > > > > > > > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > > > --- > > > > > > include/linux/sunrpc/sched.h | 2 ++ > > > > > > net/sunrpc/xprt.c | 4 ++++ > > > > > > net/sunrpc/xprtrdma/transport.c | 3 ++- > > > > > > net/sunrpc/xprtrdma/verbs.c | 13 ++++++++----- > > > > > > 4 files changed, 16 insertions(+), 6 deletions(-) > > > > > > > > > > > > diff --git a/include/linux/sunrpc/sched.h > > > > > > b/include/linux/sunrpc/sched.h > > > > > > index 13822e6..fcea158 100644 > > > > > > --- a/include/linux/sunrpc/sched.h > > > > > > +++ b/include/linux/sunrpc/sched.h > > > > > > @@ -127,6 +127,7 @@ struct rpc_task_setup { > > > > > > #define RPC_TASK_TIMEOUT 0x1000 /* > > > > > > fail > > > > > > with > > > > > > ETIMEDOUT on timeout */ > > > > > > #define RPC_TASK_NOCONNECT 0x2000 /* > > > > > > return > > > > > > ENOTCONN if not connected */ > > > > > > #define RPC_TASK_NO_RETRANS_TIMEOUT 0x4000 > > > > > > /* > > > > > > wait forever for a reply */ > > > > > > +#define RPC_TASK_NO_CONG 0x8000 /* > > > > > > skip > > > > > > congestion control */ > > > > > > > > > > > > #define RPC_TASK_SOFTPING (RPC_TASK_SOFT | > > > > > > RPC_TASK_SOFTCONN) > > > > > > > > > > > > @@ -137,6 +138,7 @@ struct rpc_task_setup { > > > > > > #define RPC_IS_SOFT(t) ((t)->tk_flags & > > > > > > (RPC_TASK_SOFT|RPC_TASK_TIMEOUT)) > > > > > > #define RPC_IS_SOFTCONN(t) ((t)->tk_flags & > > > > > > RPC_TASK_SOFTCONN) > > > > > > #define RPC_WAS_SENT(t) ((t)->tk_flags & > > > > > > RPC_TASK_SENT) > > > > > > +#define RPC_SKIP_CONG(t) ((t)->tk_flags & > > > > > > RPC_TASK_NO_CONG) > > > > > > > > > > > > #define RPC_TASK_RUNNING 0 > > > > > > #define RPC_TASK_QUEUED 1 > > > > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > > > > > > index b530a28..a477ee6 100644 > > > > > > --- a/net/sunrpc/xprt.c > > > > > > +++ b/net/sunrpc/xprt.c > > > > > > @@ -392,6 +392,10 @@ static inline void > > > > > > xprt_release_write(struct > > > > > > rpc_xprt *xprt, struct rpc_task *ta > > > > > > { > > > > > > struct rpc_rqst *req = task->tk_rqstp; > > > > > > > > > > > > + if (RPC_SKIP_CONG(task)) { > > > > > > + req->rq_cong = 0; > > > > > > + return 1; > > > > > > + } > > > > > > > > > > Why not just have the RDMA layer call xprt_reserve_xprt() > > > > > (and > > > > > xprt_release_xprt()) if this flag is set? It seems to me that > > > > > you > > > > > will > > > > > need some kind of extra congestion control in the RDMA layer > > > > > anyway > > > > > since you only have one reserved credit for these privileged > > > > > tasks > > > > > (or > > > > > did I miss where that is being gated?). > > > > > > > > Thanks for the review. > > > > > > > > See RPCRDMA_IA_RSVD_CREDIT in 11/12. It's a hack I'm not > > > > terribly happy with. > > > > > > > > So, I think you are suggesting replacing xprtrdma's > > > > ->reserve_xprt with something like: > > > > > > > > int xprt_rdma_reserve_xprt(xprt, task) > > > > { > > > > if (RPC_SKIP_CONG(task)) > > > > return xprt_reserve_xprt(xprt, task); > > > > return xprt_reserve_xprt_cong(xprt, task); > > > > } > > > > > > > > and likewise for ->release_xprt ? > > > > > > Right. > > This seems to work fine for the normal cases. > > I'm confused about how to construct xprt_rdma_release_xprt() > so it never releases a normal RPC task when a SKIP_CONG > task completes and the credit limit is still full. > > If it should send a normal task using the reserved credit > and that task hangs too, we're in exactly the position > we wanted to avoid. > > My original solution might have had a similar problem, > come to think of it. > > That's true... You may need to set up a separate waitqueue that is reserved for SKIP_CONG tasks. Again, it makes sense to keep that in the RDMA code. -- Trond Myklebust Principal System Architect 4300 El Camino Real | Suite 100 Los Altos, CA 94022 W: 650-422-3800 C: 801-921-4583 www.primarydata.com ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥