> On Feb 9, 2017, at 3:13 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Thu, 2017-02-09 at 14:42 -0500, Chuck Lever wrote: >>> On Feb 9, 2017, at 10:37 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> >>> wrote: >>> >>>> >>>> On Feb 8, 2017, at 7:48 PM, Trond Myklebust <trondmy@primarydata. >>>> com> wrote: >>>> >>>> On Wed, 2017-02-08 at 19:19 -0500, Chuck Lever wrote: >>>>>> On Feb 8, 2017, at 7:05 PM, Trond Myklebust <trondmy@primaryd >>>>>> ata.co >>>>>> m> wrote: >>>>>> >>>>>> On Wed, 2017-02-08 at 17:01 -0500, Chuck Lever wrote: >>>>>>> Allow RPC-over-RDMA to send NULL pings even when the >>>>>>> transport >>>>>>> has >>>>>>> hit its credit limit. One RPC-over-RDMA credit is reserved >>>>>>> for >>>>>>> operations like keepalive. >>>>>>> >>>>>>> For transports that convey NFSv4, it seems like lease >>>>>>> renewal >>>>>>> would >>>>>>> also be a candidate for using a priority transport slot. >>>>>>> I'd like >>>>>>> to >>>>>>> see a mechanism better than RPCRDMA_PRIORITY that can >>>>>>> ensure only >>>>>>> one priority operation is in use at a time. >>>>>>> >>>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>>>>> --- >>>>>>> include/linux/sunrpc/sched.h聽聽聽聽|聽聽聽聽2 ++ >>>>>>> net/sunrpc/xprt.c聽聽聽聽聽聽聽聽聽聽聽聽聽聽聽|聽聽聽聽4 ++++ >>>>>>> net/sunrpc/xprtrdma/transport.c |聽聽聽聽3 ++- >>>>>>> net/sunrpc/xprtrdma/verbs.c聽聽聽聽聽|聽聽聽13 ++++++++----- >>>>>>> 4 files changed, 16 insertions(+), 6 deletions(-) >>>>>>> >>>>>>> diff --git a/include/linux/sunrpc/sched.h >>>>>>> b/include/linux/sunrpc/sched.h >>>>>>> index 13822e6..fcea158 100644 >>>>>>> --- a/include/linux/sunrpc/sched.h >>>>>>> +++ b/include/linux/sunrpc/sched.h >>>>>>> @@ -127,6 +127,7 @@ struct rpc_task_setup { >>>>>>> #define RPC_TASK_TIMEOUT 0x1000 /* >>>>>>> fail >>>>>>> with >>>>>>> ETIMEDOUT on timeout */ >>>>>>> #define RPC_TASK_NOCONNECT 0x2000 /* >>>>>>> return >>>>>>> ENOTCONN if not connected */ >>>>>>> #define RPC_TASK_NO_RETRANS_TIMEOUT 0x4000 >>>>>>> /* >>>>>>> wait forever for a reply */ >>>>>>> +#define RPC_TASK_NO_CONG 0x8000 /* >>>>>>> skip >>>>>>> congestion control */ >>>>>>> >>>>>>> #define RPC_TASK_SOFTPING (RPC_TASK_SOFT | >>>>>>> RPC_TASK_SOFTCONN) >>>>>>> >>>>>>> @@ -137,6 +138,7 @@ struct rpc_task_setup { >>>>>>> #define RPC_IS_SOFT(t) ((t)->tk_flags & >>>>>>> (RPC_TASK_SOFT|RPC_TASK_TIMEOUT)) >>>>>>> #define RPC_IS_SOFTCONN(t) ((t)->tk_flags & >>>>>>> RPC_TASK_SOFTCONN) >>>>>>> #define RPC_WAS_SENT(t) ((t)->tk_flags & >>>>>>> RPC_TASK_SENT) >>>>>>> +#define RPC_SKIP_CONG(t) ((t)->tk_flags & >>>>>>> RPC_TASK_NO_CONG) >>>>>>> >>>>>>> #define RPC_TASK_RUNNING 0 >>>>>>> #define RPC_TASK_QUEUED 1 >>>>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >>>>>>> index b530a28..a477ee6 100644 >>>>>>> --- a/net/sunrpc/xprt.c >>>>>>> +++ b/net/sunrpc/xprt.c >>>>>>> @@ -392,6 +392,10 @@ static inline void >>>>>>> xprt_release_write(struct >>>>>>> rpc_xprt *xprt, struct rpc_task *ta >>>>>>> { >>>>>>> struct rpc_rqst *req = task->tk_rqstp; >>>>>>> >>>>>>> + if (RPC_SKIP_CONG(task)) { >>>>>>> + req->rq_cong = 0; >>>>>>> + return 1; >>>>>>> + } >>>>>> >>>>>> Why not just have the RDMA layer call xprt_reserve_xprt() >>>>>> (and >>>>>> xprt_release_xprt()) if this flag is set? It seems to me that >>>>>> you >>>>>> will >>>>>> need some kind of extra congestion control in the RDMA layer >>>>>> anyway >>>>>> since you only have one reserved credit for these privileged >>>>>> tasks >>>>>> (or >>>>>> did I miss where that is being gated?). >>>>> >>>>> Thanks for the review. >>>>> >>>>> See RPCRDMA_IA_RSVD_CREDIT in 11/12. It's a hack I'm not >>>>> terribly happy with. >>>>> >>>>> So, I think you are suggesting replacing xprtrdma's >>>>> ->reserve_xprt with something like: >>>>> >>>>> int xprt_rdma_reserve_xprt(xprt, task) >>>>> { >>>>> 聽聽聽聽聽if (RPC_SKIP_CONG(task)) >>>>> 聽聽聽聽聽聽聽聽聽聽return xprt_reserve_xprt(xprt, task); >>>>> 聽聽聽聽聽return xprt_reserve_xprt_cong(xprt, task); >>>>> } >>>>> >>>>> and likewise for ->release_xprt ? >>>> >>>> Right. >> >> This seems to work fine for the normal cases. >> >> I'm confused about how to construct xprt_rdma_release_xprt() >> so it never releases a normal RPC task when a SKIP_CONG >> task completes and the credit limit is still full. >> >> If it should send a normal task using the reserved credit >> and that task hangs too, we're in exactly the position >> we wanted to avoid. >> >> My original solution might have had a similar problem, >> come to think of it. >> >> > > That's true... You may need to set up a separate waitqueue that is > reserved for SKIP_CONG tasks. Again, it makes sense to keep that in the > RDMA code. Understood. At this late date, probably the best thing to do is drop the keepalive patches for the v4.11 merge window. That would be 10/12, 11/12, and 12/12. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html