> -----Original Message----- > From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] > Sent: Thursday, July 03, 2014 1:30 AM > To: Devesh Sharma > Cc: Steve Wise; linux-rdma@xxxxxxxxxxxxxxx; Linux NFS Mailing List > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport > disconnect > > > On Jul 2, 2014, at 3:48 PM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> > wrote: > > > > > > >> -----Original Message----- > >> From: Steve Wise [mailto:swise@xxxxxxxxxxxxxxxxxxxxx] > >> Sent: Thursday, July 03, 2014 1:16 AM > >> To: 'Chuck Lever'; Devesh Sharma > >> Cc: linux-rdma@xxxxxxxxxxxxxxx; 'Linux NFS Mailing List' > >> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport > >> disconnect > >> > >> > >> > >>> -----Original Message----- > >>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] > >>> Sent: Wednesday, July 02, 2014 2:40 PM > >>> To: Steve Wise; Devesh Sharma > >>> Cc: linux-rdma@xxxxxxxxxxxxxxx; Linux NFS Mailing List > >>> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport > >>> disconnect > >>> > >>> > >>> On Jul 2, 2014, at 3:28 PM, Steve Wise > <swise@xxxxxxxxxxxxxxxxxxxxx> > >> wrote: > >>> > >>>> On 7/2/2014 2:06 PM, Devesh Sharma wrote: > >>>>> This change is very much prone to generate poll_cq errors because > >>>>> of un-cleaned > >>> completions which still > >>>>> point to the non-existent QPs. On the new connection when these > >>>>> completions are polled, > >>> the poll_cq will > >>>>> fail because old QP pointer is already NULL. > >>>>> Did anyone hit this situation during their testing? > >>> > >>> I tested this aggressively with a fault injector that triggers > >>> regular connection disruption. > >>> > >>>> Hey Devesh, > >>>> > >>>> iw_cxgb4 will silently toss CQEs if the QP is not active. > >>> > >>> xprtrdma relies on getting a completion (either successful or in > >>> error) for every WR it has posted. 
> >>> The goal of this patch is to avoid throwing away queued completions
> >>> after a transport disconnect, so we don't lose track of FRMR rkey
> >>> updates (FAST_REG_MR and LOCAL_INV completions) and we can capture
> >>> all RPC replies posted before the connection was lost.
> >>>
> >>> Sounds like we also need to keep the QP around, even in error state,
> >>> until all known WRs on that QP have completed?
> >>>
> >
> > Why not poll and process every completion during rpcrdma_cq_cleanup()….
>
> Yes, I have a patch in the next version of this series that does that.
> It just calls rpcrdma_sendcq_upcall() from the connect worker. I will
> squash that change into this patch.

Cool.

> Maybe it needs to invoke rpcrdma_recvcq_upcall() there as well.

Yes, that would be needed.

> >
> >>
> >> Perhaps.
> >>
> >>>>>> -----Original Message-----
> >>>>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
> >>>>>> owner@xxxxxxxxxxxxxxx] On Behalf Of Chuck Lever
> >>>>>> Sent: Tuesday, June 24, 2014 4:10 AM
> >>>>>> To: linux-rdma@xxxxxxxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx
> >>>>>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> >>>>>> disconnect
> >>>>>>
> >>>>>> CQs are not destroyed until unmount. By draining CQs on transport
> >>>>>> disconnect, successful completions that can change the
> >>>>>> r.frmr.state field can be missed.
> >>>>>>
> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>>>>> ---
> >>>>>>  net/sunrpc/xprtrdma/verbs.c |    5 -----
> >>>>>>  1 file changed, 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c
> >>>>>> b/net/sunrpc/xprtrdma/verbs.c
> >>>>>> index 3c7f904..451e100 100644
> >>>>>> --- a/net/sunrpc/xprtrdma/verbs.c
> >>>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
> >>>>>> @@ -873,9 +873,6 @@ retry:
> >>>>>> 			dprintk("RPC:       %s: rpcrdma_ep_disconnect"
> >>>>>> 				" status %i\n", __func__, rc);
> >>>>>>
> >>>>>> -		rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>>>>> -		rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>>>>> -
> >>>>>> 		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> >>>>>> 		id = rpcrdma_create_id(xprt, ia,
> >>>>>> 				(struct sockaddr *)&xprt->rx_data.addr);
> >>>>>> @@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep,
> >>>>>> struct rpcrdma_ia *ia)
> >>>>>> {
> >>>>>> 	int rc;
> >>>>>>
> >>>>>> -	rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>>>>> -	rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>>>>> 	rc = rdma_disconnect(ia->ri_id);
> >>>>>> 	if (!rc) {
> >>>>>> 		/* returns without wait if not connected */
> >>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe
> >>>>>> linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>> --
> >>> Chuck Lever
> >>> chuck[dot]lever[at]oracle[dot]com
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com