> -----Original Message----- > From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] > Sent: Thursday, April 10, 2014 12:44 PM > To: Steve Wise > Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx; Trond Myklebust > Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks > > > On Apr 10, 2014, at 11:01 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote: > > > On 4/9/2014 7:26 PM, Chuck Lever wrote: > >> On Apr 9, 2014, at 7:56 PM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote: > >> > >>> Hi Chuk and Trond > >>> > >>> I will resend a v2 for this. > >>> What if ib_post_send() fails with immidate error, I that case also DECR_CQCOUNT() will > be called but no completion will be reported. Will that not cause any problems? > >> We should investigate whether an error return from ib_post_{send,recv} means there will > be no completion. But I've never seen these verbs fail in practice, so I'm not in a hurry to make > work for anyone! ;-) > > > > A synchronous failure from ib_post_* means the WR (or at least one of them if there were > > 1) failed and did not get submitted to HW. So there will be no completion for those that failed. > > OK. > > Our post operations are largely single WRs. Before we address CQCOUNT in error cases, we'd > have to deal with chained WRs. > > Chained WRs are used only when rpcrdma_register_frmr_external() finds an MR that hasn't > been invalidated. That's actually working around a FRMR re-use bug (commit 5c635e09). If the > underlying re-use problem was fixed, we could get rid of the chained WR in > register_frmr_external() (and we wouldn't need completions at all for FAST_REG_MR). > > But at 100,000 feet, if a post operation fails, that seems like a very serious issue. I wonder > whether we would be better off disconnecting and starting over in those cases. > I agree. The application is responsible to flow-control its posting of WRs to the SQs/RQs. So we should never see sync failures with ib_post_* due to over-running the queues. However, if the QP moves out of RTS for whatever reason, then a multi-threaded application could encounter sync failures because the QP exited RTS. Anyway, I agree: if there are any failures with ib_post_*, the application should kill the connection, (LOG SOMETHING!), and setup a new connection. My 2 centimes. :) Steve -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html