RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
> Sent: Thursday, April 10, 2014 12:44 PM
> To: Steve Wise
> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx; Trond Myklebust
> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
> 
> 
> On Apr 10, 2014, at 11:01 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > On 4/9/2014 7:26 PM, Chuck Lever wrote:
> >> On Apr 9, 2014, at 7:56 PM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:
> >>
> >>> Hi Chuk and Trond
> >>>
> >>> I will resend a v2 for this.
> >>> What if ib_post_send() fails with immidate error, I that case also DECR_CQCOUNT()
will
> be called but no completion will be reported. Will that not cause any problems?
> >> We should investigate whether an error return from ib_post_{send,recv} means there
will
> be no completion. But I've never seen these verbs fail in practice, so I'm not in a
hurry to make
> work for anyone! ;-)
> >
> > A synchronous failure from ib_post_* means the WR (or at least one of them if there
were >
> 1) failed and did not get submitted to HW.  So there will be no completion for those
that failed.
> 
> OK.
> 
> Our post operations are largely single WRs. Before we address CQCOUNT in error cases,
we'd
> have to deal with chained WRs.
> 
> Chained WRs are used only when rpcrdma_register_frmr_external() finds an MR that hasn't
> been invalidated. That's actually working around a FRMR re-use bug (commit 5c635e09). If
the
> underlying re-use problem was fixed, we could get rid of the chained WR in
> register_frmr_external() (and we wouldn't need completions at all for FAST_REG_MR).
> 
> But at 100,000 feet, if a post operation fails, that seems like a very serious issue. I
wonder
> whether we would be better off disconnecting and starting over in those cases.
> 

I agree. The application is responsible to flow-control its posting of WRs to the SQs/RQs.
So we should never see sync failures with ib_post_* due to over-running the queues.
However, if the QP moves out of RTS for whatever reason, then a multi-threaded application
could encounter sync failures because the QP exited RTS.  Anyway, I agree: if there are
any failures with ib_post_*, the application should kill the connection, (LOG SOMETHING!),
and setup a new connection.

My 2 centimes.  :)


Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux