On Fri, Feb 19, 2021 at 09:31:22AM -0500, Tom Talpey wrote:
> On 2/18/2021 7:45 PM, Jason Gunthorpe wrote:
> > On Thu, Feb 18, 2021 at 06:07:13PM -0500, Tom Talpey wrote:
> > > > > If the consumer doesn't provide a large-enough CQ, then it reaps the
> > > > > consequences. Same thing for WQ depth, although I am aware that some
> > > > > verbs implementations attempt to return a kind of EAGAIN when posting
> > > > > to a send WQ.
> > > > >
> > > > > What can the provider do if the CQ is "full" anyway? Buffer the CQE
> > > > > and go into some type of polling loop attempting to redeliver? Ouch!
> > > >
> > > > QP goes to error, CQE is discarded, IIRC.
> > >
> > > What!? There might be many QP's all sharing the same CQ. Put them
> > > *all* into error? And for what, because the CQ is trash anyway. This
> > > sounds like optimizing the error case. Uselessly.
> >
> > No, only the QPs that need to push a CQE and can't.
>
> Hm. Ok, so QP's will drop unpredictably, and their outstanding WQEs
> will probably be lost as well, but I can see cases where a CQ slot
> might open up while the failed QP is flushing, and CQE's get delivered
> out of order. That might be even worse. It would seem safer to stop
> writing to the CQ altogether - all QPs.

I think the app gets an IBV_EVENT_CQ_ERR and IBV_EVENT_QP_FATAL and has
to clean it up.

> That would be a problem, but it's only true if the provider implements
> the CQ as a circular buffer.

AFAIK there is no data structure that allows unbounded writing from the
producer side; the only choices are to halt on overflow or to corrupt on
overflow. Corrupt breaks the machine, so good HW does halt.

Jason
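
P.S. For anyone following along, a rough sketch of what that consumer-side
cleanup looks like. The ibv_* calls are the standard libibverbs async
event API; the recover_*() helpers are hypothetical placeholders for
whatever teardown the app actually needs:

/* Rough sketch: drain the async event queue to catch the CQ overrun /
 * QP fatal errors discussed above.  Assumes "ctx" is a struct
 * ibv_context opened elsewhere; real code needs error handling and a
 * termination condition. */
#include <infiniband/verbs.h>

/* Hypothetical app-side cleanup hooks, not libibverbs API. */
void recover_cq(struct ibv_cq *cq);
void recover_qp(struct ibv_qp *qp);

void drain_async_events(struct ibv_context *ctx)
{
        struct ibv_async_event ev;

        /* ibv_get_async_event() blocks until an event arrives. */
        while (ibv_get_async_event(ctx, &ev) == 0) {
                switch (ev.event_type) {
                case IBV_EVENT_CQ_ERR:
                        /* CQ overran; anything still in it is suspect. */
                        recover_cq(ev.element.cq);
                        break;
                case IBV_EVENT_QP_FATAL:
                        /* QP went to error, e.g. because it could not
                         * push a CQE into a full CQ. */
                        recover_qp(ev.element.qp);
                        break;
                default:
                        break;
                }
                /* Every event must be acknowledged. */
                ibv_ack_async_event(&ev);
        }
}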