On Fri, Feb 19, 2021 at 09:31:22AM -0500, Tom Talpey wrote:
> On 2/18/2021 7:45 PM, Jason Gunthorpe wrote:
> > On Thu, Feb 18, 2021 at 06:07:13PM -0500, Tom Talpey wrote:
> > > > > If the consumer doesn't provide a large-enough CQ, then it reaps the
> > > > > consequences. Same thing for WQ depth, although I am aware that some
> > > > > verbs implementations attempt to return a kind of EAGAIN when posting
> > > > > to a send WQ.
> > > > >
> > > > > What can the provider do if the CQ is "full" anyway? Buffer the CQE
> > > > > and go into some type of polling loop attempting to redeliver? Ouch!
> > > >
> > > > QP goes to error, CQE is discarded, IIRC.
> > >
> > > What!? There might be many QP's all sharing the same CQ. Put them
> > > *all* into error? And for what, because the CQ is trash anyway. This
> > > sounds like optimizing the error case. Uselessly.
> >
> > No, only the QPs that need to push a CQE and can't.
>
> Hm. Ok, so QP's will drop unpredictably, and their outstanding WQEs
> will probably be lost as well, but I can see cases where a CQ slot
> might open up while the failed QP is flushing, and CQE's get delivered
> out of order. That might be even worse. It would seem safer to stop
> writing to the CQ altogether - all QPs.

I think the app gets an IBV_EVENT_CQ_ERR and IBV_EVENT_QP_FATAL and has
to clean it up.

> That would be a problem, but it's only true if the provider implements
> the CQ as a circular buffer.

AFAIK there is no data structure that allows unbounded writing from the
producer side; the only choices are to halt on overflow or to corrupt on
overflow. Corrupt breaks the machine, so good HW does halt.

Jason
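
P.S. For anyone following along, a rough sketch of what that consumer-side
cleanup looks like. The ibv_* calls are the standard libibverbs async
event API; the recover_*() helpers are hypothetical placeholders for
whatever teardown the app actually needs:

/* Rough sketch: drain the async event queue to catch the CQ overrun /
 * QP fatal errors discussed above.  Assumes "ctx" is a struct
 * ibv_context opened elsewhere; real code needs error handling and a
 * termination condition. */
#include <infiniband/verbs.h>

/* Hypothetical app-side cleanup hooks, not libibverbs API. */
void recover_cq(struct ibv_cq *cq);
void recover_qp(struct ibv_qp *qp);

void drain_async_events(struct ibv_context *ctx)
{
        struct ibv_async_event ev;

        /* ibv_get_async_event() blocks until an event arrives. */
        while (ibv_get_async_event(ctx, &ev) == 0) {
                switch (ev.event_type) {
                case IBV_EVENT_CQ_ERR:
                        /* CQ overran; anything still in it is suspect. */
                        recover_cq(ev.element.cq);
                        break;
                case IBV_EVENT_QP_FATAL:
                        /* QP went to error, e.g. because it could not
                         * push a CQE into a full CQ. */
                        recover_qp(ev.element.qp);
                        break;
                default:
                        break;
                }
                /* Every event must be acknowledged. */
                ibv_ack_async_event(&ev);
        }
}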