Re: ibv_req_notify_cq clarification

Jason Gunthorpe <jgg@xxxxxxxx> · Thu, 18 Feb 2021 20:45:55 -0400

On Thu, Feb 18, 2021 at 06:07:13PM -0500, Tom Talpey wrote:
> > > If the consumer doesn't provide a large-enough CQ, then it reaps the
> > > consequences. Same thing for WQ depth, although I am aware that some
> > > verbs implementations attempt to return a kind of EAGAIN when posting
> > > to a send WQ.
> > > 
> > > What can the provider do if the CQ is "full" anyway? Buffer the CQE
> > > and go into some type of polling loop attempting to redeliver? Ouch!
> > 
> > QP goes to error, CQE is discarded, IIRC.
> 
> What!? There might be many QP's all sharing the same CQ. Put them
> *all* into error? And for what, because the CQ is trash anyway. This
> sounds like optimizing the error case. Uselessly.

No, only the QPs that need to push a CQE and can't.

> > Wrapping and overflowing the CQ is not acceptable, it would mean
> > reading CQEs could never be done reliably.
> 
> But the provider never reads the CQ, only the consumer can read.
> The provider writes to head, ignoring tail. Consumer reads from
> tail, and it goes empty when tail == head. And if head overruns
> tail, that was the consumer's fault for posting too many WQEs.

Yes, but if the app makes a mistake you don't want to trash the whole
system. Resiliency says you contain the failure as much as possible
and the app at least has some chance to pick up the pieces.

If the HW corrupts the CQEs while the CPU is reading them then the
whole machine is toast: there is a high chance the kernel will corrupt
memory.
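
To make the containment idea concrete, here is a rough toy model in C.
It is only a sketch, not any real provider's or device's code: every
name in it (model_cq, model_qp, cq_push, cq_poll, CQ_DEPTH) is made up
for illustration, and treating an errored QP as completely silent is a
simplification. The writer checks for a full ring before storing, so an
unread CQE is never overwritten, and only the QP that could not deliver
its CQE goes to error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative depth only; must be a power of two for the index math. */
#define CQ_DEPTH 4

enum qp_state { QP_RTS, QP_ERR };

struct model_qp {
    uint32_t qpn;
    enum qp_state state;
};

struct model_cqe {
    uint32_t qpn;
    uint64_t wr_id;
};

struct model_cq {
    struct model_cqe ring[CQ_DEPTH];
    uint32_t head; /* next slot the provider writes (free-running) */
    uint32_t tail; /* next slot the consumer reads (free-running) */
};

static bool cq_full(const struct model_cq *cq)
{
    return cq->head - cq->tail == CQ_DEPTH;
}

/*
 * Provider side: try to deliver a CQE for this QP.
 * On overflow the CQE is discarded and only this QP is moved to error;
 * entries the consumer has not read yet are left untouched.
 */
static bool cq_push(struct model_cq *cq, struct model_qp *qp, uint64_t wr_id)
{
    if (qp->state == QP_ERR)
        return false; /* model simplification: errored QP produces nothing */

    if (cq_full(cq)) {
        qp->state = QP_ERR;
        return false;
    }

    cq->ring[cq->head % CQ_DEPTH] =
        (struct model_cqe){ .qpn = qp->qpn, .wr_id = wr_id };
    cq->head++;
    return true;
}

/* Consumer side: the CQ is empty when tail == head. */
static bool cq_poll(struct model_cq *cq, struct model_cqe *out)
{
    if (cq->tail == cq->head)
        return false;

    *out = cq->ring[cq->tail % CQ_DEPTH];
    cq->tail++;
    return true;
}
```

With two QPs sharing one model_cq, the QP that overflows ends up in
QP_ERR while the other stays in QP_RTS and can keep completing once the
consumer has polled some entries.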

Jason