Re: ibv_req_notify_cq clarification

On 2/18/2021 7:45 PM, Jason Gunthorpe wrote:
On Thu, Feb 18, 2021 at 06:07:13PM -0500, Tom Talpey wrote:
If the consumer doesn't provide a large enough CQ, then it reaps the
consequences. Same thing for WQ depth, although I am aware that some
verbs implementations attempt to return a kind of EAGAIN when posting
to a send WQ.

What can the provider do if the CQ is "full" anyway? Buffer the CQE
and go into some type of polling loop attempting to redeliver? Ouch!

QP goes to error, CQE is discarded, IIRC.

What!? There might be many QPs all sharing the same CQ. Put them
*all* into error? And for what, because the CQ is trash anyway. This
sounds like optimizing the error case. Uselessly.

No, only the QPs that need to push a CQE and can't.
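
The containment policy described above (a full CQ fails only the QP that tried to push a CQE, and that CQE is discarded) can be sketched as a toy model. This is only an illustration under stated assumptions, not how any real provider or libibverbs works: the `BoundedCQ` class, its `push`/`poll` methods, and the choice to silently drop later CQEs from an already-errored QP are all hypothetical.

```python
from collections import deque


class BoundedCQ:
    """Toy CQ with a fixed depth (hypothetical model, not a real provider)."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()     # CQEs waiting for the consumer
        self.errored_qps = set()   # QPs that hit a full CQ

    def push(self, qp, wr_id):
        """Provider side: try to deliver a CQE for `qp`. Returns True if queued."""
        if qp in self.errored_qps:
            # Assumption: an errored QP delivers nothing further in this model.
            return False
        if len(self.entries) >= self.depth:
            # CQ is full: discard the CQE and fail only this QP.
            self.errored_qps.add(qp)
            return False
        self.entries.append((qp, wr_id))
        return True

    def poll(self):
        """Consumer side: dequeue one CQE, or None if the CQ is empty."""
        return self.entries.popleft() if self.entries else None
```

With a depth of 2 shared by two QPs, a third completion from "qp1" puts only "qp1" into error; "qp2" keeps delivering once the consumer frees a slot. Existing entries are never overwritten in this model.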

Hm. Ok, so QPs will drop unpredictably, and their outstanding WQEs
will probably be lost as well, but I can see cases where a CQ slot
might open up while the failed QP is flushing, and CQEs get delivered
out of order. That might be even worse. It would seem safer to stop
writing to the CQ altogether - all QPs.

Wrapping and overflowing the CQ is not acceptable, it would mean
reading CQEs could never be done reliably.

But the provider never reads the CQ, only the consumer can read.
The provider writes to head, ignoring tail. Consumer reads from
tail, and it goes empty when tail == head. And if head overruns
tail, that was the consumer's fault for posting too many WQEs.
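
To make the head/tail description above concrete, here is a minimal sketch of a CQ kept as a circular buffer where the writer ignores tail. It is a toy model under assumptions, not a real provider: the `RingCQ` class, its method names, and the `drain` helper are hypothetical.

```python
class RingCQ:
    """Toy CQ kept as a circular buffer (hypothetical model, not a real provider)."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.head = 0  # next slot the provider writes
        self.tail = 0  # next slot the consumer reads

    def provider_write(self, cqe):
        # The provider never looks at tail, so it can overwrite unread CQEs.
        self.slots[self.head] = cqe
        self.head = (self.head + 1) % len(self.slots)

    def consumer_poll(self):
        # The CQ looks empty to the consumer whenever tail == head.
        if self.tail == self.head:
            return None
        cqe = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return cqe


def drain(cq):
    """Poll until the CQ looks empty and return everything that was read."""
    out = []
    while (cqe := cq.consumer_poll()) is not None:
        out.append(cqe)
    return out
```

In this model, a depth-4 CQ that receives three CQEs drains cleanly. If it receives six CQEs (0 through 5) before the consumer polls, head wraps past tail and the consumer sees only 4 and 5; the other four are lost without any indication.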

Yes, but if the app makes a mistake you don't want to trash the whole
system. Resiliency says you contain the failure as much as possible
and the app at least has some chance to pick up the pieces.

If the HW corrupts the CQEs while the CPU is reading them then the
whole machine is toast, high chance the kernel will corrupt memory.

That would be a problem, but it's only true if the provider implements
the CQ as a circular buffer. That isn't imposed by the Verbs. The CQ
itself is opaque to the consumer, it's merely a queue with arm and
dequeue operations - no enqueue, no head/tail or other pointers, etc.
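
The consumer's view described above, a queue with only arm and dequeue operations, can be sketched as follows. This is a hedged toy model, not the libibverbs API: `OpaqueCQ`, `req_notify`, `poll`, and the underscore-prefixed provider hook are hypothetical names that loosely mirror `ibv_req_notify_cq` and `ibv_poll_cq`. The one-shot arming behavior is modeled here as an assumption about how notification works, and the internal storage is deliberately hidden from the consumer.

```python
from collections import deque


class OpaqueCQ:
    """Toy CQ exposing only arm and dequeue to the consumer (hypothetical)."""

    def __init__(self):
        self._entries = deque()  # private: could be any structure, not just a ring
        self._armed = False
        self.events = 0          # number of completion events raised so far

    def req_notify(self):
        """Consumer: arm the CQ so the next completion raises one event."""
        self._armed = True

    def poll(self):
        """Consumer: dequeue one CQE, or None if nothing is pending."""
        return self._entries.popleft() if self._entries else None

    def _provider_complete(self, cqe):
        """Provider side only: queue a CQE and raise an event if armed."""
        self._entries.append(cqe)
        if self._armed:
            self._armed = False  # arming is one-shot in this model
            self.events += 1
```

Because the consumer never sees head or tail pointers, the provider in this model is free to back the queue with something other than a fixed circular buffer. A completion that arrives before arming raises no event, which is why a consumer would poll again after calling `req_notify`.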

So yeah, a provider that made such a choice will need to be careful.
But there are other, possibly better, ways.

Tom.


