On Thursday, October 03, 2019 at 14:03:05 +0000, Bernard Metzler wrote:
> There are other reasons why the generic
> __ib_drain_sq() may fail. A CQ overflow is one
> such candidate. Failures are not handled by the ULP,
> since calling a void function.

The function description of ib_drain_qp() says:

 * The caller must:
 *
 * ensure there is room in the CQ(s), SQ, and RQ for drain work requests
 * and completions.
 *
 * allocate the CQs using ib_alloc_cq().
 *
 * ensure that there are no other contexts that are posting WRs
 * concurrently.
 * Otherwise the drain is not guaranteed.
 */

So it looks like the ULP has to make sure there is room in the CQ(s)
before calling ib_drain_xx().

>
> At the other hand, we know that if we have reached
> ERROR state, the QP will never escape back to become
> full functional; ERROR is the QP's final state.
>
> So we could do an extra check if we cannot get
> the state lock - if we are already in ERROR. And
> if yes, complete immediately there as well.
>
> I can change the patch accordingly. Makes sense?

Yes, I think addressing this would make the fix complete.

Thanks,
Krishna.