> On Oct 1, 2015, at 2:15 PM, Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote: > >>>> A missed WC will result in an RPC/RDMA transport deadlock. In >>>> fact that is the reason for this particular patch (although >>>> it addresses only one source of missed WCs). So I would like >>>> to see that there are no windows here. >>> >>> WCs are never missed. >> >> The review comment earlier in this thread suggested there is >> a race condition where a WC can be “delayed” resulting in, >> well, I’m still not certain what the consequences are. > > Yes. The consequence would typically be lockup of CQ processing. > >>> while (1) { >>> struct ib_wc wcs[100]; >>> int rc = ib_poll_cq(cw, NELEMS(wcs), wcs); >>> >>> .. process rc wcs .. >>> >>> if (rc != NELEMS(wcs)) >>> if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP | >>> IB_CQ_REPORT_MISSED_EVENTS) == 0) >>> break; >>> } >>> >>> API wise, we should probably look at forcing >>> IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag. >> >> It’s been suggested that it’s not clear what a positive >> return value from ib_req_notify_cq() means when the >> REPORT_MISSED_EVENTS flags is set: does it mean that >> the CQ has been re-armed? I had assumed that a positive >> RC meant both missed events and a successful re-arm, >> but the pseudo-code above suggests that is not the >> case. > > The ULP must assume the CQ has NOT been armed after a positive return. OK, I will fix this when I revise 03/18. > What the driver does to the arm state is undefined - for instance the > driver may trigger a callback and still return 1 here. > > However, the driver must make this guarentee: > > If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then > the call back will always be called when the CQ is non-empty. > > The ULP must loop doing polling until the above happens, otherwise the > event notification may be missed. > > ie the above is guarnteed to close the WC delay/lockup race. > > Again, if there has been confusion on the driver side, drivers that > don't implement the above are broken. > > Review Roland's original commit comments on this feature. > > https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3 > > I'm not sure where we are at on the 'significant overhead for some > low-level drivers' issue, but assuming that is still the case, then > the recommendation is this: > > bool exiting = false; > while (1) { > struct ib_wc wcs[100]; > int rc = ib_poll_cq(cq, NELEMS(wcs), wcs); > if (rc == 0 && exiting) > break; > > .. process rc wcs .. > > if (rc != NELEMS(wcs)) { > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) > exiting = true; > } else > exiting = false; > } > > ie a double poll. > AFAIK, this is a common pattern in the ULPs.. Perhaps we should > implement this as a core API: > > struct ib_wc wcs[100]; > while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) { > .. process rc wcs .. > > ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the > callback is guarenteed to happen when the CQ is non empty. This makes sense to me, especially if it means fewer times grabbing and releasing the CQ’s spinlock. Does a ULP want to continue polling if ib_poll_cq{_and_arm) returns a negative RC? — Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html