Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Oct 1, 2015, at 2:15 PM, Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote:
> 
>>>> A missed WC will result in an RPC/RDMA transport deadlock. In
>>>> fact that is the reason for this particular patch (although
>>>> it addresses only one source of missed WCs). So I would like
>>>> to see that there are no windows here.
>>> 
>>> WCs are never missed.
>> 
>> The review comment earlier in this thread suggested there is
>> a race condition where a WC can be “delayed” resulting in,
>> well, I’m still not certain what the consequences are.
> 
> Yes. The consequence would typically be lockup of CQ processing.
> 
>>> while (1) {
>>> struct ib_wc wcs[100];
>>> int rc = ib_poll_cq(cw, NELEMS(wcs), wcs);
>>> 
>>> .. process rc wcs ..
>>> 
>>> if (rc != NELEMS(wcs))
>>>   if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
>>>       IB_CQ_REPORT_MISSED_EVENTS) == 0)
>>>    break;
>>> }
>>> 
>>> API wise, we should probably look at forcing
>>> IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag.
>> 
>> It’s been suggested that it’s not clear what a positive
>> return value from ib_req_notify_cq() means when the
>> REPORT_MISSED_EVENTS flags is set: does it mean that
>> the CQ has been re-armed? I had assumed that a positive
>> RC meant both missed events and a successful re-arm,
>> but the pseudo-code above suggests that is not the
>> case.
> 
> The ULP must assume the CQ has NOT been armed after a positive return.

OK, I will fix this when I revise 03/18.


> What the driver does to the arm state is undefined - for instance the
> driver may trigger a callback and still return 1 here.
> 
> However, the driver must make this guarentee:
> 
> If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then
> the call back will always be called when the CQ is non-empty.
> 
> The ULP must loop doing polling until the above happens, otherwise the
> event notification may be missed.
> 
> ie the above is guarnteed to close the WC delay/lockup race.
> 
> Again, if there has been confusion on the driver side, drivers that
> don't implement the above are broken.
> 
> Review Roland's original commit comments on this feature.
> 
> https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3
> 
> I'm not sure where we are at on the 'significant overhead for some
> low-level drivers' issue, but assuming that is still the case, then
> the recommendation is this:
> 
> bool exiting = false;
> while (1) {
>   struct ib_wc wcs[100];
>   int rc = ib_poll_cq(cq, NELEMS(wcs), wcs);
>   if (rc == 0 && exiting)
>        break;
> 
>   .. process rc wcs ..
> 
>   if (rc != NELEMS(wcs)) {
>        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP)
> 	exiting = true;
>   } else
> 	exiting = false;
> }
> 
> ie a double poll.
> AFAIK, this is a common pattern in the ULPs.. Perhaps we should
> implement this as a core API:
> 
> struct ib_wc wcs[100];
> while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) {
>   .. process rc wcs  ..
> 
> ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the
> callback is guarenteed to happen when the CQ is non empty.

This makes sense to me, especially if it means fewer
times grabbing and releasing the CQ’s spinlock.

Does a ULP want to continue polling if ib_poll_cq{_and_arm)
returns a negative RC?


—
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux