> On Sep 22, 2015, at 1:32 PM, Devesh Sharma <devesh.sharma@xxxxxxxxxxxxx> wrote:
>
> On Mon, Sep 21, 2015 at 9:15 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>
>>> On Sep 21, 2015, at 1:51 AM, Devesh Sharma <devesh.sharma@xxxxxxxxxxxxx> wrote:
>>>
>>> On Sun, Sep 20, 2015 at 4:05 PM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>> It is possible that in a given poll_cq
>>>>>> call you end up getting only 1 completion; the other completion is
>>>>>> delayed due to some reason.
>>>>>
>>>>> If a CQE is allowed to be delayed, how does polling
>>>>> again guarantee that the consumer can retrieve it?
>>>>>
>>>>> What happens if a signal occurs, there is only one CQE,
>>>>> but it is delayed? ib_poll_cq would return 0 in that
>>>>> case, and the consumer would never call again, thinking
>>>>> the CQ is empty. There's no way the consumer can know
>>>>> for sure when a CQ is drained.
>>>>>
>>>>> If the delayed CQE happens only when there is more
>>>>> than one CQE, how can polling multiple WCs ever work
>>>>> reliably?
>>>>>
>>>>> Maybe I don't understand what is meant by delayed.
>>>>
>>>> If I'm not mistaken, Devesh meant that if, between ib_poll_cq (where you
>>>> polled the last 2 wcs) and the while statement, another CQE was
>>>> generated, then you lost a bit of efficiency. Correct?
>>>
>>> Yes, that's the point.
>>
>> I'm optimizing for the common case where 1 CQE is ready
>> to be polled. How much of an efficiency loss are you
>> talking about, how often would this loss occur, and is
>> this a problem for all providers / devices?
>
> Whether the scenario would happen or not is difficult to predict, but it's
> quite possible with any vendor based on load on the PCI bus, I guess.
> It may affect the latency figures, though.
>
>> Is this an issue for the current arrangement where 8 WCs
>> are polled at a time?
>
> Yes, it's there even today.

This review comment does not feel closed yet. Maybe it's because I
don't understand exactly what the issue is.

Is this the problem that REPORT_MISSED_EVENTS is supposed to resolve?

A missed WC will result in an RPC/RDMA transport deadlock. In fact
that is the reason for this particular patch (although it addresses
only one source of missed WCs). So I would like to see that there are
no windows here.

I've been told the only sure way to address this for every provider
is to use the classic but inefficient mechanism: poll one WC at a
time until no WC is returned; re-arm; poll again until no WC is
returned. In the common case this means two extra poll_cq calls that
return nothing.

So I claim the current status quo isn't good enough :-)

Doug and others have suggested the best place to address problems
with missed WC signals is in the drivers. All of them should live up
to the ib_poll_cq() API contract the same way.

In addition I'd really like to see

 - polling and arming work without having to perform extra unneeded
   locking of the CQ, and

 - polling arrays work without introducing races

Can we have that discussion now, since there is already some
discussion of IB core API fix-ups?

—
Chuck Lever
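
For reference, the classic "drain, re-arm, drain again" discipline
Chuck describes might look roughly like the sketch below against the
kernel verbs API. Only ib_poll_cq(), ib_req_notify_cq(),
IB_CQ_NEXT_COMP, and IB_CQ_REPORT_MISSED_EVENTS are real API names;
the drain function and the handler callback are hypothetical
stand-ins for a consumer's completion handling.

	#include <rdma/ib_verbs.h>

	/*
	 * Sketch only: poll one WC at a time until the CQ appears
	 * empty, re-arm, then poll again to catch any CQE that
	 * arrived between the first drain and the re-arm.  The
	 * handle_wc callback is a hypothetical consumer routine,
	 * not part of the verbs API.
	 */
	static void example_drain_and_rearm(struct ib_cq *cq,
					    void (*handle_wc)(struct ib_wc *wc))
	{
		struct ib_wc wc;

		/* First drain: one WC per call until ib_poll_cq() returns 0. */
		while (ib_poll_cq(cq, 1, &wc) > 0)
			handle_wc(&wc);

		/* Re-arm so the next CQE raises a completion event. */
		ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);

		/* Second drain: close the window between the drain and the re-arm. */
		while (ib_poll_cq(cq, 1, &wc) > 0)
			handle_wc(&wc);
	}

The REPORT_MISSED_EVENTS flag mentioned above is intended to avoid
the second drain in the common case: ib_req_notify_cq(cq,
IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS) returns a positive
value when completions may have been queued without triggering an
event, so the consumer needs to poll again only when that happens.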