Re: ibv_req_notify_cq clarification

Gal Pressman <galpress@xxxxxxxxxx> · Mon, 22 Feb 2021 17:36:17 +0200

On 22/02/2021 15:46, Jason Gunthorpe wrote:
> On Sun, Feb 21, 2021 at 11:25:02AM +0200, Gal Pressman wrote:
>> On 18/02/2021 18:23, Jason Gunthorpe wrote:
>>> On Thu, Feb 18, 2021 at 05:52:16PM +0200, Gal Pressman wrote:
>>>> On 18/02/2021 14:53, Jason Gunthorpe wrote:
>>>>> On Thu, Feb 18, 2021 at 11:13:43AM +0200, Gal Pressman wrote:
>>>>>> I'm a bit confused about the meaning of the ibv_req_notify_cq() verb:
>>>>>> "Upon the addition of a new CQ entry (CQE) to cq, a completion event will be
>>>>>> added to the completion channel associated with the CQ."
>>>>>>
>>>>>> What is considered a new CQE in this case?
>>>>>> The next CQE from the user's perspective, i.e. any new CQE that wasn't consumed
>>>>>> by the user's poll cq?
>>>>>> Or any new CQE from the device's perspective?
>>>>>
>>>>> new CQE from the device perspective.
>>>>>
>>>>>> For example, if at the time of ibv_req_notify_cq() call the CQ has received 100
>>>>>> completions, but the user hasn't polled his CQ yet, when should he be notified?
>>>>>> On the 101 completion or immediately (since there are completions waiting on the
>>>>>> CQ)?
>>>>>
>>>>> 101 completion
>>>>>
>>>>> It is only meaningful to call it when the CQ is empty.
>>>>
>>>> Thanks, so there's an inherent race between the user's CQ poll and the next arm?
>>>
>>> I think the specs or man pages talk about this, the application has to
>>> observe empty, do arm, then poll again then sleep on the cq if empty.
>>>
>>>> Do you know what's the purpose of the consumer index in the arm doorbell that's
>>>> implemented by many providers?
>>>
>>> The consumer index is needed by HW to prevent CQ overflow, presumably
>>> the drivers push to reduce the cases where the HW has to read it from
>>> PCI
>>
>> Thanks, that makes sense.
>>
>> I found the following sentence in CX PRM:
>> "If new CQEs are posted to the CQ after the reporting of a completion event and
>> these CQEs are not yet consumed, then an event will be generated immediately
>> after the request for notification is executed."
>>
>> Doesn't that contradict the expected behavior?
> 
> I read it as confirming it?
> 
> Only *new* CQEs trigger an event, and new CQE's always trigger an
> event regardless of the full/empty state of the queue.
> 
> This paragraph is an obtuse way of warning of the race I described.

Hmm, yeah, this sentence is a bit confusing :)

"Mellanox HCAs keep track of the last index for which the user received an
event. Using this index, it is guaranteed that an event is generated immediately
when a request completion notification is performed and a CQE has already been
reported."

This also sounds weird: why would an event be generated for a completion that
has already been reported?

So, from my understanding of how this should work, is the following code in
perftest (the ib_send_bw test) buggy?
https://github.com/linux-rdma/perftest/blob/master/src/perftest_resources.c#L2955

Running this with 32 iterations, the client does something like:
- arm cq
- post send x 32
- wait for cq event
- arm cq
- poll cq (once, with batch size of 16)
- no more post send (reached tot_iters)
- wait for cq event (but an event has already been generated?)

And gets stuck?
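
To make the sequence above concrete, here is a toy model in plain C. It is
not libibverbs code and not the perftest code: the toy_* names are made up
for illustration, and it assumes the semantics described earlier in this
thread (an event is generated only for a CQE added while the CQ is armed).
Under that assumption, polling once after re-arming strands 16 completions,
while polling until the CQ is empty reaps all 32.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CQ, hypothetical: models "only a new CQE triggers an event". */
struct toy_cq {
    int  pending;  /* CQEs added by the "device" and not yet polled */
    bool armed;    /* set by toy_arm(), cleared when an event fires */
    int  events;   /* events queued on the completion channel */
};

/* Stands in for ibv_req_notify_cq(). */
static void toy_arm(struct toy_cq *cq) { cq->armed = true; }

/* The "device" adds n CQEs; only the first one after an arm raises an event. */
static void toy_complete(struct toy_cq *cq, int n)
{
    for (int i = 0; i < n; i++) {
        cq->pending++;
        if (cq->armed) {
            cq->armed = false;
            cq->events++;
        }
    }
}

/* Stands in for ibv_poll_cq(): consumes up to batch CQEs. */
static int toy_poll(struct toy_cq *cq, int batch)
{
    assert(batch > 0);
    int n = cq->pending < batch ? cq->pending : batch;
    cq->pending -= n;
    return n;
}

/* Stands in for ibv_get_cq_event(); false means the caller would block. */
static bool toy_get_event(struct toy_cq *cq)
{
    if (cq->events == 0)
        return false;
    cq->events--;
    return true;
}

/* The sequence from this mail: arm, post 32, wait, arm, poll once.
 * Returns how many CQEs were reaped before the client would block. */
static int run_single_poll(void)
{
    struct toy_cq cq = {0};
    int reaped = 0;

    toy_arm(&cq);
    toy_complete(&cq, 32);         /* 32 sends complete */
    toy_get_event(&cq);            /* first wait succeeds */
    toy_arm(&cq);
    reaped += toy_poll(&cq, 16);   /* single poll, batch of 16 */
    if (!toy_get_event(&cq))       /* no new CQE will arrive: stuck here */
        return reaped;
    return reaped + toy_poll(&cq, 16);
}

/* Same sequence, but poll until the CQ is observed empty after arming. */
static int run_drain(void)
{
    struct toy_cq cq = {0};
    int reaped = 0, n;

    toy_arm(&cq);
    toy_complete(&cq, 32);
    toy_get_event(&cq);
    toy_arm(&cq);
    while ((n = toy_poll(&cq, 16)) > 0)
        reaped += n;
    return reaped;                 /* all iterations reaped, no wait needed */
}
```

In this model run_single_poll() stops with 16 of the 32 completions reaped,
and run_drain() reaps all 32. A device that raises an event immediately on
arm when unreported CQEs exist (as the quoted text suggests) would not show
the stall, which is why the model is only a sketch of the expected behavior.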