Re: ibv_req_notify_cq clarification

Jason Gunthorpe <jgg@xxxxxxxx> · Mon, 22 Feb 2021 15:37:46 -0400

On Mon, Feb 22, 2021 at 09:24:18PM +0200, Gal Pressman wrote:
> On 22/02/2021 17:55, Jason Gunthorpe wrote:
> > On Mon, Feb 22, 2021 at 05:36:17PM +0200, Gal Pressman wrote:
> > 
> >> "Mellanox HCAs keep track of the last index for which the user received an
> >> event. Using this index, it is guaranteed that an event is generated immediately
> >> when a request completion notification is performed and a CQE has already been
> >> reported."
> > 
> > I don't think verbs exposes this behavior.
> 
> So in theory this hardware could generate events that the user app
> doesn't expect?

Not really, it shuffles around how race conditions are resolved.

> But looking at ibv_ud_pingpong for example, I don't understand how
> that could even work.  The test arms the CQ on creation (consumder
> index 0), calls ibv_get_cq_event(), wakes up and immediately arms
> the CQ again (before polling, consumer index is still 0).

By IBTA spec this is wrong and racey.

> This means that the next ibv_get_cq_event() will wake up immediately, as the CQ
> was armed twice with the same consumer index and the first completion already
> exists. Surely that's not what's meant to happen?

If the driver for this HW stuffs the consumer index then yes, you'd
get a wakeup as though the new entries were delivered instantly after
the arm. If it stuffs the current producer index then you get behavior
like IBTA describes.

I'd say they are both compatible ways to approach this, the app can't
tell the state of the CQ, so it can't know if the new CQEs were
delievered before or after it ARM'd it.

> Do you have a way to verify whether this test gets stuck? Maybe I am
> missing something?

If the mlx5 implementation is doing like you say then it will not get
stuck on that HW, but it is not to spec

> What do you mean by arming a non-empty CQ?

The arm only has meaning if you know the CQ is empty because then you
can reliably catch the empty->!empty transition, which is the whole
point.

If the CQ is non-empty then, by spec, if no new events arrive it will
never be signaled - the app must re-poll it on its own so why arm it?

> The man pages suggest a scheme where the app should call ibv_get_cq_event()
> followed by an ibv_req_notify_cq(), the CQ polling/emptying comes after these,
> so at the time of arm the CQ isn't empty.

It doesn't show how to re-arm it seems

I'm repeating from memory here, I haven't checked the specs, but that
Sean agrees seems like I'm remembering it right :)

The question you seem to be asking is what happens if you re-arm a
non-empty CQ, do you immediately get an event or not? It should be
easy enough to test on siw, rxe and mlx5 and see

I expect by spec arming a non-empty CQ will not generate an event. The
spec was written to support very simple hardware that would simply
generate an event the next time it writes a CQE then atomically clear
the arm.

Does look like a documentation update is in-order though!

Jason