RE: ibv_req_notify_cq clarification

"Hefty, Sean" <sean.hefty@xxxxxxxxx> · Mon, 22 Feb 2021 22:41:06 +0000

> > It eliminates races, if the consumer says 'I read up to X send me an
> > interrupt if X+1 exists' when X+1 already exists if there is a race
> > producer has already written it. So send an interrupt.
> 
> Right, that's what I was getting at in my first question, this isn't the next
> completion from the device's perspective.

>From the spec:

Requests the CQ event handler be called when the next completion
entry of the specified type is added to the specified CQ. The handler
is called at most once per Request Completion Notification call for a
particular CQ. Any CQ entries that existed before the notify is enabled
will not result in a call to the handler.

It may help to think of it in terms of edge versus level triggered.  The IB spec defines the CQ as behaving similar to an edge triggered object.  I would guess that 99% of developers would successfully write an edge-triggered based application that hangs.  A level triggered wait is much easier to get correct.

> So in such case the consumer index in the arm doorbell is used to indicate what
> should be considered a "new" completion? This is new from the app perspective.

The verbs API only guarantees that the CQ will be signaled when a new completion relative to the HW has been written.  If there's a driver that converts the behavior into a level triggered operation, props to them!  That implementation is far more likely to make incorrectly written applications work than ever break a correctly written one.

> But looking at ibv_ud_pingpong for example, I don't understand how that could
> even work.
> The test arms the CQ on creation (consumder index 0), calls ibv_get_cq_event(),
> wakes up and immediately arms the CQ again (before polling, consumer index is
> still 0).

In the worst case, arming the CQ just results in select/poll/epoll returning immediately the next time it is called.  The thread finds the CQ empty, rearms the CQ again, and select/poll/epoll finally block.

> >> Running this with 32 iterations, the client does something like:
> >> - arm cq
> >> - post send x 32
> >> - wait for cq event
> >> - arm cq
> >> - poll cq (once, with batch size of 16)
> >> - no more post send (reached tot_iters)
> >> - wait for cq event (but an event has already been generated?)
> >
> > I don't know much about perf-test, but in verbs arming a non-empty CQ
> > is asking for trouble
> 
> Do you have a way to verify whether this test gets stuck? Maybe I am missing
> something?

Looking at the code, I agree that it looks racy in a way that could cause it to hang.  My guess is most people running perftests are looking for the best performance numbers, so blocking completions are rarely enabled.  And even if they were, it's still a race condition as to whether the test would hang.

> What do you mean by arming a non-empty CQ?
> The man pages suggest a scheme where the app should call ibv_get_cq_event()
> followed by an ibv_req_notify_cq(), the CQ polling/emptying comes after these,
> so at the time of arm the CQ isn't empty.

The app can do:

Wakeup
While read cqe succeeds
	Process cqe
Read cq event
Arm cq
/* cqe's may have been written between the last read and arming */
While read cqe succeeds
	Process cqe
Sleep

Or shorten this to:

Wakeup
Read cq event
Arm cq
While read cqe succeeds
	Process cqe
Sleep

In both cases, all cqe's must be processed after calling arm, and it's possible to read a cq event only to find the cq empty.  One could argue which is more efficient, but we're talking about a sleeping thread in either case.

- Sean