Re: [PATCH 1/5] IB/core: add a simple SRQ set per PD

On 3/17/2020 8:43 PM, Jason Gunthorpe wrote:
> On Tue, Mar 17, 2020 at 08:24:30PM +0200, Max Gurtovoy wrote:
>> On 3/17/2020 8:10 PM, Jason Gunthorpe wrote:
>>> On Tue, Mar 17, 2020 at 06:37:57PM +0200, Max Gurtovoy wrote:
>>>
>>>> +#include <rdma/ib_verbs.h>
>>>> +
>>>> +struct ib_srq *rdma_srq_get(struct ib_pd *pd);
>>>> +void rdma_srq_put(struct ib_pd *pd, struct ib_srq *srq);
>>> At the end, it is not get/put semantics but more add/remove.
>>>
>>> srq = rdma_srq_add ?
>>> rdma_srq_remove(pd, srq) ?
>>
>> Doesn't seem right to me.
>>
>> Let's make it simple. For asking an SRQ from the PD set let's use rdma_srq_get,
>> and for returning it we'll use rdma_srq_put.
>
> Is there reference counting here? get/put should be restricted to
> refcounting APIs, IMHO.
I've added a counter (pd->srqs_used) that Leon asked to remove.

There is no call to kref get/put here.

> I didn't look closely, any kind of refcount scheme is reasonable, but
> if add is supposed to create a new srq then that isn't 'get'..

No, we don't create a new SRQ during the "get". We create the set up front using "rdma_srq_set_init".

"get" will simply pull an SRQ from the set and "put" will push it back.


>> Do you prefer that I'll change it to be an array in the PD: "struct ib_srq **srqs;" ?
>
> Not particularly..
>
> It actually feels a bit weird, should there be some numa-ness involved
> here so that the SRQ with memory on the node that is going to be
> polling it is returned?

Maybe this will be the next improvement. But for now the receive buffers are allocated by the ULP.

The idea is to spread the SRQs as much as possible, as we do for QP/CQ, to reach almost the same performance.

With a single SRQ we can't reach good performance, since many resources and cores are racing for one resource.

With a regular RQ we allocate many buffers that are idle most of the time.

If we spread the SRQs across all the cores/vectors we have, we'll get great performance while saving resources, which can be critical for MQ ULPs such as NVMf/SRP targets that may serve hundreds of initiators with hundreds of queues each.
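
For illustration, a hypothetical ULP-side sketch of that spreading: one SRQ per completion vector, taken from the PD set. struct ulp_queue and all of its fields are made up here; only rdma_srq_get() is from this series.

#include <linux/err.h>
#include <rdma/ib_verbs.h>

struct ulp_queue {                      /* made-up per-queue ULP context */
        struct ib_device *ib_dev;
        struct ib_pd *pd;
        struct ib_cq *cq;
        struct ib_srq *srq;
        int queue_size;
};

static int ulp_init_queue(struct ulp_queue *q, int comp_vector)
{
        /* one CQ per completion vector, as ULPs already do for QP/CQ */
        q->cq = ib_alloc_cq(q->ib_dev, q, q->queue_size, comp_vector,
                            IB_POLL_SOFTIRQ);
        if (IS_ERR(q->cq))
                return PTR_ERR(q->cq);

        /*
         * Instead of a private RQ full of mostly idle buffers, take one
         * SRQ out of the PD-wide set, so each core/vector polls its own
         * SRQ and they don't all race for a single one.
         */
        q->srq = rdma_srq_get(q->pd);
        if (!q->srq) {
                ib_free_cq(q->cq);
                return -EAGAIN;
        }

        return 0;
}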



>> And update the ib_alloc_pd API to get pd_attrs and allocate the array during PD
>> allocation ?
>
> The API is a bit more composable if things can be done as follow-on
> function calls, the way other things are done.. I don't like the giant
> multiplexor structs in general
>
> Jason
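
For reference, the two API shapes being compared look roughly like this. Both the pd_attrs idea and the rdma_srq_set_init() signature below are assumptions for illustration, not the posted code.

#include <linux/err.h>
#include <rdma/ib_verbs.h>

/*
 * Multiplexor-struct style: everything passed up front at PD allocation,
 * e.g. ib_alloc_pd(dev, flags, &pd_attrs) with the SRQ count carried in
 * the attrs struct.
 *
 * Composable style: allocate the PD as today, then opt in to the SRQ set
 * with a separate follow-on call.
 */
static struct ib_pd *ulp_alloc_pd_with_srq_set(struct ib_device *dev,
                                               int nr_srqs,
                                               struct ib_srq_init_attr *sattr)
{
        struct ib_pd *pd = ib_alloc_pd(dev, 0);
        int ret;

        if (IS_ERR(pd))
                return pd;

        ret = rdma_srq_set_init(pd, nr_srqs, sattr);
        if (ret) {
                ib_dealloc_pd(pd);
                return ERR_PTR(ret);
        }

        return pd;
}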


