On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
> Send and Receive completion is handled on a single CPU selected at
> the time each Completion Queue is allocated. Typically this is when
> an initiator instantiates an RDMA transport, or when a target
> accepts an RDMA connection.
>
> Some ULPs cannot open a connection per CPU to spread completion
> workload across available CPUs. For these ULPs, allow the RDMA core
> to select a completion vector based on the device's complement of
> available comp_vecs.
>
> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
> available, a different CPU will be selected for each Completion
> Queue. For the moment, a simple round-robin mechanism is used.
>
> Suggested-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>

It makes me wonder why we need comp_vector as an argument to
ib_alloc_cq at all. From what I see, callers either internally
implement logic similar to what is proposed here, or they don't
care (and set 0).

Can we enable this comp_vector selection for everyone and simplify
our API?

> ---
>  drivers/infiniband/core/cq.c             | 20 +++++++++++++++++++-
>  include/rdma/ib_verbs.h                  |  3 +++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |  6 ++++--
>  net/sunrpc/xprtrdma/verbs.c              |  5 ++---
>  4 files changed, 28 insertions(+), 6 deletions(-)
>
> Jason-
>
> If this patch is acceptable to all, then I would expect you to take
> it through the RDMA tree.
>
>
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index 7c599878ccf7..a89d549490c4 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>  	queue_work(cq->comp_wq, &cq->work);
>  }
>
> +/*
> + * Attempt to spread ULP completion queues over a device's completion
> + * vectors so that all available CPU cores can help service the device's
> + * interrupt workload. This mechanism may be improved at a later point
> + * to dynamically take into account the system's actual workload.
> + */
> +static int ib_get_comp_vector(struct ib_device *dev)
> +{
> +	static atomic_t cv;
> +
> +	if (dev->num_comp_vectors > 1)
> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;

It is worth taking num_online_cpus() into account here as well.

Thanks