Hi Leon, thanks for taking a look. Responses below.

> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
>> Send and Receive completion is handled on a single CPU selected at
>> the time each Completion Queue is allocated. Typically this is when
>> an initiator instantiates an RDMA transport, or when a target
>> accepts an RDMA connection.
>>
>> Some ULPs cannot open a connection per CPU to spread completion
>> workload across available CPUs. For these ULPs, allow the RDMA core
>> to select a completion vector based on the device's complement of
>> available comp_vecs.
>>
>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
>> available, a different CPU will be selected for each Completion
>> Queue. For the moment, a simple round-robin mechanism is used.
>>
>> Suggested-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> It make me wonder why do we need comp_vector as an argument to ib_alloc_cq?
> From what I see, or callers are internally implementing similar logic
> to proposed here, or they don't care (set 0).

The goal of this patch is to deduplicate that "similar logic". Callers
that already implement this logic can use RDMA_CORE_ANY_COMPVEC and
get rid of their own copy.

> Can we enable this comp_vector for everyone and simplify our API?

We could create a new CQ allocation API that does not take a comp
vector. That might be cleaner than passing in a -1. But I think some
ULPs still want to use the existing API to allocate one CQ for each of
a device's comp vectors.
>> ---
>>  drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
>>  include/rdma/ib_verbs.h                  |    3 +++
>>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
>>  net/sunrpc/xprtrdma/verbs.c              |    5 ++---
>>  4 files changed, 28 insertions(+), 6 deletions(-)
>>
>> Jason-
>>
>> If this patch is acceptable to all, then I would expect you to take
>> it through the RDMA tree.
>>
>>
>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>> index 7c599878ccf7..a89d549490c4 100644
>> --- a/drivers/infiniband/core/cq.c
>> +++ b/drivers/infiniband/core/cq.c
>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>>  	queue_work(cq->comp_wq, &cq->work);
>>  }
>>
>> +/*
>> + * Attempt to spread ULP completion queues over a device's completion
>> + * vectors so that all available CPU cores can help service the device's
>> + * interrupt workload. This mechanism may be improved at a later point
>> + * to dynamically take into account the system's actual workload.
>> + */
>> +static int ib_get_comp_vector(struct ib_device *dev)
>> +{
>> +	static atomic_t cv;
>> +
>> +	if (dev->num_comp_vectors > 1)
>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
>
> It is worth to take into account num_online_cpus(),

I don't believe it is. Håkon has convinced me that assigning interrupt
vectors to CPUs is in the domain of user space (i.e., driven by policy).
In addition, one assumes that taking a CPU offline properly will also
involve re-assigning the interrupt vectors that point to that core.

In any event, this code can be modified after it is merged if it becomes
necessary to accommodate such requirements.

--
Chuck Lever