> On Jul 25, 2019, at 9:17 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Wed, Jul 24, 2019 at 10:01:36AM -0400, Chuck Lever wrote:
>> Hi Leon, thanks for taking a look. Responses below.
>>
>>
>>> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
>>>> Send and Receive completion is handled on a single CPU selected at
>>>> the time each Completion Queue is allocated. Typically this is when
>>>> an initiator instantiates an RDMA transport, or when a target
>>>> accepts an RDMA connection.
>>>>
>>>> Some ULPs cannot open a connection per CPU to spread completion
>>>> workload across available CPUs. For these ULPs, allow the RDMA core
>>>> to select a completion vector based on the device's complement of
>>>> available comp_vecs.
>>>>
>>>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
>>>> available, a different CPU will be selected for each Completion
>>>> Queue. For the moment, a simple round-robin mechanism is used.
>>>>
>>>> Suggested-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>
>>> It makes me wonder why we need comp_vector as an argument to
>>> ib_alloc_cq at all. From what I see, callers either internally
>>> implement logic similar to what is proposed here, or they don't
>>> care (and set 0).
>>
>> The goal of this patch is to deduplicate that "similar logic".
>> Callers that already implement this logic can use
>> RDMA_CORE_ANY_COMPVEC and get rid of their own copy.
>
> Can you please send removal patches together with this API proposal?
> It will ensure that ULP authors notice such changes and we won't
> end up with a special function for one ULP.

I prefer that the maintainers of those ULPs make those changes. It
would require testing that I am not in a position to do myself. I can
add a couple of other ULPs, like cifs and 9p, which look like
straightforward modifications; but my understanding was that only one
user of a new API was required for adoption.

>>> Can we enable this comp_vector for everyone and simplify our API?
>>
>> We could create a new CQ allocation API that does not take a
>> comp vector. That might be cleaner than passing in a -1.
>
> +1

I'll send a v2 with this suggestion.
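To make the v2 direction concrete, here is an untested sketch of the
wrapper I have in mind. The name and its placement in cq.c (next to
the helper this patch adds) are tentative until v2 is posted:

/*
 * Untested sketch for v2: allocate a CQ without specifying a
 * completion vector; the core picks one via round-robin. The
 * name ib_alloc_cq_any() is a placeholder.
 */
struct ib_cq *ib_alloc_cq_any(struct ib_device *dev, void *private,
			      int nr_cqe, enum ib_poll_context poll_ctx)
{
	return ib_alloc_cq(dev, private, nr_cqe,
			   ib_get_comp_vector(dev), poll_ctx);
}

Existing callers that open-code vector spreading would then convert
to this call and drop their own logic.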
>> But I think some ULPs still want to use the existing API to
>> allocate one CQ for each of a device's comp vectors.
>
> It can be a "legacy implementation", which is not really needed,
> but I don't really know about it.

Have a look at the iSER initiator. There are legitimate use cases in
the kernel for the current ib_alloc_cq() API. And don't forget the
many users of ib_create_cq() that remain in the kernel.

>>>> ---
>>>> drivers/infiniband/core/cq.c             | 20 +++++++++++++++++++-
>>>> include/rdma/ib_verbs.h                  |  3 +++
>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |  6 ++++--
>>>> net/sunrpc/xprtrdma/verbs.c              |  5 ++---
>>>> 4 files changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> Jason-
>>>>
>>>> If this patch is acceptable to all, then I would expect you to take
>>>> it through the RDMA tree.
>>>>
>>>>
>>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>>>> index 7c599878ccf7..a89d549490c4 100644
>>>> --- a/drivers/infiniband/core/cq.c
>>>> +++ b/drivers/infiniband/core/cq.c
>>>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>>>>  	queue_work(cq->comp_wq, &cq->work);
>>>>  }
>>>>
>>>> +/*
>>>> + * Attempt to spread ULP completion queues over a device's completion
>>>> + * vectors so that all available CPU cores can help service the device's
>>>> + * interrupt workload. This mechanism may be improved at a later point
>>>> + * to dynamically take into account the system's actual workload.
>>>> + */
>>>> +static int ib_get_comp_vector(struct ib_device *dev)
>>>> +{
>>>> +	static atomic_t cv;
>>>> +
>>>> +	if (dev->num_comp_vectors > 1)
>>>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
>>>
>>> It is worth taking num_online_cpus() into account,
>>
>> I don't believe it is.
>>
>> Håkon has convinced me that assigning interrupt vectors to
>> CPUs is in the domain of user space (ie, driven by policy).
>> In addition, one assumes that taking a CPU offline properly
>> will also involve re-assigning interrupt vectors that point
>> to that core.
>>
>> In any event, this code can be modified after it is merged
>> if it is necessary to accommodate such requirements.
>
> It is a simple change, which is worth doing now while we have
> interested parties involved here.

Can you propose some code, or point out an example of how you would
prefer it to work?
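In case a strawman helps: the simplest version I can think of just
caps the round-robin modulus, along these lines (untested, and the
clamping policy is only illustrative):

static int ib_get_comp_vector(struct ib_device *dev)
{
	static atomic_t cv;
	/* Illustrative only: don't hand out more distinct vectors
	 * than there are online CPUs to service them.
	 */
	int nvec = min_t(int, dev->num_comp_vectors, num_online_cpus());

	if (nvec > 1)
		return atomic_inc_return(&cv) % nvec;
	return 0;
}

Is that roughly what you have in mind?

--
Chuck Lever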