> On Jul 25, 2019, at 9:17 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Wed, Jul 24, 2019 at 10:01:36AM -0400, Chuck Lever wrote:
>> Hi Leon, thanks for taking a look. Responses below.
>>
>>
>>> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
>>>> Send and Receive completion is handled on a single CPU selected at
>>>> the time each Completion Queue is allocated. Typically this is when
>>>> an initiator instantiates an RDMA transport, or when a target
>>>> accepts an RDMA connection.
>>>>
>>>> Some ULPs cannot open a connection per CPU to spread completion
>>>> workload across available CPUs. For these ULPs, allow the RDMA core
>>>> to select a completion vector based on the device's complement of
>>>> available comp_vecs.
>>>>
>>>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
>>>> available, a different CPU will be selected for each Completion
>>>> Queue. For the moment, a simple round-robin mechanism is used.
>>>>
>>>> Suggested-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>
>>> It makes me wonder why we need comp_vector as an argument to
>>> ib_alloc_cq at all. From what I see, callers either internally
>>> implement logic similar to what is proposed here, or they don't
>>> care (and set 0).
>>
>> The goal of this patch is to deduplicate that "similar logic".
>> Callers that already implement this logic can use
>> RDMA_CORE_ANY_COMPVEC and get rid of their own copy.
>
> Can you please send removal patches together with this API proposal?
> It will ensure that ULP authors notice such changes and we won't
> end up with a special function for one ULP.

I prefer that the maintainers of those ULPs make those changes. It
would require testing that I am not in a position to do myself. I can
add a couple of other ULPs, like cifs and 9p, which look like
straightforward modifications; but my understanding was that only one
user of a new API was required for adoption.

>>> Can we enable this comp_vector for everyone and simplify our API?
>>
>> We could create a new CQ allocation API that does not take a
>> comp vector. That might be cleaner than passing in a -1.
>
> +1

I'll send a v2 with this suggestion.
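To make the v2 direction concrete, here is an untested sketch of the
wrapper I have in mind. The name and its placement in cq.c (next to
the helper this patch adds) are tentative until v2 is posted:

/*
 * Untested sketch for v2: allocate a CQ without specifying a
 * completion vector; the core picks one via round-robin. The
 * name ib_alloc_cq_any() is a placeholder.
 */
struct ib_cq *ib_alloc_cq_any(struct ib_device *dev, void *private,
			      int nr_cqe, enum ib_poll_context poll_ctx)
{
	return ib_alloc_cq(dev, private, nr_cqe,
			   ib_get_comp_vector(dev), poll_ctx);
}

Existing callers that open-code vector spreading would then convert
to this call and drop their own logic.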
>> But I think some ULPs still want to use the existing API to
>> allocate one CQ for each of a device's comp vectors.
>
> It can be a "legacy implementation", which is not really needed,
> but I don't really know about it.

Have a look at the iSER initiator. There are legitimate use cases in
the kernel for the current ib_alloc_cq() API. And don't forget the
many users of ib_create_cq() that remain in the kernel.

>>>> ---
>>>> drivers/infiniband/core/cq.c             | 20 +++++++++++++++++++-
>>>> include/rdma/ib_verbs.h                  |  3 +++
>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |  6 ++++--
>>>> net/sunrpc/xprtrdma/verbs.c              |  5 ++---
>>>> 4 files changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> Jason-
>>>>
>>>> If this patch is acceptable to all, then I would expect you to take
>>>> it through the RDMA tree.
>>>>
>>>>
>>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>>>> index 7c599878ccf7..a89d549490c4 100644
>>>> --- a/drivers/infiniband/core/cq.c
>>>> +++ b/drivers/infiniband/core/cq.c
>>>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>>>>  	queue_work(cq->comp_wq, &cq->work);
>>>>  }
>>>>
>>>> +/*
>>>> + * Attempt to spread ULP completion queues over a device's completion
>>>> + * vectors so that all available CPU cores can help service the device's
>>>> + * interrupt workload. This mechanism may be improved at a later point
>>>> + * to dynamically take into account the system's actual workload.
>>>> + */
>>>> +static int ib_get_comp_vector(struct ib_device *dev)
>>>> +{
>>>> +	static atomic_t cv;
>>>> +
>>>> +	if (dev->num_comp_vectors > 1)
>>>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
>>>
>>> It is worth taking num_online_cpus() into account,
>>
>> I don't believe it is.
>>
>> Håkon has convinced me that assigning interrupt vectors to
>> CPUs is in the domain of user space (ie, driven by policy).
>> In addition, one assumes that taking a CPU offline properly
>> will also involve re-assigning interrupt vectors that point
>> to that core.
>>
>> In any event, this code can be modified after it is merged
>> if it is necessary to accommodate such requirements.
>
> It is a simple change, which is worth doing now while we have
> interested parties involved here.

Can you propose some code, or point out an example of how you would
prefer it to work?
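In case a strawman helps: the simplest version I can think of just
caps the round-robin modulus, along these lines (untested, and the
clamping policy is only illustrative):

static int ib_get_comp_vector(struct ib_device *dev)
{
	static atomic_t cv;
	/* Illustrative only: don't hand out more distinct vectors
	 * than there are online CPUs to service them.
	 */
	int nvec = min_t(int, dev->num_comp_vectors, num_online_cpus());

	if (nvec > 1)
		return atomic_inc_return(&cv) % nvec;
	return 0;
}

Is that roughly what you have in mind?

--
Chuck Lever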