Re: [PATCH 2/4] RDMA/core: Introduce shared CQ pool API


 



On Tue, May 12, 2020 at 10:00:59AM +0300, Yamin Friedman wrote:
>
> On 5/11/2020 3:08 PM, Yamin Friedman wrote:
> >
> > On 5/11/2020 8:07 AM, Leon Romanovsky wrote:
> > > On Sun, May 10, 2020 at 05:55:55PM +0300, Yamin Friedman wrote:
> > > > Allow a ULP to ask the core to provide a completion queue based on a
> > > > least-used search over per-device CQ pools. The device CQ pools grow
> > > > lazily when more CQs are requested.
> > > >
> > > > This feature reduces the number of interrupts when using many QPs.
> > > > Using shared CQs allows for more efficient completion handling. It also
> > > > reduces the overhead needed for CQ contexts.
> > > >
> > > > Test setup:
> > > > Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers.
> > > > Running NVMeoF 4KB read IOs over ConnectX-5EX across Spectrum switch.
> > > > TX-depth = 32. Number of cores refers to the initiator side.
> > > > Four disks are
> > > > accessed from each core. In the current case we have four CQs
> > > > per core and
> > > > in the shared case we have a single CQ per core. Until 14 cores
> > > > there is no
> > > > significant change in performance and the number of interrupts
> > > > per second
> > > > is less than a million in the current case.
> > > > ==================================================
> > > > |Cores|Current KIOPs  |Shared KIOPs  |improvement|
> > > > |-----|---------------|--------------|-----------|
> > > > |14   |2188           |2620          |19.7%      |
> > > > |-----|---------------|--------------|-----------|
> > > > |20   |2063           |2308          |11.8%      |
> > > > |-----|---------------|--------------|-----------|
> > > > |28   |1933           |2235          |15.6%      |
> > > > ==================================================
> > > > |Cores|Current avg lat|Shared avg lat|improvement|
> > > > |-----|---------------|--------------|-----------|
> > > > |14   |817us          |683us         |16.4%      |
> > > > |-----|---------------|--------------|-----------|
> > > > |20   |1239us         |1108us        |10.6%      |
> > > > |-----|---------------|--------------|-----------|
> > > > |28   |1852us         |1601us        |13.5%      |
> > > > ========================================================
> > > > |Cores|Current interrupts|Shared interrupts|improvement|
> > > > |-----|------------------|-----------------|-----------|
> > > > |14   |2131K/sec         |425K/sec         |80%        |
> > > > |-----|------------------|-----------------|-----------|
> > > > |20   |2267K/sec         |594K/sec         |73.8%      |
> > > > |-----|------------------|-----------------|-----------|
> > > > |28   |2370K/sec         |1057K/sec        |55.3%      |
> > > > ====================================================================
> > > > |Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement|
> > > > |-----|------------------------|-----------------------|-----------|
> > > > |14   |85Kus                   |9Kus                   |88%        |
> > > > |-----|------------------------|-----------------------|-----------|
> > > > |20   |6Kus                    |5.3Kus                 |14.6%      |
> > > > |-----|------------------------|-----------------------|-----------|
> > > > |28   |11.6Kus                 |9.5Kus                 |18%        |
> > > > ====================================================================
> > > >
> > > > Performance improvement with 16 disks (16 CQs per core) is comparable.
> > > >
> > > > Signed-off-by: Yamin Friedman <yaminf@xxxxxxxxxxxx>
> > > > Reviewed-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
> > > > ---
> > > >   drivers/infiniband/core/core_priv.h |   8 ++
> > > >   drivers/infiniband/core/cq.c        | 145
> > > > ++++++++++++++++++++++++++++++++++++
> > > >   drivers/infiniband/core/device.c    |   3 +-
> > > >   include/rdma/ib_verbs.h             |  32 ++++++++
> > > >   4 files changed, 187 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/infiniband/core/core_priv.h
> > > > b/drivers/infiniband/core/core_priv.h
> > > > index cf42acc..7fe9c13 100644
> > > > --- a/drivers/infiniband/core/core_priv.h
> > > > +++ b/drivers/infiniband/core/core_priv.h
> > > > @@ -191,6 +191,14 @@ static inline bool
> > > > rdma_is_upper_dev_rcu(struct net_device *dev,
> > > >       return netdev_has_upper_dev_all_rcu(dev, upper);
> > > >   }
> > > >
> > > > +struct ib_cq *ib_cq_pool_get(struct ib_device *dev, unsigned
> > > > int nr_cqe,
> > > > +                 int cpu_hint, enum ib_poll_context poll_ctx);
> > > > +void ib_cq_pool_put(struct ib_cq *cq, unsigned int nr_cqe);
> > > > +
> > > > +void ib_init_cq_pools(struct ib_device *dev);
> > > > +
> > > > +void ib_purge_cq_pools(struct ib_device *dev);
> > > I don't know how next patches compile to you, but "core_priv.h" is wrong
> > > place to put function declarations. You also put them here and in
> > > ib_verbs.h
> > > below.
> > >
> > > Also, it will be nice if you will use same naming convention like in
> > > mr_pool.h
> > I will remove the use of core_priv and look into refactoring the names.
> ib_init_cq_pools and ib_purge_cq_pools are not exported functions; they
> are for internal core use. Is ib_verbs.h really the place for them?

I don't know, probably better to group them together because you
will need to put ib_cq_pool_put/ib_cq_pool_get in the ib_verbs.h anyway.
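
For reference, the ULP-side usage the patch implies would look roughly
like the sketch below. It is based only on the declarations quoted above;
the ERR_PTR-style error handling and the IB_POLL_SOFTIRQ context are
assumptions, and QP setup is elided:

```c
/* Sketch of a ULP consuming the proposed pool API (hypothetical
 * caller; only ib_cq_pool_get/ib_cq_pool_put come from the patch).
 */
struct ib_cq *cq;

cq = ib_cq_pool_get(ib_dev, nr_cqe, cpu_hint, IB_POLL_SOFTIRQ);
if (IS_ERR(cq))
	return PTR_ERR(cq);

/* ... attach the CQ to one or more QPs and run I/O ... */

/* Return the CQEs to the pool rather than destroying the CQ,
 * so another QP on this device can share it.
 */
ib_cq_pool_put(cq, nr_cqe);
```

The least-used search means several QPs created with the same cpu_hint
end up sharing one CQ, which is where the interrupt reduction in the
numbers above comes from.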

Thanks


