On 01-Apr-19 15:19, Leon Romanovsky wrote:
> On Mon, Apr 01, 2019 at 02:59:16PM +0300, Gal Pressman wrote:
>> On 01-Apr-19 11:47, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>>>
>>> Hi,
>>>
>>> This series from Mark provides dynamic statistics infrastructure.
>>> He uses netlink interface to configure and retrieve those counters.
>>>
>>> This infrastructure allows to users monitor various objects by binding
>>> to them counters. As the beginning, we used QP object as target for
>>> those counters, but future patches will include ODP MR information too.
>>
>> Hi Leon and Mark,
>> Thanks for doing this!
>>
>>>
>>> Two binding modes are supported:
>>> - Auto: This allows a user to build automatic set of objects to a counter
>>
>> build = bind?
>
> "build" == "chain". In theory, user will be able to create very complex
> filters, send those chains and kernel will handle it.
> For example, bind counters for UD QP, on specific port and for new
> processes all together.
>
>>
>>> according to common criteria. For example in a per-type scheme, where in
>>> one process all QPs with same QP type are bound automatically to a single
>>> counter.
>>
>> How do we decide which criteria is suitable for auto mode and why is it better
>> than letting the userspace handle it by itself (query all QPs and bind certain
>> types to "manual" counters).
>> Seems like doing it in userspace provides more flexibility than a fixed set of
>> kernel auto types.
>
> "Auto mode" allows to get counters during object creation, for example
> in ODP MR case, it will give us a chance to count pagefaults immediately
> after MRs are created. It is good from system perspective too, he will
> need to configure policy only once during boot and it will simply work.

I understand the motivation, but this mode requires special handling for each
"auto mode type", and I'm not sure I understand what qualifies a criterion for
having its own type.
This example uses the QP type; can I push a patch to auto-bind all QPs that
have a max_recv_sge of 2?

>
>>
>> Is there a reason to have one auto counter per port?
>> Theoretically I can allocate two auto counters and assign a different auto mask
>> to each one.
>
> Sometimes you need to say enough is enough :), we didn't want to add so
> much complexity without solid use case justification.
>
> From implementation perspective, we will be able to do it later, because
> it won't require any change in kernel API. Just need to ensure that such
> masks are returned with dumpit.
>
>>
>>> - Manual: This allows a user to manually bind objects on a counter.
>>>
>>> Those two modes are mutual-exclusive with separation between processes,
>>> objects created by different processes cannot be bound to a same counter.
>>>
>>> For objects which don't support counter binding, we will return
>>> pre-allocated counters.
>>
>> Can you explain? What are those objects and what are pre allocated counters?
>
> For example MR counters, we thought to add very simple set of them and
> make always available.

Can you please add an example output of what you have in mind? Which userspace
command triggers these pre-allocated counters?
>
>>
>>>
>>> $ rdma statistic qp set link mlx5_2/1 auto type on
>>> $ rdma statistic qp set link mlx5_2/1 auto off
>>> $ rdma statistic qp bind link mlx5_2/1 lqpn 178
>>> $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
>>> $ rdma statistic show
>>> $ rdma statistic qp mode
>>>
>>> Thanks
>>>
>>> Mark Zhang (16):
>>>   net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
>>>   RDMA/restrack: Introduce statistic counter
>>>   RDMA/restrack: Add an API to attach a task to a resource
>>>   RDMA/restrack: Make is_visible_in_pid_ns() as an API
>>>   RDMA/counter: Add set/clear per-port auto mode support
>>>   RDMA/counter: Add "auto" configuration mode support
>>>   IB/mlx5: Support set qp counter
>>>   IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()
>>>   IB/mlx5: Support statistic q counter configuration
>>>   RDMA/nldev: Allow counter auto mode configuration through RDMA netlink
>>>   RDMA/netlink: Implement counter dumpit callback
>>>   IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
>>>   RDMA/core: Get sum value of all counters when perform a sysfs stat read
>>>   RDMA/counter: Allow manual mode configuration support
>>>   RDMA/nldev: Allow counter manual mode configuration through RDMA netlink
>>>   RDMA/nldev: Allow get counter mode through RDMA netlink
>>>
>>>  drivers/infiniband/core/Makefile     |   2 +-
>>>  drivers/infiniband/core/counters.c   | 652 +++++++++++++++++++++++++++
>>>  drivers/infiniband/core/device.c     |  14 +
>>>  drivers/infiniband/core/nldev.c      | 427 +++++++++++++++++-
>>>  drivers/infiniband/core/restrack.c   |  49 +-
>>>  drivers/infiniband/core/restrack.h   |   3 +
>>>  drivers/infiniband/core/sysfs.c      |  10 +-
>>>  drivers/infiniband/core/verbs.c      |   9 +
>>>  drivers/infiniband/hw/mlx5/main.c    |  88 +++-
>>>  drivers/infiniband/hw/mlx5/mlx5_ib.h |   6 +
>>>  drivers/infiniband/hw/mlx5/qp.c      |  76 +++-
>>>  include/linux/mlx5/mlx5_ifc.h        |   4 +-
>>>  include/linux/mlx5/qp.h              |   1 +
>>>  include/rdma/ib_verbs.h              |  32 ++
>>>  include/rdma/rdma_counter.h          |  64 +++
>>>  include/rdma/restrack.h              |   4 +
>>>  include/uapi/rdma/rdma_netlink.h     |  52 ++-
>>>  17 files changed, 1462 insertions(+), 31 deletions(-)
>>>  create mode 100644 drivers/infiniband/core/counters.c
>>>  create mode 100644 include/rdma/rdma_counter.h
>>>
>>> --
>>> 2.20.1
>>>
>>