On Mon, Sep 09, 2019 at 08:51:50AM +0000, Jason Gunthorpe wrote: > On Fri, Aug 30, 2019 at 11:16:11AM +0300, Leon Romanovsky wrote: > > From: Erez Alfasi <ereza@xxxxxxxxxxxx> > > > > Add RDMA nldev netlink interface for dumping MR > > statistics information. > > > > Output example: > > ereza@dev~$: ./ibv_rc_pingpong -o -P -s 500000000 > > local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: > > > > ereza@dev~$: rdma stat show mr > > dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 > > prefetched_pages 122071 > > > > Signed-off-by: Erez Alfasi <ereza@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > drivers/infiniband/core/device.c | 1 + > > drivers/infiniband/core/nldev.c | 54 +++++++++++++++++++++++++++++-- > > drivers/infiniband/hw/mlx5/main.c | 16 +++++++++ > > include/rdma/ib_verbs.h | 9 ++++++ > > 4 files changed, 78 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c > > index 99c4a55545cf..34a9e37c5c61 100644 > > +++ b/drivers/infiniband/core/device.c > > @@ -2610,6 +2610,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops) > > SET_DEVICE_OP(dev_ops, get_dma_mr); > > SET_DEVICE_OP(dev_ops, get_hw_stats); > > SET_DEVICE_OP(dev_ops, get_link_layer); > > + SET_DEVICE_OP(dev_ops, fill_odp_stats); > > SET_DEVICE_OP(dev_ops, get_netdev); > > SET_DEVICE_OP(dev_ops, get_port_immutable); > > SET_DEVICE_OP(dev_ops, get_vector_affinity); > > I'm now curious what motivated placing the line here, apparently > randomly in a sorted list? desire to be different and express yourself? > > > +static int fill_stat_mr_entry(struct sk_buff *msg, bool has_cap_net_admin, > > + struct rdma_restrack_entry *res, uint32_t port) > > +{ > > + struct ib_mr *mr = container_of(res, struct ib_mr, res); > > + struct ib_device *dev = mr->pd->device; > > + struct ib_odp_counters odp_stats; > > + struct nlattr *table_attr; > > + > > + if (nla_put_u32(msg, RDMA_NLDEV_ATTR_RES_MRN, res->id)) > > + goto err; > > + > > + if (!dev->ops.fill_odp_stats) > > + return 0; > > + > > + if (!dev->ops.fill_odp_stats(mr, &odp_stats)) > > + return 0; > > As Parav says this seems to be wrong for !ODP MRs. Can we even detect > !ODP at this point? ODP is decided on UMEM level which you said should be driver property and it means that driver should distinguish between odp/not-odp. > > > + > > + table_attr = nla_nest_start(msg, > > + RDMA_NLDEV_ATTR_STAT_HWCOUNTERS); > > + > > + if (!table_attr) > > + return -EMSGSIZE; > > + > > + if (fill_stat_hwcounter_entry(msg, "page_faults", > > + (u64)atomic64_read(&odp_stats.faults))) > > Why the cast? atomic64_read returns s64 and not u64, I didn't see need to investigate if s64 == u64 in all architectures. > > > > +static bool mlx5_ib_fill_odp_stats(struct ib_mr *ibmr, > > + struct ib_odp_counters *cnt) > > +{ > > + struct mlx5_ib_mr *mr = to_mmr(ibmr); > > + > > + if (!is_odp_mr(mr)) > > + return false; > > + > > + memcpy(cnt, &to_ib_umem_odp(mr->umem)->odp_stats, > > + sizeof(struct ib_odp_counters)); > > Can't memcpy atomic64, have to open code a copy using atomic64_read. Right > > > @@ -6316,6 +6331,7 @@ static const struct ib_device_ops mlx5_ib_dev_ops = { > > .get_dev_fw_str = get_dev_fw_str, > > .get_dma_mr = mlx5_ib_get_dma_mr, > > .get_link_layer = mlx5_ib_port_link_layer, > > + .fill_odp_stats = mlx5_ib_fill_odp_stats, > > Randomly again.. > > Jason