On Sun, Oct 06, 2019 at 06:51:36PM +0300, Leon Romanovsky wrote:
> From: Erez Alfasi <ereza@xxxxxxxxxxxx>
>
> Introduce ODP diagnostic counters and count the following
> per MR within IB/mlx5 driver:
> 1) Page faults:
> 	Total number of faulted pages.
> 2) Page invalidations:
> 	Total number of pages invalidated by the OS during all
> 	invalidation events. The translations can be no longer
> 	valid due to either non-present pages or mapping changes.
> 3) Prefetched pages:
> 	When prefetching a page, page fault is generated
> 	in order to bring the page to the main memory.
> 	The prefetched pages counter will be updated
> 	during a page fault flow only if it was derived
> 	from prefetching operation.
>
> Signed-off-by: Erez Alfasi <ereza@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 ++++
>  drivers/infiniband/hw/mlx5/odp.c     | 18 ++++++++++++++++++
>  include/rdma/ib_verbs.h              |  6 ++++++
>  3 files changed, 28 insertions(+)
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index bf30d53d94dc..5aae05ebf64b 100644
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -585,6 +585,9 @@ struct mlx5_ib_dm {
>  	 IB_ACCESS_REMOTE_READ  |\
>  	 IB_ZERO_BASED)
>
> +#define mlx5_update_odp_stats(mr, counter_name, value)		\
> +	atomic64_add(value, &((mr)->odp_stats.counter_name))
> +
>  struct mlx5_ib_mr {
>  	struct ib_mr		ibmr;
>  	void			*descs;
> @@ -622,6 +625,7 @@ struct mlx5_ib_mr {
>  	wait_queue_head_t	q_leaf_free;
>  	struct mlx5_async_work	cb_work;
>  	atomic_t		num_pending_prefetch;
> +	struct ib_odp_counters	odp_stats;
>  };
>
>  static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
> diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
> index 95cf0249b015..966783bfb557 100644
> +++ b/drivers/infiniband/hw/mlx5/odp.c
> @@ -261,6 +261,10 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
>  				blk_start_idx = idx;
>  				in_block = 1;
>  			}
> +
> +			/* Count page invalidations */
> +			mlx5_update_odp_stats(mr, invalidations,
> +					      (idx - blk_start_idx + 1));

I feel like these should be batched and the atomic done once at the end
of the routine..

>  		} else {
>  			u64 umr_offset = idx & umr_block_mask;
>
> @@ -287,6 +291,7 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
>
>  	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
>
> +
>  	if (unlikely(!umem_odp->npages && mr->parent &&
>  		     !umem_odp->dying)) {
>  		WRITE_ONCE(umem_odp->dying, 1);
> @@ -801,6 +806,19 @@ static int pagefault_single_data_segment(struct mlx5_ib_dev *dev,
>  	if (ret < 0)
>  		goto srcu_unlock;
>
> +	/*
> +	 * When prefetching a page, page fault is generated
> +	 * in order to bring the page to the main memory.
> +	 * In the current flow, page faults are being counted.
> +	 * Prefetched pages counter will be updated as well
> +	 * only if the current page fault flow was derived
> +	 * from prefetching flow.
> +	 */
> +	mlx5_update_odp_stats(mr, faults, ret);
> +
> +	if (prefetch)
> +		mlx5_update_odp_stats(mr, prefetched, ret);

Hm, I'm about to post a series that eliminates 'prefetch' here..

This is also not quite right for prefetch as we are doing a form of
prefetching in the mlx5_ib_mr_rdma_pfault_handler() too, although it is
less clear how to count those.

Maybe this should be split to SQ/RQ faults?

Jason
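
For illustration, a rough sketch of the batching suggested above: keep the
count in a local variable and publish it with a single atomic once the whole
range has been walked. The function name, loop bounds and the page_present
array below are hypothetical stand-ins, not the actual
mlx5_ib_invalidate_range() internals.

/*
 * Hypothetical sketch only: batch the per-page invalidation counts and do
 * the atomic update once at the end of the walk, as suggested above. The
 * loop and its arguments are simplified stand-ins for the real code.
 */
static void count_invalidations_batched(struct mlx5_ib_mr *mr,
					unsigned long start_idx,
					unsigned long end_idx,
					const bool *page_present)
{
	u64 invalidated = 0;	/* local accumulator, no atomics in the loop */
	unsigned long idx;

	for (idx = start_idx; idx <= end_idx; idx++) {
		if (!page_present[idx])
			continue;
		/* ... existing per-page invalidation handling ... */
		invalidated++;
	}

	/* One atomic64_add for the whole range instead of one per block */
	if (invalidated)
		mlx5_update_odp_stats(mr, invalidations, invalidated);
}

This keeps the atomic off the per-page path while still charging the MR's
invalidations counter once per call.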