On Thu, Feb 27, 2020 at 01:39:18PM +0200, Leon Romanovsky wrote: > From: Artemy Kovalyov <artemyko@xxxxxxxxxxxx> > > Following race may occur because of the call_srcu and the placement of > the synchronize_srcu vs the xa_erase. > > CPU0 CPU1 > > mlx5_ib_free_implicit_mr: destroy_unused_implicit_child_mr: > xa_erase(odp_mkeys) > synchronize_srcu() > xa_lock(implicit_children) > if (still in xarray) > atomic_inc() > call_srcu() > xa_unlock(implicit_children) > xa_erase(implicit_children): > xa_lock(implicit_children) > __xa_erase() > xa_unlock(implicit_children) > > flush_workqueue() > [..] > free_implicit_child_mr_rcu: > (via call_srcu) > queue_work() > > WARN_ON(atomic_read()) > [..] > free_implicit_child_mr_work: > (via wq) > free_implicit_child_mr() > mlx5_mr_cache_invalidate() > mlx5_ib_update_xlt() <-- UMR QP fail > atomic_dec() > > The wait_event() solves the race because it blocks until > free_implicit_child_mr_work() completes. > > Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") > Signed-off-by: Artemy Kovalyov <artemyko@xxxxxxxxxxxx> > Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > --- > drivers/infiniband/hw/mlx5/mlx5_ib.h | 1 + > drivers/infiniband/hw/mlx5/odp.c | 17 +++++++---------- > 2 files changed, 8 insertions(+), 10 deletions(-) Applied to for-rc, thanks Jason