On Mon, Jan 20, 2025 at 02:49:22PM -0400, Jason Gunthorpe wrote: > On Sun, Jan 19, 2025 at 10:21:41AM +0200, Leon Romanovsky wrote: > > > Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") > > Cc: stable > > Fixes a user triggerable oops > > - if (!refcount_inc_not_zero(&imr->mmkey.usecount)) > > + xa_lock(&imr->implicit_children); > > + if (__xa_cmpxchg(&imr->implicit_children, idx, mr, NULL, GFP_KERNEL) != > > + mr) { > > + xa_unlock(&imr->implicit_children); > > return; > > + } > > > > - xa_erase(&imr->implicit_children, idx); > > if (MLX5_CAP_ODP(mr_to_mdev(mr)->mdev, mem_page_fault)) > > - xa_erase(&mr_to_mdev(mr)->odp_mkeys, > > - mlx5_base_mkey(mr->mmkey.key)); > > + __xa_erase(&mr_to_mdev(mr)->odp_mkeys, > > + mlx5_base_mkey(mr->mmkey.key)); > > + xa_unlock(&imr->implicit_children); > > + > > + if (!refcount_inc_not_zero(&imr->mmkey.usecount)) > > + return; > > It seems the refcount must be done first: > > /* > * If userspace is racing freeing the parent implicit ODP MR > * then we can loose the race with parent destruction. In this > * case mlx5_ib_free_odp_mr() will free everything in the > * implicit_children xarray so NOP is fine. This child MR > * cannot be destroyed here because we are under its umem_mutex. > */ > if (!refcount_inc_not_zero(&imr->mmkey.usecount)) > return; > > What we must not do is remove something from the xarray and then fail > to free it. Yes, like it was before. > > Jason