On Tue, Jan 22, 2019 at 09:16:11AM +0200, Leon Romanovsky wrote: > From: Moni Shoua <monis@xxxxxxxxxxxx> > > The variable 'dying' in struct ib_umem_odp is used 2 flows, not in an atomic > manner by the following scheme > > 1. read > 2. if zero proceed > > This exposes a race between the flows that can end up with 2 increments > of num_leaf_free but only one decrement that prevent the implicit mr > structure from being destroyed. > > This call trace is a result of such a race > > INFO: task leakcatch_ibm.o:42313 blocked for more than 120 seconds. > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > leakcatch_ibm.o D 00001000004382e4 0 42313 42297 0x00040080 > Call Trace: > [c000001e3b107870] [c000001e3b1078d0] 0xc000001e3b1078d0 (unreliable) > [c000001e3b107a40] [c0000000000168dc] switch_to+0x25c/0x470 > [c000001e3b107aa0] [c000000000996854] __schedule+0x414/0xad0 > [c000001e3b107b80] [d000000023f278c4] mlx5_ib_free_implicit_mr+0xb4/0x110 [mlx5_ib] > [c000001e3b107bf0] [d000000023f1c5c8] mlx5_ib_invalidate_mr+0x178/0x1e0 [mlx5_ib] > [c000001e3b107c30] [d000000023f1c7d4] mlx5_ib_dereg_mr+0xe4/0x110 [mlx5_ib] > [c000001e3b107c70] [d000000023b414c4] ib_dereg_mr+0x34/0x70 [ib_core] > [c000001e3b107ca0] [d000000027de1fbc] ib_uverbs_cleanup_ucontext+0x4bc/0x6c0 [ib_uverbs] > [c000001e3b107d20] [d000000027de2204] ib_uverbs_close+0x44/0x140 [ib_uverbs] > [c000001e3b107d50] [c0000000003219e4] __fput+0xd4/0x2d0 > [c000001e3b107db0] [c00000000010f6d4] task_work_run+0xe4/0x160 > [c000001e3b107e00] [c000000000018c6c] do_notify_resume+0xcc/0x100 > [c000001e3b107e30] [c00000000000a7b0] ret_from_except_lite+0x5c/0x60 > > Fix by making steps 1 and 2 above done in an atomic way. > > Fixes: 882214e2b128 ("IB/core: Implement support for MMU notifiers regarding on demand paging regions") > Signed-off-by: Moni Shoua <monis@xxxxxxxxxxxx> > Reviewed-by: Artemy Kovalyov <artemyko@xxxxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > drivers/infiniband/core/umem_odp.c | 2 +- > drivers/infiniband/hw/mlx5/odp.c | 10 ++++------ > include/rdma/ib_umem_odp.h | 2 +- > 3 files changed, 6 insertions(+), 8 deletions(-) > > diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c > index 0d154c5d12cc..679146352a3c 100644 > +++ b/drivers/infiniband/core/umem_odp.c > @@ -113,7 +113,7 @@ static int ib_umem_notifier_release_trampoline(struct ib_umem_odp *umem_odp, > * prevent any further fault handling on this MR. > */ > ib_umem_notifier_start_account(umem_odp); > - umem_odp->dying = 1; > + atomic_set(&umem_odp->dying, 1); > /* Make sure that the fact the umem is dying is out before we release > * all pending page faults. */ > smp_wmb(); Is the smp_wmb still needed if there is an atomic, or it should it be one of the special atomic mbs? Or you don't need need a barrier at all if the test/set bit API is used instead. > diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c > index fa22b28e8e43..c1d258b86145 100644 > +++ b/drivers/infiniband/hw/mlx5/odp.c > @@ -98,7 +98,7 @@ static int check_parent(struct ib_umem_odp *odp, > { > struct mlx5_ib_mr *mr = odp->private; > > - return mr && mr->parent == parent && !odp->dying; > + return mr && mr->parent == parent && !atomic_read(&odp->dying); > } Almost every time I see code trying to create a 'is_closed/dying' construct without locks it turns out to be racy and wrong. Why is it OK for dying to become 1 the instant after check_parent is called? If it is OK then why are we testing dying here? > @@ -551,10 +550,9 @@ static int mr_leaf_free(struct ib_umem_odp *umem_odp, u64 start, u64 end, > ib_umem_odp_unmap_dma_pages(umem_odp, ib_umem_start(umem), > ib_umem_end(umem)); > > - if (umem_odp->dying) > + if (!atomic_add_unless(&umem_odp->dying, 1, 1)) > return 0; Why use an atomic? Isn't this algorithm just test_and_set_bit() / set_bit() and test_bit() ? They are atomics too. > @@ -682,7 +680,7 @@ static int pagefault_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr, > > out: > if (ret == -EAGAIN) { > - if (implicit || !odp->dying) { > + if (implicit || !atomic_read(&odp->dying)) { > unsigned long timeout = > msecs_to_jiffies(MMU_NOTIFIER_TIMEOUT); Same question as check_parent Jason