[PATCH rdma-next 4/4] IB/mlx5: Test and increment 'dying' atomically

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Moni Shoua <monis@xxxxxxxxxxxx>

The variable 'dying' in struct ib_umem_odp is used 2 flows, not in an atomic
manner by the following scheme

1. read
2. if zero proceed

This exposes a race between the flows that can end up with 2 increments
of num_leaf_free but only one decrement that prevent the implicit mr
structure from being destroyed.

This call trace is a result of such a race

 INFO: task leakcatch_ibm.o:42313 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 leakcatch_ibm.o D 00001000004382e4 0 42313 42297 0x00040080
 Call Trace:
 [c000001e3b107870] [c000001e3b1078d0] 0xc000001e3b1078d0 (unreliable)
 [c000001e3b107a40] [c0000000000168dc] switch_to+0x25c/0x470
 [c000001e3b107aa0] [c000000000996854] __schedule+0x414/0xad0
 [c000001e3b107b80] [d000000023f278c4] mlx5_ib_free_implicit_mr+0xb4/0x110 [mlx5_ib]
 [c000001e3b107bf0] [d000000023f1c5c8] mlx5_ib_invalidate_mr+0x178/0x1e0 [mlx5_ib]
 [c000001e3b107c30] [d000000023f1c7d4] mlx5_ib_dereg_mr+0xe4/0x110 [mlx5_ib]
 [c000001e3b107c70] [d000000023b414c4] ib_dereg_mr+0x34/0x70 [ib_core]
 [c000001e3b107ca0] [d000000027de1fbc] ib_uverbs_cleanup_ucontext+0x4bc/0x6c0 [ib_uverbs]
 [c000001e3b107d20] [d000000027de2204] ib_uverbs_close+0x44/0x140 [ib_uverbs]
 [c000001e3b107d50] [c0000000003219e4] __fput+0xd4/0x2d0
 [c000001e3b107db0] [c00000000010f6d4] task_work_run+0xe4/0x160
 [c000001e3b107e00] [c000000000018c6c] do_notify_resume+0xcc/0x100
 [c000001e3b107e30] [c00000000000a7b0] ret_from_except_lite+0x5c/0x60

Fix by making steps 1 and 2 above done in an atomic way.

Fixes: 882214e2b128 ("IB/core: Implement support for MMU notifiers regarding on demand paging regions")
Signed-off-by: Moni Shoua <monis@xxxxxxxxxxxx>
Reviewed-by: Artemy Kovalyov <artemyko@xxxxxxxxxxxx>
Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
---
 drivers/infiniband/core/umem_odp.c |  2 +-
 drivers/infiniband/hw/mlx5/odp.c   | 10 ++++------
 include/rdma/ib_umem_odp.h         |  2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 0d154c5d12cc..679146352a3c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -113,7 +113,7 @@ static int ib_umem_notifier_release_trampoline(struct ib_umem_odp *umem_odp,
 	 * prevent any further fault handling on this MR.
 	 */
 	ib_umem_notifier_start_account(umem_odp);
-	umem_odp->dying = 1;
+	atomic_set(&umem_odp->dying, 1);
 	/* Make sure that the fact the umem is dying is out before we release
 	 * all pending page faults. */
 	smp_wmb();
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index fa22b28e8e43..c1d258b86145 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -98,7 +98,7 @@ static int check_parent(struct ib_umem_odp *odp,
 {
 	struct mlx5_ib_mr *mr = odp->private;

-	return mr && mr->parent == parent && !odp->dying;
+	return mr && mr->parent == parent && !atomic_read(&odp->dying);
 }

 struct ib_ucontext_per_mm *mr_to_per_mm(struct mlx5_ib_mr *mr)
@@ -289,8 +289,7 @@ void mlx5_ib_invalidate_range(struct ib_umem_odp *umem_odp, unsigned long start,
 	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);

 	if (unlikely(!umem->npages && mr->parent &&
-		     !umem_odp->dying)) {
-		WRITE_ONCE(umem_odp->dying, 1);
+	    atomic_add_unless(&umem_odp->dying, 1, 1))) {
 		atomic_inc(&mr->parent->num_leaf_free);
 		schedule_work(&umem_odp->work);
 	}
@@ -551,10 +550,9 @@ static int mr_leaf_free(struct ib_umem_odp *umem_odp, u64 start, u64 end,
 	ib_umem_odp_unmap_dma_pages(umem_odp, ib_umem_start(umem),
 				    ib_umem_end(umem));

-	if (umem_odp->dying)
+	if (!atomic_add_unless(&umem_odp->dying, 1, 1))
 		return 0;

-	WRITE_ONCE(umem_odp->dying, 1);
 	atomic_inc(&imr->num_leaf_free);
 	schedule_work(&umem_odp->work);

@@ -682,7 +680,7 @@ static int pagefault_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,

 out:
 	if (ret == -EAGAIN) {
-		if (implicit || !odp->dying) {
+		if (implicit || !atomic_read(&odp->dying)) {
 			unsigned long timeout =
 				msecs_to_jiffies(MMU_NOTIFIER_TIMEOUT);

diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index dadc96dea39c..eb129b56322c 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -74,7 +74,7 @@ struct ib_umem_odp {
 	struct umem_odp_node	interval_tree;

 	struct completion	notifier_completion;
-	int			dying;
+	atomic_t		dying;
 	struct work_struct	work;
 };

--
2.19.1




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux