Re: [bug report] blktests srp/002 hang

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




在 2023/9/20 2:11, Bob Pearson 写道:
On 9/19/23 03:07, Zhu Yanjun wrote:
在 2023/9/19 12:14, Shinichiro Kawasaki 写道:
On Sep 16, 2023 / 13:59, Zhu Yanjun wrote:
[...]
On Debian, with the latest multipathd or revert the commit 9b4b7c1f9f54
("RDMA/rxe: Add workqueue support for rxe tasks"), this problem will
disappear.
Zhu, thank you for the actions.

On Fedora 38, if the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support
for rxe tasks") is reverted, will this problem still appear?
I do not have such test environment. The commit is in the attachment,
can anyone have a test? Please let us know the test result. Thanks.
I tried the latest kernel tag v6.6-rc2 with my Fedora 38 test systems. With the
v6.6-rc2 kernel, I still see the hang. I repeated the blktests test case srp/002
30 time or so, then the hang was recreated. Then I reverted the commit
9b4b7c1f9f54 from v6.6-rc2, and the hang disappeared. I repeated the blktests
test case 100 times, and did not see the hang.

I confirmed these results under two multipathd conditions: 1) with Fedora latest
device-mapper-multipath package v0.9.4, and 2) the latest multipath-tools v0.9.6
that I built from source code.

So, when the commit gets reverted, the hang disappears as I reported for
v6.5-rcX kernels.
Thanks, Shinichiro Kawasaki. Your helps are appreciated.

This problem is related with the followings:

1). Linux distributions: Ubuntu, Debian and Fedora;

2). multipathd;

3). the commits 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")

On Ubuntu, with or without the commit, this problem does not occur.

On Debian, without this commit, this problem does not occur. With this commit, this problem will occur.

On Fedora, without this commit, this problem does not occur. With this commit, this problem will occur.

The commits 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") is from Bob Pearson.

Hi, Bob, do you have any comments about this problem? It seems that this commit is not compatible with blktests.

Hi, Jason and Leon, please comment on this problem.

Thanks a lot.

Zhu Yanjun
My belief is that the issue is related to timing not the logical operation of the code.
Work queues are just kernel processes and can be scheduled (if not holding spinlocks)
while soft IRQs lock up the CPU until they exit. This can cause longer delays in responding
to ULPs. The work queue tasks for each QP are strictly single threaded which is managed by
the work queue framework the same as tasklets.

Thanks, Bob. From you, the workqueue can be scheduled, this can cause longer delays in reponding to ULPs.

This will cause ULPs to hang. But the tasklet will lock up the CPU until it exits. So the tasklet will repond to

ULPs in time.

To this, there are 3 solutins:

1). Try to make workqueue respond ULPs in time, this hang problem should be avoided. so this will not cause

this problem. But from the kernel, workqueue should be scheduled,So it is difficult to avoid this longer delay.


2). Make tasklet and workqueue both work in RXE.  We can make one of tasklet or workqueue as the default. The user

can choose to use tasklet or workqueue via kernel module parameter or sysctl variables. This will cost a lot of time

and efforts to implement it.


3). Revert the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks"). Shinichiro Kawasaki

confirmed that this can fix this regression. And the patch is in the attachment.


Hi, Bob, Please comment.

Hi, Jason && Leon, please also comment on this.

Thanks a lot.


Earlier in time I have also seen the exact same hang behavior with the siw driver but not
recently. Also I have seen sensitivity to logging changes in the hang behavior. These are

This is a regression to RXE which is caused by the the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks").

We should fix it.

Zhu Yanjun

indications that timing may be the cause of the issue.

Bob
From fd2360edbc9171298d2e91fd9b74b4c3022db9d4 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Date: Fri, 15 Sep 2023 23:07:17 -0400
Subject: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"

This reverts commit 9b4b7c1f9f54120940e243251e2b1407767b3381.

Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
---
 drivers/infiniband/sw/rxe/rxe.c      |   9 +--
 drivers/infiniband/sw/rxe/rxe_task.c | 110 ++++++++++++---------------
 drivers/infiniband/sw/rxe/rxe_task.h |   6 +-
 3 files changed, 49 insertions(+), 76 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..7a7e713de52d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -212,15 +212,9 @@ static int __init rxe_module_init(void)
 {
 	int err;
 
-	err = rxe_alloc_wq();
-	if (err)
-		return err;
-
 	err = rxe_net_init();
-	if (err) {
-		rxe_destroy_wq();
+	if (err)
 		return err;
-	}
 
 	rdma_link_register(&rxe_link_ops);
 	pr_info("loaded\n");
@@ -232,7 +226,6 @@ static void __exit rxe_module_exit(void)
 	rdma_link_unregister(&rxe_link_ops);
 	ib_unregister_driver(RDMA_DRIVER_RXE);
 	rxe_net_exit();
-	rxe_destroy_wq();
 
 	pr_info("unloaded\n");
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1501120d4f52..fb9a6bc8e620 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -6,24 +6,8 @@
 
 #include "rxe.h"
 
-static struct workqueue_struct *rxe_wq;
-
-int rxe_alloc_wq(void)
-{
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
-	if (!rxe_wq)
-		return -ENOMEM;
-
-	return 0;
-}
-
-void rxe_destroy_wq(void)
-{
-	destroy_workqueue(rxe_wq);
-}
-
 /* Check if task is idle i.e. not running, not scheduled in
- * work queue and not draining. If so move to busy to
+ * tasklet queue and not draining. If so move to busy to
  * reserve a slot in do_task() by setting to busy and taking
  * a qp reference to cover the gap from now until the task finishes.
  * state will move out of busy if task returns a non zero value
@@ -37,6 +21,9 @@ static bool __reserve_if_idle(struct rxe_task *task)
 {
 	WARN_ON(rxe_read(task->qp) <= 0);
 
+	if (task->tasklet.state & BIT(TASKLET_STATE_SCHED))
+		return false;
+
 	if (task->state == TASK_STATE_IDLE) {
 		rxe_get(task->qp);
 		task->state = TASK_STATE_BUSY;
@@ -51,7 +38,7 @@ static bool __reserve_if_idle(struct rxe_task *task)
 }
 
 /* check if task is idle or drained and not currently
- * scheduled in the work queue. This routine is
+ * scheduled in the tasklet queue. This routine is
  * called by rxe_cleanup_task or rxe_disable_task to
  * see if the queue is empty.
  * Context: caller should hold task->lock.
@@ -59,7 +46,7 @@ static bool __reserve_if_idle(struct rxe_task *task)
  */
 static bool __is_done(struct rxe_task *task)
 {
-	if (work_pending(&task->work))
+	if (task->tasklet.state & BIT(TASKLET_STATE_SCHED))
 		return false;
 
 	if (task->state == TASK_STATE_IDLE ||
@@ -90,23 +77,23 @@ static bool is_done(struct rxe_task *task)
  * schedules the task. They must call __reserve_if_idle to
  * move the task to busy before calling or scheduling.
  * The task can also be moved to drained or invalid
- * by calls to rxe_cleanup_task or rxe_disable_task.
+ * by calls to rxe-cleanup_task or rxe_disable_task.
  * In that case tasks which get here are not executed but
  * just flushed. The tasks are designed to look to see if
- * there is work to do and then do part of it before returning
+ * there is work to do and do part of it before returning
  * here with a return value of zero until all the work
- * has been consumed then it returns a non-zero value.
+ * has been consumed then it retuens a non-zero value.
  * The number of times the task can be run is limited by
  * max iterations so one task cannot hold the cpu forever.
- * If the limit is hit and work remains the task is rescheduled.
  */
-static void do_task(struct rxe_task *task)
+static void do_task(struct tasklet_struct *t)
 {
+	int cont;
+	int ret;
+	struct rxe_task *task = from_tasklet(task, t, tasklet);
 	unsigned int iterations;
 	unsigned long flags;
 	int resched = 0;
-	int cont;
-	int ret;
 
 	WARN_ON(rxe_read(task->qp) <= 0);
 
@@ -128,22 +115,25 @@ static void do_task(struct rxe_task *task)
 		} while (ret == 0 && iterations-- > 0);
 
 		spin_lock_irqsave(&task->lock, flags);
-		/* we're not done yet but we ran out of iterations.
-		 * yield the cpu and reschedule the task
-		 */
-		if (!ret) {
-			task->state = TASK_STATE_IDLE;
-			resched = 1;
-			goto exit;
-		}
-
 		switch (task->state) {
 		case TASK_STATE_BUSY:
-			task->state = TASK_STATE_IDLE;
+			if (ret) {
+				task->state = TASK_STATE_IDLE;
+			} else {
+				/* This can happen if the client
+				 * can add work faster than the
+				 * tasklet can finish it.
+				 * Reschedule the tasklet and exit
+				 * the loop to give up the cpu
+				 */
+				task->state = TASK_STATE_IDLE;
+				resched = 1;
+			}
 			break;
 
-		/* someone tried to schedule the task while we
-		 * were running, keep going
+		/* someone tried to run the task since the last time we called
+		 * func, so we will call one more time regardless of the
+		 * return value
 		 */
 		case TASK_STATE_ARMED:
 			task->state = TASK_STATE_BUSY;
@@ -151,24 +141,22 @@ static void do_task(struct rxe_task *task)
 			break;
 
 		case TASK_STATE_DRAINING:
-			task->state = TASK_STATE_DRAINED;
+			if (ret)
+				task->state = TASK_STATE_DRAINED;
+			else
+				cont = 1;
 			break;
 
 		default:
 			WARN_ON(1);
-			rxe_dbg_qp(task->qp, "unexpected task state = %d",
-				   task->state);
-			task->state = TASK_STATE_IDLE;
+			rxe_info_qp(task->qp, "unexpected task state = %d", task->state);
 		}
 
-exit:
 		if (!cont) {
 			task->num_done++;
 			if (WARN_ON(task->num_done != task->num_sched))
-				rxe_dbg_qp(
-					task->qp,
-					"%ld tasks scheduled, %ld tasks done",
-					task->num_sched, task->num_done);
+				rxe_err_qp(task->qp, "%ld tasks scheduled, %ld tasks done",
+					   task->num_sched, task->num_done);
 		}
 		spin_unlock_irqrestore(&task->lock, flags);
 	} while (cont);
@@ -181,12 +169,6 @@ static void do_task(struct rxe_task *task)
 	rxe_put(task->qp);
 }
 
-/* wrapper around do_task to fix argument for work queue */
-static void do_work(struct work_struct *work)
-{
-	do_task(container_of(work, struct rxe_task, work));
-}
-
 int rxe_init_task(struct rxe_task *task, struct rxe_qp *qp,
 		  int (*func)(struct rxe_qp *))
 {
@@ -194,9 +176,11 @@ int rxe_init_task(struct rxe_task *task, struct rxe_qp *qp,
 
 	task->qp = qp;
 	task->func = func;
+
+	tasklet_setup(&task->tasklet, do_task);
+
 	task->state = TASK_STATE_IDLE;
 	spin_lock_init(&task->lock);
-	INIT_WORK(&task->work, do_work);
 
 	return 0;
 }
@@ -229,6 +213,8 @@ void rxe_cleanup_task(struct rxe_task *task)
 	while (!is_done(task))
 		cond_resched();
 
+	tasklet_kill(&task->tasklet);
+
 	spin_lock_irqsave(&task->lock, flags);
 	task->state = TASK_STATE_INVALID;
 	spin_unlock_irqrestore(&task->lock, flags);
@@ -240,7 +226,7 @@ void rxe_cleanup_task(struct rxe_task *task)
 void rxe_run_task(struct rxe_task *task)
 {
 	unsigned long flags;
-	bool run;
+	int run;
 
 	WARN_ON(rxe_read(task->qp) <= 0);
 
@@ -249,11 +235,11 @@ void rxe_run_task(struct rxe_task *task)
 	spin_unlock_irqrestore(&task->lock, flags);
 
 	if (run)
-		do_task(task);
+		do_task(&task->tasklet);
 }
 
-/* schedule the task to run later as a work queue entry.
- * the queue_work call can be called holding
+/* schedule the task to run later as a tasklet.
+ * the tasklet)schedule call can be called holding
  * the lock.
  */
 void rxe_sched_task(struct rxe_task *task)
@@ -264,7 +250,7 @@ void rxe_sched_task(struct rxe_task *task)
 
 	spin_lock_irqsave(&task->lock, flags);
 	if (__reserve_if_idle(task))
-		queue_work(rxe_wq, &task->work);
+		tasklet_schedule(&task->tasklet);
 	spin_unlock_irqrestore(&task->lock, flags);
 }
 
@@ -291,9 +277,7 @@ void rxe_disable_task(struct rxe_task *task)
 	while (!is_done(task))
 		cond_resched();
 
-	spin_lock_irqsave(&task->lock, flags);
-	task->state = TASK_STATE_DRAINED;
-	spin_unlock_irqrestore(&task->lock, flags);
+	tasklet_disable(&task->tasklet);
 }
 
 void rxe_enable_task(struct rxe_task *task)
@@ -307,7 +291,7 @@ void rxe_enable_task(struct rxe_task *task)
 		spin_unlock_irqrestore(&task->lock, flags);
 		return;
 	}
-
 	task->state = TASK_STATE_IDLE;
+	tasklet_enable(&task->tasklet);
 	spin_unlock_irqrestore(&task->lock, flags);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a63e258b3d66..facb7c8e3729 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -22,7 +22,7 @@ enum {
  * called again.
  */
 struct rxe_task {
-	struct work_struct	work;
+	struct tasklet_struct	tasklet;
 	int			state;
 	spinlock_t		lock;
 	struct rxe_qp		*qp;
@@ -32,10 +32,6 @@ struct rxe_task {
 	long			num_done;
 };
 
-int rxe_alloc_wq(void);
-
-void rxe_destroy_wq(void);
-
 /*
  * init rxe_task structure
  *	qp  => parameter to pass to func
-- 
2.40.1


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux