On 2023/9/14 7:38, Zhu Yanjun wrote:
On 2023/9/14 1:36, Bob Pearson wrote:
On 8/25/23 08:52, Bart Van Assche wrote:
On 8/24/23 18:11, Shinichiro Kawasaki wrote:
If it takes time to resolve the issues, it sounds like a good idea to
make the siw driver the default, since that would make the hangs less
painful for blktests users. Another idea to reduce the pain is to
improve srp/002 and srp/011 to detect the hangs and report them as
failures.
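[A minimal sketch of what switching between the two soft-RDMA drivers
amounts to. Both are set up through the iproute2 "rdma" tool; blktests
drives this through its own helpers, and "eth0" here is only a
placeholder netdev name.]

  # soft-RoCE (rxe) link, what the srp tests use by default today
  rdma link add rxe_eth0 type rxe netdev eth0
  # soft-iWARP (siw) link, the proposed default
  rdma link add siw_eth0 type siw netdev eth0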
At this moment we don't know whether the hangs can be converted into
failures. Answering this question is only possible after we have found
the root cause of the hang. If the hang is caused by commands getting
stuck in multipathd then it can be solved by changing the path
configuration (see also the dmsetup message commands in blktests). If
the hang is caused by a kernel bug then it is quite possible that there
is no way to recover other than by rebooting the system on which the
tests are run.
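[A minimal sketch of the dm-multipath messages referred to above;
"mpatha" is only a placeholder map name. Switching a stuck map from
queueing to failing lets the queued commands complete with an error
instead of hanging:]

  dmsetup message mpatha 0 fail_if_no_path
  # and back again once the paths are healthy
  dmsetup message mpatha 0 queue_if_no_path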
Thanks,
Bart.
Since 6.6.0-rc1 came out I decided to give blktests srp another try
with the current rdma for-next branch on my Ubuntu (Debian) system. For
the first time in a very long time, all the srp test cases ran
correctly multiple times; I ran each one 3X.

I had tried to build multipath-tools from source but ran into problems,
so I reinstalled the current Ubuntu packages. I have no idea what the
root cause was that finally went away, but I don't think it was in rxe,
as there aren't any recent patches related to the blktests failures. I
did notice that the dmesg traces picked up a couple of lines after the
place where it used to hang, something about setting an ALUA timeout to
60 seconds.

Thanks to all who worked on this.
Hi, Bob
About this problem, IIRC, it occurred easily on Debian and Fedora 38
with commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe
tasks") applied.

On Debian, with the latest multipathd, the problem seems to disappear.
On Fedora 38, even with the latest multipathd, the problem can still be
observed.

On Debian, the problem goes away either with the latest multipathd or
with commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe
tasks") reverted.

On Fedora 38, if commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support
for rxe tasks") is reverted, will the problem still appear? I do not
have such a test environment. The revert is attached; could anyone give
it a test? Please let us know the test result. Thanks.
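[One possible way to test the attached revert, assuming a kernel tree
that already contains commit 9b4b7c1f9f54 and a blktests checkout; the
patch file name and the blktests path are only placeholders:]

  git am 0001-Revert-RDMA-rxe-Add-workqueue-support-for-rxe-tasks.patch
  make -j"$(nproc)" M=drivers/infiniband/sw/rxe modules
  # reload the rxe module, then rerun the tests that used to hang
  (cd /path/to/blktests && ./check srp/002 srp/011)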
Zhu Yanjun
On Ubuntu, it is difficult to reproduce this problem. Perhaps this is
why you cannot reproduce it. It seems that this problem is related to
the Linux distribution and the version of multipathd in use.

If I am missing something, please feel free to let me know.
Zhu Yanjun
Bob Pearson
From fd2360edbc9171298d2e91fd9b74b4c3022db9d4 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Date: Fri, 15 Sep 2023 23:07:17 -0400
Subject: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"
This reverts commit 9b4b7c1f9f54120940e243251e2b1407767b3381.
Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
---
drivers/infiniband/sw/rxe/rxe.c | 9 +--
drivers/infiniband/sw/rxe/rxe_task.c | 110 ++++++++++++---------------
drivers/infiniband/sw/rxe/rxe_task.h | 6 +-
3 files changed, 49 insertions(+), 76 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..7a7e713de52d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -212,15 +212,9 @@ static int __init rxe_module_init(void)
{
int err;
- err = rxe_alloc_wq();
- if (err)
- return err;
-
err = rxe_net_init();
- if (err) {
- rxe_destroy_wq();
+ if (err)
return err;
- }
rdma_link_register(&rxe_link_ops);
pr_info("loaded\n");
@@ -232,7 +226,6 @@ static void __exit rxe_module_exit(void)
rdma_link_unregister(&rxe_link_ops);
ib_unregister_driver(RDMA_DRIVER_RXE);
rxe_net_exit();
- rxe_destroy_wq();
pr_info("unloaded\n");
}
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1501120d4f52..fb9a6bc8e620 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -6,24 +6,8 @@
#include "rxe.h"
-static struct workqueue_struct *rxe_wq;
-
-int rxe_alloc_wq(void)
-{
- rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
- if (!rxe_wq)
- return -ENOMEM;
-
- return 0;
-}
-
-void rxe_destroy_wq(void)
-{
- destroy_workqueue(rxe_wq);
-}
-
/* Check if task is idle i.e. not running, not scheduled in
- * work queue and not draining. If so move to busy to
+ * tasklet queue and not draining. If so move to busy to
* reserve a slot in do_task() by setting to busy and taking
* a qp reference to cover the gap from now until the task finishes.
* state will move out of busy if task returns a non zero value
@@ -37,6 +21,9 @@ static bool __reserve_if_idle(struct rxe_task *task)
{
WARN_ON(rxe_read(task->qp) <= 0);
+ if (task->tasklet.state & BIT(TASKLET_STATE_SCHED))
+ return false;
+
if (task->state == TASK_STATE_IDLE) {
rxe_get(task->qp);
task->state = TASK_STATE_BUSY;
@@ -51,7 +38,7 @@ static bool __reserve_if_idle(struct rxe_task *task)
}
/* check if task is idle or drained and not currently
- * scheduled in the work queue. This routine is
+ * scheduled in the tasklet queue. This routine is
* called by rxe_cleanup_task or rxe_disable_task to
* see if the queue is empty.
* Context: caller should hold task->lock.
@@ -59,7 +46,7 @@ static bool __reserve_if_idle(struct rxe_task *task)
*/
static bool __is_done(struct rxe_task *task)
{
- if (work_pending(&task->work))
+ if (task->tasklet.state & BIT(TASKLET_STATE_SCHED))
return false;
if (task->state == TASK_STATE_IDLE ||
@@ -90,23 +77,23 @@ static bool is_done(struct rxe_task *task)
* schedules the task. They must call __reserve_if_idle to
* move the task to busy before calling or scheduling.
* The task can also be moved to drained or invalid
- * by calls to rxe_cleanup_task or rxe_disable_task.
+ * by calls to rxe_cleanup_task or rxe_disable_task.
* In that case tasks which get here are not executed but
* just flushed. The tasks are designed to look to see if
- * there is work to do and then do part of it before returning
+ * there is work to do and do part of it before returning
* here with a return value of zero until all the work
- * has been consumed then it returns a non-zero value.
+ * has been consumed then it returns a non-zero value.
* The number of times the task can be run is limited by
* max iterations so one task cannot hold the cpu forever.
- * If the limit is hit and work remains the task is rescheduled.
*/
-static void do_task(struct rxe_task *task)
+static void do_task(struct tasklet_struct *t)
{
+ int cont;
+ int ret;
+ struct rxe_task *task = from_tasklet(task, t, tasklet);
unsigned int iterations;
unsigned long flags;
int resched = 0;
- int cont;
- int ret;
WARN_ON(rxe_read(task->qp) <= 0);
@@ -128,22 +115,25 @@ static void do_task(struct rxe_task *task)
} while (ret == 0 && iterations-- > 0);
spin_lock_irqsave(&task->lock, flags);
- /* we're not done yet but we ran out of iterations.
- * yield the cpu and reschedule the task
- */
- if (!ret) {
- task->state = TASK_STATE_IDLE;
- resched = 1;
- goto exit;
- }
-
switch (task->state) {
case TASK_STATE_BUSY:
- task->state = TASK_STATE_IDLE;
+ if (ret) {
+ task->state = TASK_STATE_IDLE;
+ } else {
+ /* This can happen if the client
+ * can add work faster than the
+ * tasklet can finish it.
+ * Reschedule the tasklet and exit
+ * the loop to give up the cpu
+ */
+ task->state = TASK_STATE_IDLE;
+ resched = 1;
+ }
break;
- /* someone tried to schedule the task while we
- * were running, keep going
+ /* someone tried to run the task since the last time we called
+ * func, so we will call one more time regardless of the
+ * return value
*/
case TASK_STATE_ARMED:
task->state = TASK_STATE_BUSY;
@@ -151,24 +141,22 @@ static void do_task(struct rxe_task *task)
break;
case TASK_STATE_DRAINING:
- task->state = TASK_STATE_DRAINED;
+ if (ret)
+ task->state = TASK_STATE_DRAINED;
+ else
+ cont = 1;
break;
default:
WARN_ON(1);
- rxe_dbg_qp(task->qp, "unexpected task state = %d",
- task->state);
- task->state = TASK_STATE_IDLE;
+ rxe_info_qp(task->qp, "unexpected task state = %d", task->state);
}
-exit:
if (!cont) {
task->num_done++;
if (WARN_ON(task->num_done != task->num_sched))
- rxe_dbg_qp(
- task->qp,
- "%ld tasks scheduled, %ld tasks done",
- task->num_sched, task->num_done);
+ rxe_err_qp(task->qp, "%ld tasks scheduled, %ld tasks done",
+ task->num_sched, task->num_done);
}
spin_unlock_irqrestore(&task->lock, flags);
} while (cont);
@@ -181,12 +169,6 @@ static void do_task(struct rxe_task *task)
rxe_put(task->qp);
}
-/* wrapper around do_task to fix argument for work queue */
-static void do_work(struct work_struct *work)
-{
- do_task(container_of(work, struct rxe_task, work));
-}
-
int rxe_init_task(struct rxe_task *task, struct rxe_qp *qp,
int (*func)(struct rxe_qp *))
{
@@ -194,9 +176,11 @@ int rxe_init_task(struct rxe_task *task, struct rxe_qp *qp,
task->qp = qp;
task->func = func;
+
+ tasklet_setup(&task->tasklet, do_task);
+
task->state = TASK_STATE_IDLE;
spin_lock_init(&task->lock);
- INIT_WORK(&task->work, do_work);
return 0;
}
@@ -229,6 +213,8 @@ void rxe_cleanup_task(struct rxe_task *task)
while (!is_done(task))
cond_resched();
+ tasklet_kill(&task->tasklet);
+
spin_lock_irqsave(&task->lock, flags);
task->state = TASK_STATE_INVALID;
spin_unlock_irqrestore(&task->lock, flags);
@@ -240,7 +226,7 @@ void rxe_cleanup_task(struct rxe_task *task)
void rxe_run_task(struct rxe_task *task)
{
unsigned long flags;
- bool run;
+ int run;
WARN_ON(rxe_read(task->qp) <= 0);
@@ -249,11 +235,11 @@ void rxe_run_task(struct rxe_task *task)
spin_unlock_irqrestore(&task->lock, flags);
if (run)
- do_task(task);
+ do_task(&task->tasklet);
}
-/* schedule the task to run later as a work queue entry.
- * the queue_work call can be called holding
+/* schedule the task to run later as a tasklet.
+ * the tasklet_schedule call can be called holding
* the lock.
*/
void rxe_sched_task(struct rxe_task *task)
@@ -264,7 +250,7 @@ void rxe_sched_task(struct rxe_task *task)
spin_lock_irqsave(&task->lock, flags);
if (__reserve_if_idle(task))
- queue_work(rxe_wq, &task->work);
+ tasklet_schedule(&task->tasklet);
spin_unlock_irqrestore(&task->lock, flags);
}
@@ -291,9 +277,7 @@ void rxe_disable_task(struct rxe_task *task)
while (!is_done(task))
cond_resched();
- spin_lock_irqsave(&task->lock, flags);
- task->state = TASK_STATE_DRAINED;
- spin_unlock_irqrestore(&task->lock, flags);
+ tasklet_disable(&task->tasklet);
}
void rxe_enable_task(struct rxe_task *task)
@@ -307,7 +291,7 @@ void rxe_enable_task(struct rxe_task *task)
spin_unlock_irqrestore(&task->lock, flags);
return;
}
-
task->state = TASK_STATE_IDLE;
+ tasklet_enable(&task->tasklet);
spin_unlock_irqrestore(&task->lock, flags);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index a63e258b3d66..facb7c8e3729 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -22,7 +22,7 @@ enum {
* called again.
*/
struct rxe_task {
- struct work_struct work;
+ struct tasklet_struct tasklet;
int state;
spinlock_t lock;
struct rxe_qp *qp;
@@ -32,10 +32,6 @@ struct rxe_task {
long num_done;
};
-int rxe_alloc_wq(void);
-
-void rxe_destroy_wq(void);
-
/*
* init rxe_task structure
* qp => parameter to pass to func
--
2.40.1