On 11/4/24 15:08, Jens Axboe wrote:
On 11/4/24 6:13 AM, Pavel Begunkov wrote:
On 11/4/24 11:31, syzbot wrote:
syzbot has bisected this issue to:
commit 3f1a546444738b21a8c312a4b49dc168b65c8706
Author: Jens Axboe <axboe@xxxxxxxxx>
Date: Sat Oct 26 01:27:39 2024 +0000
io_uring/rsrc: get rid of per-ring io_rsrc_node list
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000
start commit: c88416ba074a Add linux-next specific files for 20241101
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000
console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000
kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f
dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000
Reported-by: syzbot+e333341d3d985e5173b2@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Previously all puts were done by requests, which in the case of an exiting
ring were fallback'ed to normal tw. Now the unregister path posts CQEs
while the original task is still alive. It should be fine in general, because
at this point there can be no requests posting in parallel and everything
is synchronised, so it's a false positive, but we need to change the assert
or do something else.
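For context, the check in question sits at the end of
io_lockdep_assert_cq_locked() in io_uring/io_uring.h; roughly the following
(a simplified excerpt from memory, the exact shape may differ between trees):

	} else if (ctx->submitter_task) {
		/*
		 * Not from an SQE, as those cannot be submitted, but via
		 * updating tagged resources.
		 */
		if (ctx->submitter_task->flags & PF_EXITING)
			lockdep_assert(current_work());
		else
			lockdep_assert(current == ctx->submitter_task);
	}

With the unregister path posting CQEs off ring teardown while the submitter
task is still alive and is not current, it's the
current == ctx->submitter_task branch that fires.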
Maybe something a la the below? It also changes these triggers to be
_once(), as there's no point in spamming them.
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 00409505bf07..7792ed91469b 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 		 * Not from an SQE, as those cannot be submitted, but via
 		 * updating tagged resources.
 		 */
-		if (ctx->submitter_task->flags & PF_EXITING)
-			lockdep_assert(current_work());
+		if (ctx->submitter_task->flags & PF_EXITING ||
+		    percpu_ref_is_dying(&ctx->refs))
+			lockdep_assert_once(current_work());
 		else
-			lockdep_assert(current == ctx->submitter_task);
+			lockdep_assert_once(current == ctx->submitter_task);
 	}
 #endif
 }
io_move_task_work_from_local() executes requests via normal
task_work of a possibly alive task, which will then trip the check.
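Roughly, the pre-patch path looks like this (a simplified sketch for
illustration only: the requests get re-queued as normal task_work, which
runs in plain task context where current_work() is NULL, so the dying-ref
branch of the assert above would still complain):

	/* sketch of the pre-patch io_move_task_work_from_local() body */
	node = llist_del_all(&ctx->work_llist);
	while (node) {
		struct io_kiocb *req = container_of(node, struct io_kiocb,
						    io_task_work.node);

		node = node->next;
		/* re-queued as normal task_work of the (possibly alive) task */
		io_req_normal_work_add(req);
	}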
I was thinking of killing the extra step as it doesn't make sense;
digging through my git garbage shows the patch below, but I don't remember
if it has ever been tested.
commit 65560732da185c85f472e9c94e6b8ff147fc4b96
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date: Fri Jun 7 13:13:06 2024 +0100
io_uring: skip normal tw with DEFER_TASKRUN
DEFER_TASKRUN execution first falls back to normal task_work and only
then, when the task is dying, to workers. It's cleaner to remove the
middle step and use workers as the only fallback. It also detaches
DEFER_TASKRUN and normal task_work handling from each other.
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9789cf8c68c1..d9e3661ff93d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1111,9 +1111,8 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head,
 	return xchg(&head->first, new);
 }
 
-static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
+static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
 {
-	struct llist_node *node = llist_del_all(&tctx->task_list);
 	struct io_ring_ctx *last_ctx = NULL;
 	struct io_kiocb *req;
 
@@ -1139,6 +1138,13 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
 	}
 }
 
+static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
+{
+	struct llist_node *node = llist_del_all(&tctx->task_list);
+
+	__io_fallback_tw(node, sync);
+}
+
 struct llist_node *tctx_task_work_run(struct io_uring_task *tctx,
 				      unsigned int max_entries,
 				      unsigned int *count)
@@ -1287,13 +1293,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
 	struct llist_node *node;
 
 	node = llist_del_all(&ctx->work_llist);
-	while (node) {
-		struct io_kiocb *req = container_of(node, struct io_kiocb,
-						    io_task_work.node);
-
-		node = node->next;
-		io_req_normal_work_add(req);
-	}
+	__io_fallback_tw(node, false);
 }
 
 static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e46d13e8a215..bc0a800b5ae7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -128,7 +128,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 		 * Not from an SQE, as those cannot be submitted, but via
 		 * updating tagged resources.
 		 */
-		if (ctx->submitter_task->flags & PF_EXITING)
+		if (percpu_ref_is_dying(&ctx->refs))
 			lockdep_assert(current_work());
 		else
 			lockdep_assert(current == ctx->submitter_task);
--
Pavel Begunkov