On 11/4/24 8:34 AM, Pavel Begunkov wrote:
> On 11/4/24 15:27, Pavel Begunkov wrote:
>> On 11/4/24 15:08, Jens Axboe wrote:
>>> On 11/4/24 6:13 AM, Pavel Begunkov wrote:
>>>> On 11/4/24 11:31, syzbot wrote:
>>>>> syzbot has bisected this issue to:
>>>>>
>>>>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706
>>>>> Author: Jens Axboe <axboe@xxxxxxxxx>
>>>>> Date:   Sat Oct 26 01:27:39 2024 +0000
>>>>>
>>>>>     io_uring/rsrc: get rid of per-ring io_rsrc_node list
>>>>>
>>>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000
>>>>> start commit:   c88416ba074a Add linux-next specific files for 20241101
>>>>> git tree:       linux-next
>>>>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2
>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12c04740580000
>>>>>
>>>>> Reported-by: syzbot+e333341d3d985e5173b2@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
>>>>>
>>>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>>
>>>> Previously all puts were done by requests, which in case of an exiting
>>>> ring were fallback'ed to normal tw. Now, the unregister path posts CQEs,
>>>> while the original task is still alive. Should be fine in general because
>>>> at this point there could be no requests posting in parallel and all
>>>> is synchronised, so it's a false positive, but we need to change the assert
>>>> or something else.
>>>
>>> Maybe something ala the below? Also changes these triggers to be
>>> _once(), no point spamming them.
>>>
>>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>>> index 00409505bf07..7792ed91469b 100644
>>> --- a/io_uring/io_uring.h
>>> +++ b/io_uring/io_uring.h
>>> @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
>>>           * Not from an SQE, as those cannot be submitted, but via
>>>           * updating tagged resources.
>>>           */
>>> -        if (ctx->submitter_task->flags & PF_EXITING)
>>> -            lockdep_assert(current_work());
>>> +        if (ctx->submitter_task->flags & PF_EXITING ||
>>> +            percpu_ref_is_dying(&ctx->refs))
>>
>> io_move_task_work_from_local() executes requests with a normal
>> task_work of a possible alive task, which will trigger the check.
>>
>> I was thinking to kill the extra step as it doesn't make sense,
>> git garbage digging shows the patch below, but I don't remember
>> if it has ever been tested.
>>
>>
>> commit 65560732da185c85f472e9c94e6b8ff147fc4b96
>> Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
>> Date:   Fri Jun 7 13:13:06 2024 +0100
>>
>>     io_uring: skip normal tw with DEFER_TASKRUN
>>     DEFER_TASKRUN execution first falls back to normal task_work and only
>>     then, when the task is dying, to workers. It's cleaner to remove the
>>     middle step and use workers as the only fallback. It also detaches
>>     DEFER_TASKRUN and normal task_work handling from each other.
>>     Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
>
> Not sure what spacing got broken here.
>
> Regardless, the rule with sth like that should be simpler,
> i.e. a ctx is getting killed => everything is run from fallback/kthread.

I like it, and now there's another reason to do it. Can you send out the
patch?

-- 
Jens Axboe