On Wed, May 26, 2021 at 05:48PM +0200, Marco Elver wrote:
> On Wed, May 26, 2021 at 08:44AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    a050a6d2 Merge tag 'perf-tools-fixes-for-v5.13-2021-05-24'..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13205087d00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=3bcc8a6b51ef8094
> > dashboard link: https://syzkaller.appspot.com/bug?extid=73554e2258b7b8bf0bbf
> > compiler:       Debian clang version 11.0.1-2
> [...]
> > write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
> >  io_uring_clean_tctx fs/io_uring.c:9042 [inline]
> >  __io_uring_cancel+0x261/0x3b0 fs/io_uring.c:9136
> >  io_uring_files_cancel include/linux/io_uring.h:16 [inline]
> >  do_exit+0x185/0x1560 kernel/exit.c:781
> >  do_group_exit+0xce/0x1a0 kernel/exit.c:923
> >  get_signal+0xfc3/0x1610 kernel/signal.c:2835
> >  arch_do_signal_or_restart+0x2a/0x220 arch/x86/kernel/signal.c:789
> >  handle_signal_work kernel/entry/common.c:147 [inline]
> >  exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> >  exit_to_user_mode_prepare+0x109/0x190 kernel/entry/common.c:208
> >  __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
> >  syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
> >  do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
> >  io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline]
> >  io_uring_try_cancel_requests+0x1ce/0x8e0 fs/io_uring.c:8933
> >  io_ring_exit_work+0x7c/0x1110 fs/io_uring.c:8736
> >  process_one_work+0x3e9/0x8f0 kernel/workqueue.c:2276
> >  worker_thread+0x636/0xae0 kernel/workqueue.c:2422
> >  kthread+0x1d0/0x1f0 kernel/kthread.c:313
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>
> I wasn't entirely sure if io_wq is guaranteed to remain live in this
> case in io_uring_try_cancel_iowq(), but the comment there suggests it
> does. In that case, I think the below patch would explain the situation
> better and also propose a fix.
>
> Thoughts?

Due to some moving around of code, the patch lost the actual fix (using
the atomically read io_wq) -- so here it is again ... hopefully as
intended. :-)

Thanks,
-- Marco

From: Marco Elver <elver@xxxxxxxxxx>
Date: Wed, 26 May 2021 16:56:37 +0200
Subject: [PATCH] io_uring: fix data race to avoid potential NULL-deref

Commit ba5ef6dc8a82 ("io_uring: fortify tctx/io_wq cleanup") introduced
setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to
detect a data race between accesses to tctx->io_wq:

write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1:
 io_uring_clean_tctx fs/io_uring.c:9042 [inline]
 __io_uring_cancel fs/io_uring.c:9136
 io_uring_files_cancel include/linux/io_uring.h:16 [inline]
 do_exit kernel/exit.c:781
 do_group_exit kernel/exit.c:923
 get_signal kernel/signal.c:2835
 arch_do_signal_or_restart arch/x86/kernel/signal.c:789
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 ...

read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0:
 io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline]
 io_uring_try_cancel_requests fs/io_uring.c:8933
 io_ring_exit_work fs/io_uring.c:8736
 process_one_work kernel/workqueue.c:2276
 ...

With the config used, KCSAN only reports data races with value changes:
this implies that in the case here we also know that tctx->io_wq was
non-NULL.
Therefore, depending on interleaving, we may end up with:

              [CPU 0]                   |             [CPU 1]
 io_uring_try_cancel_iowq()             | io_uring_clean_tctx()
   if (!tctx->io_wq) // false           |   ...
   ...                                  |   tctx->io_wq = NULL
   io_wq_cancel_cb(tctx->io_wq, ...)    |   ...
     -> NULL-deref                      |

Note: It is likely that thus far we've gotten lucky and the compiler
optimizes the double-read into a single read into a register -- but this
is never guaranteed, and can easily change with a different config!

Fix the data race by atomically accessing tctx->io_wq. Of course, this
assumes that a valid io_wq remains alive for the duration of
io_uring_try_cancel_iowq(), which should be the case per comment there.

Reported-by: syzbot+bf2b3d0435b9b728946c@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 fs/io_uring.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5f82954004f6..e681ece1bbca 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8903,14 +8903,18 @@ static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
 	mutex_lock(&ctx->uring_lock);
 	list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
 		struct io_uring_task *tctx = node->task->io_uring;
+		struct io_wq *io_wq;
 
+		if (!tctx)
+			continue;
 		/*
 		 * io_wq will stay alive while we hold uring_lock, because it's
 		 * killed after ctx nodes, which requires to take the lock.
 		 */
-		if (!tctx || !tctx->io_wq)
+		io_wq = READ_ONCE(tctx->io_wq);
+		if (!io_wq)
 			continue;
-		cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
+		cret = io_wq_cancel_cb(io_wq, io_cancel_ctx_cb, ctx, true);
 		ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
 	}
 	mutex_unlock(&ctx->uring_lock);
@@ -9039,7 +9043,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 	struct io_tctx_node *node;
 	unsigned long index;
 
-	tctx->io_wq = NULL;
+	WRITE_ONCE(tctx->io_wq, NULL);
 	xa_for_each(&tctx->xa, index, node)
 		io_uring_del_task_file(index);
 	if (wq)
-- 
2.31.1.818.g46aad6cb9e-goog
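
P.S. For anyone who wants to poke at the two access shapes outside the
kernel, below is a minimal, self-contained userspace sketch. It is not the
io_uring code: struct tctx_like, cancel_racy() and cancel_fixed() are
invented names, and the READ_ONCE()/WRITE_ONCE() macros below are simplified
volatile stand-ins for the kernel ones. It only contrasts the double-read
pattern with the single-snapshot pattern the patch switches to; it does not
itself reproduce the race, which needs two concurrent threads.

/* race_sketch.c -- illustrative only; names are made up, not io_uring API. */
#include <stdio.h>

/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(): force a
 * single volatile access so the compiler cannot re-load or elide it. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

struct tctx_like {
	int *io_wq;	/* may be set to NULL by another thread */
};

/* Racy shape: two plain loads of tctx->io_wq.  If a concurrent writer sets
 * the field to NULL between them, the second load can observe NULL even
 * though the check passed -- the NULL-deref scenario in the commit message. */
static void cancel_racy(struct tctx_like *tctx)
{
	if (!tctx->io_wq)					/* load #1 */
		return;
	printf("racy:  io_wq=%p\n", (void *)tctx->io_wq);	/* load #2 */
}

/* Fixed shape: one marked load into a local; only the snapshot is used, so a
 * concurrent WRITE_ONCE(tctx->io_wq, NULL) cannot be seen mid-function. */
static void cancel_fixed(struct tctx_like *tctx)
{
	int *io_wq = READ_ONCE(tctx->io_wq);

	if (!io_wq)
		return;
	printf("fixed: io_wq=%p\n", (void *)io_wq);
}

int main(void)
{
	int dummy;
	struct tctx_like tctx = { .io_wq = &dummy };

	cancel_fixed(&tctx);		/* uses the non-NULL snapshot */
	WRITE_ONCE(tctx.io_wq, NULL);	/* what io_uring_clean_tctx() does */
	cancel_racy(&tctx);		/* now returns early: io_wq is NULL */
	return 0;
}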