On 04/02/2021 11:05, Pavel Begunkov wrote: > On 04/02/2021 09:20, Xiaoguang Wang wrote: >> Abaci Robot reported following panic: >> BUG: kernel NULL pointer dereference, address: 0000000000000000 >> PGD 800000010ef3f067 P4D 800000010ef3f067 PUD 10d9df067 PMD 0 >> Oops: 0002 [#1] SMP PTI >> CPU: 0 PID: 1869 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc3+ #1 >> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 >> RIP: 0010:put_files_struct+0x1b/0x120 >> Code: 24 18 c7 00 f4 ff ff ff e9 4d fd ff ff 66 90 0f 1f 44 00 00 41 57 41 56 49 89 fe 41 55 41 54 55 53 48 83 ec 08 e8 b5 6b db ff 41 ff 0e 74 13 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 9c >> RSP: 0000:ffffc90002147d48 EFLAGS: 00010293 >> RAX: 0000000000000000 RBX: ffff88810d9a5300 RCX: 0000000000000000 >> RDX: ffff88810d87c280 RSI: ffffffff8144ba6b RDI: 0000000000000000 >> RBP: 0000000000000080 R08: 0000000000000001 R09: ffffffff81431500 >> R10: ffff8881001be000 R11: 0000000000000000 R12: ffff88810ac2f800 >> R13: ffff88810af38a00 R14: 0000000000000000 R15: ffff8881057130c0 >> FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000000000000000 CR3: 000000010dbaa002 CR4: 00000000003706f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> __io_clean_op+0x10c/0x2a0 >> io_dismantle_req+0x3c7/0x600 >> __io_free_req+0x34/0x280 >> io_put_req+0x63/0xb0 >> io_worker_handle_work+0x60e/0x830 >> ? io_wqe_worker+0x135/0x520 >> io_wqe_worker+0x158/0x520 >> ? __kthread_parkme+0x96/0xc0 >> ? io_worker_handle_work+0x830/0x830 >> kthread+0x134/0x180 >> ? kthread_create_worker_on_cpu+0x90/0x90 >> ret_from_fork+0x1f/0x30 >> Modules linked in: >> CR2: 0000000000000000 >> ---[ end trace c358ca86af95b1e7 ]--- >> >> I guess case below can trigger above panic: there're two threads which >> operates different io_uring ctxs and share same sqthread identity, and >> later one thread exits, io_uring_cancel_task_requests() will clear >> task->io_uring->identity->files to be NULL in sqpoll mode, then another >> ctx that uses same identity will panic. >> >> Indeed we don't need to clear task->io_uring->identity->files here, >> io_grab_identity() should handle identity->files changes well, if >> task->io_uring->identity->files is not equal to current->files, >> io_cow_identity() should handle this changes well. > > Didn't look in the trace above, but the change looks good. I even did > it myself a couple of weeks ago, but it got dropped because of unrelated > hassle. > > I'll test/review a bit later. Reviewed-by: Pavel Begunkov <asml.silence@xxxxxxxxx> Cc: stable@xxxxxxxxxxxxxxx # 5.5+ > >> >> Reported-by: Abaci Robot <abaci@xxxxxxxxxxxxxxxxx> >> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@xxxxxxxxxxxxxxxxx> >> --- >> fs/io_uring.c | 6 ------ >> 1 file changed, 6 deletions(-) >> >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 38c6cbe1ab38..5d3348d66f06 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -8982,12 +8982,6 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx, >> >> if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) { >> atomic_dec(&task->io_uring->in_idle); >> - /* >> - * If the files that are going away are the ones in the thread >> - * identity, clear them out. >> - */ >> - if (task->io_uring->identity->files == files) >> - task->io_uring->identity->files = NULL; >> io_sq_thread_unpark(ctx->sq_data); >> } >> } >> > -- Pavel Begunkov