On Wed, 2022-09-21 at 13:18 -0600, Jens Axboe wrote: > io_uring will run task_work from contexts that have been prepared for > waiting, and in doing so it'll implicitly set the task running again > to avoid issues with blocking conditions. The new deferred local > task_work doesn't do that, which can result in spews on this being > an invalid condition: > > [ 112.917576] do not call blocking ops when !TASK_RUNNING; state=1 > set at [<00000000ad64af64>] prepare_to_wait_exclusive+0x3f/0xd0 > [ 112.983088] WARNING: CPU: 1 PID: 190 at kernel/sched/core.c:9819 > __might_sleep+0x5a/0x60 > [ 112.987240] Modules linked in: > [ 112.990504] CPU: 1 PID: 190 Comm: io_uring Not tainted 6.0.0-rc6+ > #1617 > [ 113.053136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 > [ 113.133650] RIP: 0010:__might_sleep+0x5a/0x60 > [ 113.136507] Code: ee 48 89 df 5b 31 d2 5d e9 33 ff ff ff 48 8b 90 > 30 0b 00 00 48 c7 c7 90 de 45 82 c6 05 20 8b 79 01 01 48 89 d1 e8 3a > 49 77 00 <0f> 0b eb d1 66 90 0f 1f 44 00 00 9c 58 f6 c4 02 74 35 65 > 8b 05 ed > [ 113.223940] RSP: 0018:ffffc90000537ca0 EFLAGS: 00010286 > [ 113.232903] RAX: 0000000000000000 RBX: ffffffff8246782c RCX: > ffffffff8270bcc8 > IOPS=133.15K, BW=520MiB/s, IOS/call=32/31 > [ 113.353457] RDX: ffffc90000537b50 RSI: 00000000ffffdfff RDI: > 0000000000000001 > [ 113.358970] RBP: 00000000000003bc R08: 0000000000000000 R09: > c0000000ffffdfff > [ 113.361746] R10: 0000000000000001 R11: ffffc90000537b48 R12: > ffff888103f97280 > [ 113.424038] R13: 0000000000000000 R14: 0000000000000001 R15: > 0000000000000001 > [ 113.428009] FS: 00007f67ae7fc700(0000) GS:ffff88842fc80000(0000) > knlGS:0000000000000000 > [ 113.432794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 113.503186] CR2: 00007f67b8b9b3b0 CR3: 0000000102b9b005 CR4: > 0000000000770ee0 > [ 113.507291] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 113.512669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 113.574374] PKRU: 55555554 > [ 113.576800] Call Trace: > [ 113.578325] <TASK> > [ 113.579799] set_page_dirty_lock+0x1b/0x90 > [ 113.582411] __bio_release_pages+0x141/0x160 > [ 113.673078] ? set_next_entity+0xd7/0x190 > [ 113.675632] blk_rq_unmap_user+0xaa/0x210 > [ 113.678398] ? timerqueue_del+0x2a/0x40 > [ 113.679578] nvme_uring_task_cb+0x94/0xb0 > [ 113.683025] __io_run_local_work+0x8a/0x150 > [ 113.743724] ? io_cqring_wait+0x33d/0x500 > [ 113.746091] io_run_local_work.part.76+0x2e/0x60 > [ 113.750091] io_cqring_wait+0x2e7/0x500 > [ 113.752395] ? > trace_event_raw_event_io_uring_req_failed+0x180/0x180 > [ 113.823533] __x64_sys_io_uring_enter+0x131/0x3c0 > [ 113.827382] ? switch_fpu_return+0x49/0xc0 > [ 113.830753] do_syscall_64+0x34/0x80 > [ 113.832620] entry_SYSCALL_64_after_hwframe+0x5e/0xc8 > > Ensure that we mark current as TASK_RUNNING for deferred task_work > as well. > > Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") > Reported-by: Stefan Roesch <shr@xxxxxx> > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > --- > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 3875ea897cdf..f359e24b46c3 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1215,6 +1215,7 @@ int io_run_local_work(struct io_ring_ctx *ctx) > if (llist_empty(&ctx->work_llist)) > return 0; > > + __set_current_state(TASK_RUNNING); > locked = mutex_trylock(&ctx->uring_lock); > ret = __io_run_local_work(ctx, locked); > if (locked) > Reviewed-by: Dylan Yudaken <dylany@xxxxxx>