On Mon, Aug 10, 2020 at 10:25 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 8/10/20 2:13 PM, Jens Axboe wrote:
> >> Would it be clearer to write it like so perhaps?
> >>
> >> /*
> >>  * Optimization; when the task is RUNNING we can do with a
> >>  * cheaper TWA_RESUME notification because,... <reason goes
> >>  * here>. Otherwise do the more expensive, but always correct
> >>  * TWA_SIGNAL.
> >>  */
> >> if (READ_ONCE(tsk->state) == TASK_RUNNING) {
> >> 	__task_work_notify(tsk, TWA_RESUME);
> >> 	if (READ_ONCE(tsk->state) == TASK_RUNNING)
> >> 		return;
> >> }
> >> __task_work_notify(tsk, TWA_SIGNAL);
> >> wake_up_process(tsk);
> >
> > Yeah that is easier to read, wasn't a huge fan of the loop since it's
> > only a single retry kind of condition. I'll adopt this suggestion,
> > thanks!
>
> Re-write it a bit on top of that, just turning it into two separate
> READ_ONCE, and added appropriate comments. For the SQPOLL case, the
> wake_up_process() is enough, so we can clean up that if/else.
>
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=49bc5c16483945982cf81b0109d7da7cd9ee55ed

I think I'm starting to understand the overall picture here, and I
think if my understanding is correct, your solution isn't going to
work properly.

My understanding of the scenario you're trying to address is:

 - task A starts up io_uring
 - task A tells io_uring to bump the counter of an eventfd E when work
   has been completed
 - task A submits some work ("read a byte from file descriptor X", or
   something like that)
 - io_uring internally starts an asynchronous I/O operation, with a
   callback C
 - task A calls read(E, &counter, sizeof(counter)) to wait for events
   to be processed
 - the async I/O operation finishes, C is invoked, and C schedules
   task_work for task A

And here you run into a deadlock, because the task_work will only run
when task A returns from the syscall, but the syscall will only return
once the task_work is executing and has finished the I/O operation.

If that is the scenario you're trying to solve here (where you're
trying to force a task that's in the middle of some syscall that's
completely unrelated to io_uring to return back to syscall context),
I don't think this will work: the task might, for example, have just
entered the read() syscall and be *about to* block, while its state
is still TASK_RUNNING. In that case both READ_ONCE() checks can see
TASK_RUNNING, only the TWA_RESUME notification gets sent, and the
task then goes to sleep with nothing left to wake it up.