On 2/21/20 9:23 AM, Peter Zijlstra wrote:
> On Fri, Feb 21, 2020 at 06:49:16AM -0800, Jens Axboe wrote:
>
>>> Jens, what exactly is the benefit of running this on every random
>>> schedule() vs in io_cqring_wait() ? Or even, since io_cqring_wait() is
>>> the very last thing the syscall does, task_work.
>>
>> I took a step back and I think we can just use the task work, which
>> makes this a lot less complicated in terms of locking and schedule
>> state. Ran some quick testing with the below and it works for me.
>>
>> I'm going to re-spin based on this and just dump the sched_work
>> addition.
>
> Awesome, simpler is better.

Agree!

>> @@ -5392,6 +5392,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>>  						TASK_INTERRUPTIBLE);
>>  		if (io_should_wake(&iowq, false))
>>  			break;
>> +		if (current->task_works) {
>> +			task_work_run();
>> +			if (io_should_wake(&iowq, false))
>> +				break;
>> +			continue;
>> +		}
>
> 	if (current->task_works)
> 		task_work_run();
> 	if (io_should_wake(&iowq, false))
> 		break;
>
> doesn't work?

Yeah it totally does, I'll make that change. Not sure what I was
thinking there.

>>  		schedule();
>>  		if (signal_pending(current)) {
>>  			ret = -EINTR;

> Anyway, we need to be careful about the context where we call
> task_work_run(), but afaict doing it here should be fine.

Right, that's the main win over the sched in/out approach. On clean
entry to a system call we should be fine, and I added it for
io_uring_enter() as well. We're not holding anything at that point.

--
Jens Axboe
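
For reference, the wait loop with Peter's simplification applied would look
roughly like the sketch below. This is only an illustration of the shape
being discussed, not the respun patch; the prepare_to_wait_exclusive() /
finish_wait() framing is assumed from context, since the quoted hunk only
shows the tail of that call:

	do {
		/* arm the waitqueue entry before checking for completions */
		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
					  TASK_INTERRUPTIBLE);
		/* flush any queued task_work first, it may post completions */
		if (current->task_works)
			task_work_run();
		/* one wakeup check now covers both paths */
		if (io_should_wake(&iowq, false))
			break;
		schedule();
		if (signal_pending(current)) {
			ret = -EINTR;
			break;
		}
	} while (1);
	finish_wait(&ctx->wait, &iowq.wq);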
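
The io_uring_enter() change mentioned at the end would presumably be a
similar early flush on syscall entry, before any ring state is touched or
locks are taken; the exact placement is an assumption, not taken from the
quoted patch:

	/* in io_uring_enter(), early on, with nothing held */
	if (current->task_works)
		task_work_run();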