On 4/19/22 9:21 AM, Jens Axboe wrote:
> On 4/19/22 6:31 AM, Jens Axboe wrote:
>> On 4/19/22 6:21 AM, Avi Kivity wrote:
>>> On 19/04/2022 15.04, Jens Axboe wrote:
>>>> On 4/19/22 5:57 AM, Avi Kivity wrote:
>>>>> On 19/04/2022 14.38, Jens Axboe wrote:
>>>>>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>>>>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>>>>>
>>>>>>> I expect the loss is due to an optimization that io_uring lacks -
>>>>>>> inline completion vs workqueue completion:
>>>>>> I don't think that's it, io_uring never punts to a workqueue for
>>>>>> completions.
>>>>>
>>>>> I measured this:
>>>>>
>>>>> Performance counter stats for 'system wide':
>>>>>
>>>>>     1,273,756      io_uring:io_uring_task_add
>>>>>
>>>>>  12.288597765 seconds time elapsed
>>>>>
>>>>> Which exactly matches the number of requests sent. If that's the
>>>>> wrong counter to measure, I'm happy to try again with the correct
>>>>> counter.
>>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>>> expected.
>
> Might actually be implicated. Not because it's an async worker, but
> because I think we might be losing some affinity in this case. Looking
> at traces, we're definitely bouncing between the poll completion side
> and then executing the completion.
>
> Can you try this hack? It's against -git + for-5.19/io_uring. If you let
> me know what base you prefer, I can do a version against that. I see
> about a 3% win with io_uring with this; before the patch it was slower
> than linux-aio, as you saw as well.

Another thing to try - get rid of the IPI for TWA_SIGNAL, which I believe
may be the underlying cause of the slowdown.

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..59987dd212d8 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
 
 static bool io_wq_worker_wake(struct io_worker *worker, void *data)
 {
-	set_notify_signal(worker->task);
+	set_notify_signal(worker->task, true);
 	wake_up_process(worker->task);
 	return false;
 }
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
 {
 	if (work && match->fn(work, match->data)) {
 		work->flags |= IO_WQ_WORK_CANCEL;
-		set_notify_signal(worker->task);
+		set_notify_signal(worker->task, true);
 		return true;
 	}
 
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..ac1f14973e09 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -359,10 +359,10 @@ static inline void clear_notify_signal(void)
  * Called to break out of interruptible wait loops, and enter the
  * exit_to_user_mode_loop().
  */
-static inline void set_notify_signal(struct task_struct *task)
+static inline void set_notify_signal(struct task_struct *task, bool need_ipi)
 {
 	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
-	    !wake_up_state(task, TASK_INTERRUPTIBLE))
+	    !wake_up_state(task, TASK_INTERRUPTIBLE) && need_ipi)
 		kick_process(task);
 }
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..bff53f539933 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -367,7 +367,7 @@ static void klp_send_signals(void)
 			 * Send fake signal to all non-kthread tasks which are
 			 * still not migrated.
 			 */
-			set_notify_signal(task);
+			set_notify_signal(task, true);
 		}
 	}
 	read_unlock(&tasklist_lock);
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..47d7024dc499 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -51,7 +51,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		set_notify_signal(task);
+		set_notify_signal(task, false);
 		break;
 	default:
 		WARN_ON_ONCE(1);

-- 
Jens Axboe
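
For context on why skipping the IPI is still functionally safe for
TWA_SIGNAL task_work: TIF_NOTIFY_SIGNAL is re-checked on every transition
back to user mode, so queued work is still picked up at the next kernel
exit even without a kick; the kick_process() IPI only makes that pickup
immediate for a task currently executing in userspace. Below is a rough
sketch of the consumer side, loosely modeled on tracehook_notify_signal()
from kernels of that era - the function name here is hypothetical and the
body is simplified, not the exact upstream code:

/*
 * Hypothetical sketch of the TIF_NOTIFY_SIGNAL consumer, invoked from
 * exit_to_user_mode_loop() on the way back to userspace (simplified
 * from the era's tracehook_notify_signal(); details vary by tree).
 */
static inline void notify_signal_sketch(void)
{
	/* Clear the flag first so work added afterwards re-arms it. */
	clear_thread_flag(TIF_NOTIFY_SIGNAL);
	smp_mb__after_atomic();

	/* Drain the task_work queued on the current task. */
	if (current->task_works)
		task_work_run();
}

If that reading is right, the trade-off in the patch above is latency
rather than correctness: with need_ipi == false, a task busy in userspace
sees its completions at its next syscall or interrupt return instead of
immediately, in exchange for not paying the IPI cost on every completion.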