On 19/04/2022 20.14, Jens Axboe wrote:
On 4/19/22 9:21 AM, Jens Axboe wrote:
On 4/19/22 6:31 AM, Jens Axboe wrote:
On 4/19/22 6:21 AM, Avi Kivity wrote:
On 19/04/2022 15.04, Jens Axboe wrote:
On 4/19/22 5:57 AM, Avi Kivity wrote:
On 19/04/2022 14.38, Jens Axboe wrote:
On 4/19/22 5:07 AM, Avi Kivity wrote:
A simple webserver shows about 5% loss compared to linux-aio.
I expect the loss is due to an optimization that io_uring lacks -
inline completion vs workqueue completion:
I don't think that's it, io_uring never punts to a workqueue for
completions.
I measured this:
Performance counter stats for 'system wide':
1,273,756 io_uring:io_uring_task_add
12.288597765 seconds time elapsed
Which exactly matches with the number of requests sent. If that's the
wrong counter to measure, I'm happy to try again with the correct
counter.
io_uring_task_add() isn't a workqueue, it's task_work. So that is
expected.
Might actually be implicated. Not because it's a async worker, but
because I think we might be losing some affinity in this case. Looking
at traces, we're definitely bouncing between the poll completion side
and then execution the completion.
Can you try this hack? It's against -git + for-5.19/io_uring. If you let
me know what base you prefer, I can do a version against that. I see
about a 3% win with io_uring with this, and was slower before against
linux-aio as you saw as well.
Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
believe may be the underlying cause of it.
Resurrecting an old thread. I have a question about timeliness of
completions. Let's assume a request has completed. From the patch, it
appears that io_uring will only guarantee that a completion appears on
the completion ring if the thread has entered kernel mode since the
completion happened. So user-space polling of the completion ring can
cause unbounded delays.
If this is correct (it's not unreasonable, but should be documented),
then there should also be a simple way to force a kernel entry. But how
to do this using liburing? IIUC if I the following apply:
1. I have no pending sqes
2. There are pending completions
3. There is a completed event for which a completion has not been
appended to the completion queue ring
Then io_uring_wait_cqe() will elide io_uring_enter() and the
completed-but-not-reported event will be delayed.
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..59987dd212d8 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
static bool io_wq_worker_wake(struct io_worker *worker, void *data)
{
- set_notify_signal(worker->task);
+ set_notify_signal(worker->task, true);
wake_up_process(worker->task);
return false;
}
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
{
if (work && match->fn(work, match->data)) {
work->flags |= IO_WQ_WORK_CANCEL;
- set_notify_signal(worker->task);
+ set_notify_signal(worker->task, true);
return true;
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..ac1f14973e09 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -359,10 +359,10 @@ static inline void clear_notify_signal(void)
* Called to break out of interruptible wait loops, and enter the
* exit_to_user_mode_loop().
*/
-static inline void set_notify_signal(struct task_struct *task)
+static inline void set_notify_signal(struct task_struct *task, bool need_ipi)
{
if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
- !wake_up_state(task, TASK_INTERRUPTIBLE))
+ !wake_up_state(task, TASK_INTERRUPTIBLE) && need_ipi)
kick_process(task);
}
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..bff53f539933 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -367,7 +367,7 @@ static void klp_send_signals(void)
* Send fake signal to all non-kthread tasks which are
* still not migrated.
*/
- set_notify_signal(task);
+ set_notify_signal(task, true);
}
}
read_unlock(&tasklist_lock);
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..47d7024dc499 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -51,7 +51,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
set_notify_resume(task);
break;
case TWA_SIGNAL:
- set_notify_signal(task);
+ set_notify_signal(task, false);
break;
default:
WARN_ON_ONCE(1);