For experiments only. If this proves useful, the non-io_uring side would need to be made nicer.

Patches 1-10 save one spin_lock/unlock_irq pair and two cmpxchg per batch. Patch 11/11, in general, trades one spin_lock/unlock_irq per task_work (tw) add, plus two spinlock operations and two cmpxchg per batch, for one cmpxchg per tw add and one cmpxchg per batch.

Pavel Begunkov (11):
  io_uring: optimise io_req_task_work_add
  io_uring: add io_should_fail_tw() helper
  io_uring: ban tw queue for exiting processes
  io_uring: don't take ctx refs in tctx_task_work()
  io_uring: add dummy io_uring_task_work_run()
  task_work: add helper for signalling a task
  io_uring: run io_uring task_works on TIF_NOTIFY_SIGNAL
  io_uring: wire io_uring specific task work
  io_uring: refactor io_run_task_work()
  io_uring: remove priority tw list
  io_uring: lock-free task_work stack

 fs/io-wq.c                |   1 +
 fs/io_uring.c             | 213 +++++++++++++++-----------------------
 include/linux/io_uring.h  |   4 +
 include/linux/task_work.h |   4 +
 kernel/entry/kvm.c        |   1 +
 kernel/signal.c           |   2 +
 kernel/task_work.c        |  33 +++---
 7 files changed, 115 insertions(+), 143 deletions(-)

-- 
2.36.0
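To make the cost model for "lock-free task_work stack" concrete, here is a minimal userspace sketch of the general technique, assuming a Treiber-style singly linked stack: adding an item costs one successful cmpxchg on the list head, and running a batch detaches every pending item with one more successful cmpxchg, so no spinlock is taken on either side. It uses C11 <stdatomic.h> rather than kernel primitives, and every name in it (tw_node, tw_stack, tw_push, tw_take_all, tw_run_batch, tw_demo) is hypothetical; it is not the code in this series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; illustrative only, not the kernel's callback_head. */
struct tw_node {
    struct tw_node *next;
    void (*func)(struct tw_node *node);
};

/* Hypothetical per-task list: an atomic pointer to the most recently added node. */
struct tw_stack {
    _Atomic(struct tw_node *) head;
};

/* Add side: one successful cmpxchg links the node in (retried only on contention). */
static void tw_push(struct tw_stack *s, struct tw_node *node)
{
    struct tw_node *old = atomic_load_explicit(&s->head, memory_order_relaxed);

    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&s->head, &old, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Run side: detach the whole pending batch with one successful cmpxchg. */
static struct tw_node *tw_take_all(struct tw_stack *s)
{
    struct tw_node *old = atomic_load_explicit(&s->head, memory_order_relaxed);

    while (old && !atomic_compare_exchange_weak_explicit(&s->head, &old, NULL,
                                                         memory_order_acquire,
                                                         memory_order_relaxed))
        ;
    return old;
}

/* The detached list is newest-first (LIFO); reverse it to run oldest-first. */
static struct tw_node *tw_reverse(struct tw_node *n)
{
    struct tw_node *prev = NULL;

    while (n) {
        struct tw_node *next = n->next;

        n->next = prev;
        prev = n;
        n = next;
    }
    return prev;
}

/* Runs every item queued so far; returns how many callbacks were invoked. */
static int tw_run_batch(struct tw_stack *s)
{
    struct tw_node *n = tw_reverse(tw_take_all(s));
    int ran = 0;

    while (n) {
        struct tw_node *next = n->next; /* read first: func may free n */

        n->func(n);
        n = next;
        ran++;
    }
    return ran;
}

/* Usage sketch: the callback records each item's id in execution order. */
struct demo_item {
    struct tw_node node; /* first member, so the cast below is valid */
    int id;
};

static int demo_order[3];
static int demo_len;

static void demo_record(struct tw_node *node)
{
    assert(demo_len < 3);
    demo_order[demo_len++] = ((struct demo_item *)node)->id;
}

/* Queues three items, runs them as one batch, returns the number run. */
static int tw_demo(void)
{
    struct demo_item items[3];
    struct tw_stack s;

    atomic_init(&s.head, NULL);
    demo_len = 0;
    for (int i = 0; i < 3; i++) {
        items[i].id = i;
        items[i].node.func = demo_record;
        tw_push(&s, &items[i].node);
    }
    return tw_run_batch(&s);
}
```

Calling tw_demo() queues items 0, 1 and 2 and runs all three in one batch. Because a stack hands items back newest-first, the sketch reverses the detached list so callbacks run in the order they were queued; whether and how the series itself preserves ordering is not stated in this cover letter.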