Task work currently uses a spin lock to guard task_list and task_running.
Some use cases, such as networking, can trigger task_work_add from
multiple threads all at once, which causes contention on that lock.

This can be changed to use a lockless list, which seems to have better
performance. Running the micro benchmark in [1], I see a 20% improvement
in multithreaded task work add.

This required removing the priority tw list optimisation; however, it
isn't clear how important that optimisation is. Additionally, its
semantics are fairly easy to break.

Patches 1-2 remove the priority tw list optimisation
Patches 3-5 add lockless lists for task work
Patch 6 fixes a bug I noticed in io_uring event tracing
Patches 7-8 add tracing for task_work_run

v2 changes:
 - simplify comparison in handle_tw_list

Dylan Yudaken (8):
  io_uring: remove priority tw list optimisation
  io_uring: remove __io_req_task_work_add
  io_uring: lockless task list
  io_uring: introduce llist helpers
  io_uring: batch task_work
  io_uring: move io_uring_get_opcode out of TP_printk
  io_uring: add trace event for running task work
  io_uring: trace task_work_run

 include/linux/io_uring_types.h  |   2 +-
 include/trace/events/io_uring.h |  72 +++++++++++++--
 io_uring/io_uring.c             | 149 ++++++++++++-------------------
 io_uring/io_uring.h             |   1 -
 io_uring/rw.c                   |   2 +-
 io_uring/tctx.c                 |   4 +-
 io_uring/tctx.h                 |   7 +-
 7 files changed, 126 insertions(+), 111 deletions(-)

base-commit: 7b411672f03db4aa05dec1c96742fc02b99de3d4
-- 
2.30.2
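As a rough illustration of the spin lock versus lockless list trade-off
(this is not the code in these patches), here is a minimal userspace
sketch using C11 <stdatomic.h>, loosely modeled on the kernel's
llist_add(), llist_del_all() and llist_reverse_order(). The types and
functions (tw_node, tw_list, tw_list_add(), tw_list_del_all(),
tw_list_reverse()) are hypothetical. Producers push with a
compare-and-swap loop instead of taking a lock, and the consumer detaches
the whole list with a single atomic exchange. Since pushes are LIFO, a
consumer that wants submission order has to reverse the detached chain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item: an intrusive singly linked node plus a payload. */
struct tw_node {
	struct tw_node *next;
	int id;
};

/* Hypothetical lockless list head: only the first pointer is atomic. */
struct tw_list {
	_Atomic(struct tw_node *) first;
};

/*
 * Push one node; safe to call from many threads at once, no lock taken.
 * Returns nonzero if the list was empty before the push, which a caller
 * could use to decide whether the consumer needs to be woken.
 */
static int tw_list_add(struct tw_list *list, struct tw_node *node)
{
	struct tw_node *first;

	assert(node != NULL);
	first = atomic_load_explicit(&list->first, memory_order_relaxed);
	do {
		node->next = first;
		/* On failure, 'first' is reloaded with the current head. */
	} while (!atomic_compare_exchange_weak_explicit(&list->first, &first,
			node, memory_order_release, memory_order_relaxed));
	return first == NULL;
}

/* Detach every queued node in one atomic exchange (newest first). */
static struct tw_node *tw_list_del_all(struct tw_list *list)
{
	return atomic_exchange_explicit(&list->first, NULL,
					memory_order_acquire);
}

/* Reverse a detached chain so nodes come back in push (FIFO) order. */
static struct tw_node *tw_list_reverse(struct tw_node *head)
{
	struct tw_node *prev = NULL;

	while (head) {
		struct tw_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Allowing only push and detach-all (no single-node pop) keeps this simple:
the consumer never dereferences a head that another thread might be
popping concurrently, so the classic ABA problem of lockless stacks does
not arise. A consumer would typically call tw_list_del_all(), reverse the
chain, and then walk it running each item.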