Hi,

This patchset gets rid of using llist for handling task_work, for both
local and normal task_work. Instead of a lockless list, a regular
io_wq_work_list is used, protected by a spinlock.

I've done some benchmarking with this and only see wins: the cost of
adding or iterating task_work stays the same, but we get rid of the need
to reverse the task_work list, which can be substantial for bursty
applications or just generally busy task_work usage.

Patch 2 implements io_wq_work_list handling for deferred task_work, and
patch 4 does the same for normal task_work. Patch 6 then also switches
SQPOLL to use this scheme, which eliminates the passing around of
io_wq_work_node for its retry logic.

Outside of cleaning up this code, it also enables us to potentially
implement task_work run capping for normal task_work in the future.

Git tree can be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-defer-tw

 include/linux/io_uring_types.h |  17 +-
 io_uring/io_uring.c            | 293 ++++++++++++++++++---------------
 io_uring/io_uring.h            |  20 ++-
 io_uring/slist.h               |  16 ++
 io_uring/sqpoll.c              |  20 ++-
 io_uring/tctx.c                |   3 +-
 6 files changed, 211 insertions(+), 158 deletions(-)

-- 
Jens Axboe
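
P.S. For readers skimming the cover letter, here is a rough before/after
sketch of the shape of the change. It is illustrative only and not lifted
from the patches: the struct and function names (tw_req, old_add/old_run,
new_add/new_run) are made up, while the helpers used (llist_add,
llist_del_all, llist_reverse_order, wq_list_add_tail, INIT_WQ_LIST) are
the existing kernel/io_uring ones. The point is simply that the llist
consumer pays for a reversal pass to get FIFO order, whereas the
spinlock-protected list is appended at the tail and already comes out in
order.

#include <linux/llist.h>
#include <linux/spinlock.h>
#include <linux/container_of.h>
#include "slist.h"		/* io_uring's io_wq_work_list helpers */

struct tw_req {
	struct llist_node	llist_node;	/* old style */
	struct io_wq_work_node	work_node;	/* new style */
};

/* Old style: lockless LIFO add, consumer must reverse before running */
static void old_add(struct llist_head *head, struct tw_req *req)
{
	llist_add(&req->llist_node, head);
}

static void old_run(struct llist_head *head)
{
	struct llist_node *node = llist_del_all(head);

	/* entries come out newest-first, reverse to run in order */
	node = llist_reverse_order(node);
	while (node) {
		struct tw_req *req = container_of(node, struct tw_req,
						  llist_node);

		node = node->next;
		/* run req's task_work callback here */
	}
}

/* New style: spinlock-protected tail append, already in FIFO order */
static void new_add(struct io_wq_work_list *list, spinlock_t *lock,
		    struct tw_req *req)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);
	wq_list_add_tail(&req->work_node, list);
	spin_unlock_irqrestore(lock, flags);
}

static void new_run(struct io_wq_work_list *list, spinlock_t *lock)
{
	struct io_wq_work_node *node;
	unsigned long flags;

	/* splice the whole list off under the lock, then run it lockless */
	spin_lock_irqsave(lock, flags);
	node = list->first;
	INIT_WQ_LIST(list);
	spin_unlock_irqrestore(lock, flags);

	/* no reversal needed, entries are already oldest-first */
	while (node) {
		struct tw_req *req = container_of(node, struct tw_req,
						  work_node);

		node = node->next;
		/* run req's task_work callback here */
	}
}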