That's not final for several reasons, but good enough for discussion. It brings io_kiocb down to 192B. I didn't try to benchmark it properly, but a quick nop test gave a +5% throughput increase: 7531 vs 7910 KIOPS with fio/t/io_uring.

The whole situation is obviously a bunch of tradeoffs. For instance, instead of shrinking it, we could inline apoll to speed up the apoll path.

[2/2] is just for reference; I'm thinking about other ways to shrink it. For example, ->link_list could be a singly-linked list, with linked timeouts storing a back-reference. This may turn out to be better, because it would move ->fixed_file_refs to the 2nd cacheline, so we would never touch the 3rd cacheline in the submission path. Any other ideas?

note: on top of for-5.9/io_uring, f56040b819998 ("io_uring: deduplicate io_grab_files() calls")

Pavel Begunkov (2):
  io_uring: allocate req->work dynamically
  io_uring: unionise ->apoll and ->work

 fs/io-wq.h    |   1 +
 fs/io_uring.c | 207 ++++++++++++++++++++++++++------------------------
 2 files changed, 110 insertions(+), 98 deletions(-)

-- 
2.24.0