Hi,

This is v3 of this patchset. We're back to passing the cache pointer in
the kiocb; I think that's the cleanest approach, and it's also the most
efficient one. A patch has been added to remove a member from the
io_uring req_rw structure, so that the kiocb size bump doesn't push the
per-command part of io_kiocb into the next cacheline.

Another benefit of this approach is that we get per-ring caching. That
means if an application splits polled IO into two threads, one doing
submissions and one doing reaps, we still get the full benefit of the
bio caching.

The tl;dr is that we get about a 10% bump in polled performance with
this patchset, as we can recycle bio structures essentially for free.
Beyond that, each patch carries its own explanation. I've also got an
iomap patch, but I'm trying to keep this to a single user until there's
agreement on the direction.

This is against for-5.15/io_uring, and can also be found in my
io_uring-bio-cache.3 branch.

 block/bio.c         | 123 ++++++++++++++++++++++++++++++++++++++++----
 fs/block_dev.c      |  32 ++++++++++--
 fs/io_uring.c       |  67 +++++++++++++++++++++---
 include/linux/bio.h |  24 +++++++--
 include/linux/fs.h  |  11 ++++
 5 files changed, 229 insertions(+), 28 deletions(-)

--
Jens Axboe
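To make the "recycle bio structures essentially for free" idea concrete, here is a rough userspace sketch of a capped, per-ring free list. Allocation pops a cached object when one is available and only falls back to malloc() when the cache is empty; freeing pushes the object back onto the list until a cap is reached. Every name here (struct fake_bio, struct bio_cache, cache_alloc(), cache_put(), cache_destroy()) is hypothetical and is not the kernel's bio or io_uring API. The cap and the absence of locking are assumptions of this sketch: it expects the caller to serialize access, for example one cache per ring used under that ring's lock.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch of a per-ring object cache.
 * None of these names are kernel APIs; no locking is done, so the
 * caller is assumed to serialize access to a given cache.
 */
struct fake_bio {
	struct fake_bio *next;   /* free-list linkage while cached */
	unsigned int size;       /* stand-in for a real bio's payload */
};

struct bio_cache {
	struct fake_bio *free_list; /* recycled objects, LIFO order */
	unsigned int nr;            /* objects currently cached */
	unsigned int max;           /* cap so the cache cannot grow unbounded */
};

/* Reuse a cached object if one exists, otherwise fall back to malloc(). */
struct fake_bio *cache_alloc(struct bio_cache *c)
{
	struct fake_bio *b = c->free_list;

	if (b) {
		c->free_list = b->next;
		c->nr--;
	} else {
		b = malloc(sizeof(*b));
		if (!b)
			return NULL;
	}
	b->next = NULL;
	b->size = 0;
	return b;
}

/* Put an object back in the cache; only free it when the cache is full. */
void cache_put(struct bio_cache *c, struct fake_bio *b)
{
	assert(b != NULL);
	if (c->nr < c->max) {
		b->next = c->free_list;
		c->free_list = b;
		c->nr++;
	} else {
		free(b);
	}
}

/* Release everything still cached, e.g. when the owning ring goes away. */
void cache_destroy(struct bio_cache *c)
{
	while (c->free_list) {
		struct fake_bio *b = c->free_list;

		c->free_list = b->next;
		free(b);
	}
	c->nr = 0;
}
```

In steady state, a completion calls cache_put() and the next submission's cache_alloc() gets the same object back without touching the allocator, which is the kind of saving the series is after.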