Hi,

This is v4 of this patchset, and Yet Another method of achieving the
same goal. This one moves in the direction of my old cpu-alloc-cache
branch, where the caches are just per-cpu. The trouble with those is
that we need to make this specific to polled IO to lose the IRQ safety
of them, otherwise it's not a real win and we're better off just using
the slab allocator smarts. This is combined with Christoph's idea to
make it per bio_set, and retains the flagging of the kiocb for having
the IO issuer tell the layer below whether the cache can be safely used
or not.

Another change from the last version is that we can now grossly
simplify the io_uring side, as we don't need locking for the cache and
async retries are no longer interesting there. This is combined with a
block layer change that clears BIO_PERCPU_CACHE if we clear the HIPRI
flag.

The tl;dr here is that we get about a 10% bump in polled performance
with this patchset, as we can recycle bio structures essentially for
free. Outside of that, explanations are in each patch. I've also got an
iomap patch, but I'm trying to keep this to a single user until there's
agreement on the direction.

Against for-5.15/io_uring, and can also be found in my
io_uring-bio-cache.4 branch.

 block/bio.c                | 170 +++++++++++++++++++++++++++++++++----
 block/blk-core.c           |   5 +-
 fs/block_dev.c             |   6 +-
 fs/io_uring.c              |   2 +-
 include/linux/bio.h        |  23 +++--
 include/linux/blk_types.h  |   1 +
 include/linux/cpuhotplug.h |   1 +
 include/linux/fs.h         |   2 +
 8 files changed, 185 insertions(+), 25 deletions(-)

-- 
Jens Axboe