This series implements bio pcpu caching for normal / IRQ-driven I/O,
extending REQ_ALLOC_CACHE, which is currently limited to iopoll. The
allocation side still only works from non-IRQ context, which is why
it's not enabled by default, but turning it on for other users (e.g.
filesystems) is a matter of passing a flag.

t/io_uring with an Optane SSD setup showed +7% for batches of 32
requests and +4.3% for batches of 8.

IRQ, 128/32/32, cache off
IOPS=59.08M, BW=28.84GiB/s, IOS/call=31/31
IOPS=59.30M, BW=28.96GiB/s, IOS/call=32/32
IOPS=59.97M, BW=29.28GiB/s, IOS/call=31/31
IOPS=59.92M, BW=29.26GiB/s, IOS/call=32/32
IOPS=59.81M, BW=29.20GiB/s, IOS/call=32/31

IRQ, 128/32/32, cache on
IOPS=64.05M, BW=31.27GiB/s, IOS/call=32/31
IOPS=64.22M, BW=31.36GiB/s, IOS/call=32/32
IOPS=64.04M, BW=31.27GiB/s, IOS/call=31/31
IOPS=63.16M, BW=30.84GiB/s, IOS/call=32/32

IRQ, 32/8/8, cache off
IOPS=50.60M, BW=24.71GiB/s, IOS/call=7/8
IOPS=50.22M, BW=24.52GiB/s, IOS/call=8/7
IOPS=49.54M, BW=24.19GiB/s, IOS/call=8/8
IOPS=50.07M, BW=24.45GiB/s, IOS/call=7/7
IOPS=50.46M, BW=24.64GiB/s, IOS/call=8/8

IRQ, 32/8/8, cache on
IOPS=51.39M, BW=25.09GiB/s, IOS/call=8/7
IOPS=52.52M, BW=25.64GiB/s, IOS/call=7/8
IOPS=52.57M, BW=25.67GiB/s, IOS/call=8/8
IOPS=52.58M, BW=25.67GiB/s, IOS/call=8/7
IOPS=52.61M, BW=25.69GiB/s, IOS/call=8/8

The main part is in patch 3. It would be great to take patch 1
separately for 6.1 for extra safety.

v2: fix botched splicing threshold checks

Pavel Begunkov (4):
  bio: safeguard REQ_ALLOC_CACHE bio put
  bio: split pcpu cache part of bio_put into a helper
  block/bio: add pcpu caching for non-polling bio_put
  io_uring/rw: enable bio caches for IRQ rw

 block/bio.c   | 94 ++++++++++++++++++++++++++++++++++++++++-----------
 io_uring/rw.c |  3 +-
 2 files changed, 76 insertions(+), 21 deletions(-)

-- 
2.38.0
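For readers unfamiliar with the approach, the pcpu caching idea for
IRQ-driven completions can be illustrated with a rough userspace
sketch of a two-list cache. This is not code from the series: struct
obj, struct obj_cache, cache_alloc(), cache_put(), CACHE_MAX and
SPLICE_THRESHOLD are hypothetical names, only a single CPU is
modelled, and the in_irq flag stands in for real context detection.
Puts from IRQ context go onto their own list, allocations (task
context only) take from the main list, and once the main list is empty
the IRQ list is spliced over if it has grown past a threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio: only the free-list link matters here. */
struct obj {
	struct obj *next;
};

/* Hypothetical per-CPU cache; this sketch models a single CPU only. */
struct obj_cache {
	struct obj *free_list;     /* task-context puts; consumed by allocations */
	struct obj *free_list_irq; /* IRQ-context puts; spliced over on demand */
	unsigned int nr;
	unsigned int nr_irq;
};

#define CACHE_MAX        8 /* hypothetical cap on cached objects per CPU */
#define SPLICE_THRESHOLD 4 /* hypothetical: splice only once the IRQ list is this long */

/*
 * Move the whole IRQ list onto the (empty) main list. A kernel version
 * would do this with local IRQs disabled so an IRQ-side put cannot race.
 */
static void cache_splice_irq(struct obj_cache *c)
{
	assert(c->free_list == NULL);
	c->free_list = c->free_list_irq;
	c->nr += c->nr_irq;
	c->free_list_irq = NULL;
	c->nr_irq = 0;
}

/* Task context only. Returns NULL on a miss; the caller then takes the slow path. */
struct obj *cache_alloc(struct obj_cache *c)
{
	struct obj *o;

	if (!c->free_list && c->nr_irq >= SPLICE_THRESHOLD)
		cache_splice_irq(c);
	o = c->free_list;
	if (!o)
		return NULL;
	c->free_list = o->next;
	c->nr--;
	return o;
}

/* in_irq is a hypothetical stand-in for real context detection. */
void cache_put(struct obj_cache *c, struct obj *o, bool in_irq)
{
	if (c->nr + c->nr_irq >= CACHE_MAX) {
		free(o); /* cache full: hand the object back to the allocator */
		return;
	}
	if (in_irq) {
		o->next = c->free_list_irq;
		c->free_list_irq = o;
		c->nr_irq++;
	} else {
		o->next = c->free_list;
		c->free_list = o;
		c->nr++;
	}
}
```

In this sketch, the threshold is meant to avoid paying for a splice
(and, in a kernel setting, an IRQ-disabled section) when the IRQ list
holds only a couple of entries, while IRQ-side puts stay a cheap push
onto a list that allocations never touch directly.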