On 1/17/23 4:42 AM, Pavel Begunkov wrote:
> When there are no read queues read requests will be assigned a
> default queue on allocation. However, blk_mq_get_cached_request() is not
> prepared for that and will fail all attempts to grab read requests from
> the cache. Worst case it doubles the number of requests allocated,
> roughly half of which will be returned by blk_mq_free_plug_rqs().
>
> It only affects batched allocations and so is io_uring specific.
> For reference, QD8 t/io_uring benchmark improves by 20-35%.

This does make a big difference for me. Usual peak test (24 drives),
and I get 63-65M IOPS with IRQ based IO. With the patch:

polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=64.79M, BW=31.64GiB/s, IOS/call=32/31
IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
IOPS=73.70M, BW=35.99GiB/s, IOS/call=31/31
IOPS=74.57M, BW=36.41GiB/s, IOS/call=31/31
IOPS=75.18M, BW=36.71GiB/s, IOS/call=31/31
IOPS=74.33M, BW=36.29GiB/s, IOS/call=32/32
IOPS=74.53M, BW=36.39GiB/s, IOS/call=32/32
IOPS=74.61M, BW=36.43GiB/s, IOS/call=32/32

which is 15-19% better.

> It might be a good idea to always use HCTX_TYPE_DEFAULT, so the cache
> always can accommodate combined write with read reqs.

I think it makes sense to do so, particularly now that we have support
for not just polled IO.

-- 
Jens Axboe
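
For readers following along, here is a minimal userspace sketch of the
mismatch described in the quoted text. It is a toy model, not the kernel
code: every toy_* name is hypothetical, and the relaxed check is only one
assumed way a cache lookup could tolerate a read op being served by a
request that was allocated on a default queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for queue types; these are NOT the kernel definitions. */
enum toy_hctx_type { TOY_TYPE_DEFAULT, TOY_TYPE_READ, TOY_TYPE_POLL };

struct toy_request {
    enum toy_hctx_type hctx_type;   /* queue type the request was allocated on */
    struct toy_request *next;
};

struct toy_plug {
    struct toy_request *cached;     /* head of the cached request list */
};

/* Pop the head of the cache and hand it to the caller. */
static struct toy_request *toy_pop(struct toy_plug *plug)
{
    struct toy_request *rq = plug->cached;

    plug->cached = rq->next;
    rq->next = NULL;
    return rq;
}

/*
 * Strict lookup: the type the op asks for must equal the type of the
 * queue the cached request sits on. With no read queues, reads were
 * allocated on a default queue, so every read lookup misses.
 */
static struct toy_request *toy_get_cached_strict(struct toy_plug *plug,
                                                 enum toy_hctx_type wanted)
{
    assert(plug != NULL);
    if (!plug->cached || plug->cached->hctx_type != wanted)
        return NULL;
    return toy_pop(plug);
}

/*
 * Relaxed lookup (assumed behaviour): additionally accept a read op
 * when the cached request lives on a default queue.
 */
static struct toy_request *toy_get_cached_relaxed(struct toy_plug *plug,
                                                  enum toy_hctx_type wanted)
{
    bool read_on_default;

    assert(plug != NULL);
    if (!plug->cached)
        return NULL;

    read_on_default = wanted == TOY_TYPE_READ &&
                      plug->cached->hctx_type == TOY_TYPE_DEFAULT;
    if (plug->cached->hctx_type != wanted && !read_on_default)
        return NULL;
    return toy_pop(plug);
}
```

In this model, a plug holding one TOY_TYPE_DEFAULT request returns NULL
from toy_get_cached_strict() for a read and leaves the request in the
cache, which mirrors the "allocate again, free the leftover later"
behaviour described above. toy_get_cached_relaxed() hands the same
request out.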