Re: [PATCH for-next] block: fix hctx checks for batch allocation

Jens Axboe <axboe@xxxxxxxxx> · Tue, 17 Jan 2023 09:56:42 -0700

On 1/17/23 4:42 AM, Pavel Begunkov wrote:
> When there are no read queues, read requests will be assigned a
> default queue on allocation. However, blk_mq_get_cached_request() is not
> prepared for that and will fail all attempts to grab read requests from
> the cache. Worst case it doubles the number of requests allocated,
> roughly half of which will be returned by blk_mq_free_plug_rqs().
> 
> It only affects batched allocations and so is io_uring specific.
> For reference, QD8 t/io_uring benchmark improves by 20-35%.
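
To illustrate the mismatch described above, here is a toy model in plain C. It is a sketch under stated assumptions, not the kernel code: the enum only mimics the kernel's queue types, and cache_hit_strict() and cache_hit_relaxed() are hypothetical helpers, assuming the cache lookup compares the queue type the bio wants against the type of the queue the cached request was allocated from.

```c
#include <stdbool.h>

/* Toy stand-ins for the hardware queue types; not the kernel definitions. */
enum hctx_type {
    HCTX_TYPE_DEFAULT,
    HCTX_TYPE_READ,
    HCTX_TYPE_POLL,
};

/*
 * Hypothetical strict check: the cached request is usable only if the
 * queue type wanted by the bio equals the type of the queue the request
 * was allocated from.
 */
bool cache_hit_strict(enum hctx_type wanted, enum hctx_type cached)
{
    return wanted == cached;
}

/*
 * Hypothetical relaxed check: additionally accept a read whose request
 * was placed on the default queue, which is what happens at allocation
 * time when there are no read queues.
 */
bool cache_hit_relaxed(enum hctx_type wanted, enum hctx_type cached)
{
    if (wanted == cached)
        return true;
    return wanted == HCTX_TYPE_READ && cached == HCTX_TYPE_DEFAULT;
}
```

In this model, a read that was given a default-queue request never passes the strict check, so every cached request is skipped and a fresh one is allocated, which matches the doubling described in the quoted text. The relaxed check accepts that one combination and still rejects the other mismatches.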

This does make a big difference for me. On my usual peak test (24 drives),
I get 63-65M IOPS with IRQ based IO without the patch. With the patch:

polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=64.79M, BW=31.64GiB/s, IOS/call=32/31
IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
IOPS=73.70M, BW=35.99GiB/s, IOS/call=31/31
IOPS=74.57M, BW=36.41GiB/s, IOS/call=31/31
IOPS=75.18M, BW=36.71GiB/s, IOS/call=31/31
IOPS=74.33M, BW=36.29GiB/s, IOS/call=32/32
IOPS=74.53M, BW=36.39GiB/s, IOS/call=32/32
IOPS=74.61M, BW=36.43GiB/s, IOS/call=32/32

which is 15-19% better.

> It might be a good idea to always use HCTX_TYPE_DEFAULT, so the cache
> always can accommodate combined write with read reqs.

I think it makes sense to do so, particularly now that we have support
for not just polled IO.

-- 
Jens Axboe