On 11/13/24 1:36 PM, Chaitanya Kulkarni wrote:
> On 11/13/24 07:20, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> currently blk-mq reorders requests when adding them to the plug because
>> the request list can't do efficient tail appends. When the plug is
>> directly issued using ->queue_rqs, that means reordered requests are
>> passed to the driver, which can lead to very bad I/O patterns when
>> not corrected, especially on rotational devices (e.g. NVMe HDD) or
>> when using zone append.
>>
>> This series first adds two easily backportable workarounds to reverse
>> the reordering in the virtio_blk and nvme-pci ->queue_rq implementations,
>> similar to what the non-queue_rqs path does, and then adds an rq_list
>> type that allows for efficient tail insertions and uses that to fix
>> the reordering for real, and then does the same for I/O completions as
>> well.
>
> Looks good to me. I ran the quick performance numbers [1].
>
> Reviewed-by: Chaitanya Kulkarni <kch@xxxxxxxxxx>
>
> -ck
>
> [1] fio randread io_uring workload :-
>
> IOPS :-
> -------
> nvme-orig:           Average IOPS: 72,690
> nvme-new-no-reorder: Average IOPS: 72,580
>
> BW :-
> -------
> nvme-orig:           Average BW: 283.9 MiB/s
> nvme-new-no-reorder: Average BW: 283.4 MiB/s

Thanks for testing, but you can't verify any kind of perf change with
that kind of setup. I'd be willing to bet that it'll be a 1-2% drop at
higher rates, which is substantial. But the reordering is a problem, and
not just for zoned devices, which is why I chose to merge this.

-- 
Jens Axboe
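For readers following the thread, here is a minimal standalone userspace
sketch of the ordering issue the cover letter describes. It is an
illustration only, not the kernel's actual struct rq_list API: the names
struct req, rq_list_push_head, and rq_list_push_tail below are hypothetical
stand-ins. The point is that pushing onto the head of a singly linked plug
list is O(1) but reverses submission order, while caching a tail pointer
makes an in-order tail append equally O(1).

#include <stdio.h>

struct req {			/* stand-in for a block request */
	int tag;
	struct req *next;
};

struct rq_list {
	struct req *head;
	struct req *tail;	/* cached tail enables O(1) in-order append */
};

/* Head insertion: O(1), but reverses order (the old plug behavior). */
static void rq_list_push_head(struct rq_list *l, struct req *r)
{
	r->next = l->head;
	l->head = r;
	if (!l->tail)
		l->tail = r;
}

/* Tail insertion: O(1) via the tail pointer, preserves submission order. */
static void rq_list_push_tail(struct rq_list *l, struct req *r)
{
	r->next = NULL;
	if (l->tail)
		l->tail->next = r;
	else
		l->head = r;
	l->tail = r;
}

int main(void)
{
	struct req reqs[4] = { { 0 }, { 1 }, { 2 }, { 3 } };
	struct rq_list l = { NULL, NULL };
	struct req *r;
	int i;

	for (i = 0; i < 4; i++)
		rq_list_push_head(&l, &reqs[i]);
	printf("head insert:");
	for (r = l.head; r; r = r->next)
		printf(" %d", r->tag);
	printf("  (reversed)\n");	/* prints: 3 2 1 0 */

	l.head = l.tail = NULL;
	for (i = 0; i < 4; i++)
		rq_list_push_tail(&l, &reqs[i]);
	printf("tail insert:");
	for (r = l.head; r; r = r->next)
		printf(" %d", r->tag);
	printf("  (submission order)\n");	/* prints: 0 1 2 3 */
	return 0;
}

The reversed ordering from head insertion is what ->queue_rqs handed to
drivers before this series; a device that cares about sequential access
patterns (rotational media, zone append) then sees the I/O backwards.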