On 11/13/24 07:20, Christoph Hellwig wrote:
> Hi Jens,
>
> currently blk-mq reorders requests when adding them to the plug because
> the request list can't do efficient tail appends.  When the plug is
> directly issued using ->queue_rqs that means reordered requests are
> passed to the driver, which can lead to very bad I/O patterns when
> not corrected, especially on rotational devices (e.g. NVMe HDD) or
> when using zone append.
>
> This series first adds two easily backportable workarounds to reverse
> the reordering in the virtio_blk and nvme-pci ->queue_rq implementations
> similar to what the non-queue_rqs path does, and then adds a rq_list
> type that allows for efficient tail insertions and uses that to fix
> the reordering for real and then does the same for I/O completions as
> well.

Looks good to me. I ran the quick performance numbers [1].

Reviewed-by: Chaitanya Kulkarni <kch@xxxxxxxxxx>

-ck

[1] fio randread iouring workload :-

IOPS :-
-------
nvme-orig:           Average IOPS: 72,690
nvme-new-no-reorder: Average IOPS: 72,580

BW :-
-------
nvme-orig:           Average BW: 283.9 MiB/s
nvme-new-no-reorder: Average BW: 283.4 MiB/s

IOPS/BW :-

nvme-orig-10.fio: read: IOPS=72.9k, BW=285MiB/s (299MB/s)(16.7GiB/60004msec)
nvme-orig-1.fio: read: IOPS=72.7k, BW=284MiB/s (298MB/s)(16.6GiB/60004msec)
nvme-orig-2.fio: read: IOPS=73.0k, BW=285MiB/s (299MB/s)(16.7GiB/60004msec)
nvme-orig-3.fio: read: IOPS=73.3k, BW=286MiB/s (300MB/s)(16.8GiB/60003msec)
nvme-orig-4.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60003msec)
nvme-orig-5.fio: read: IOPS=72.4k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-orig-6.fio: read: IOPS=72.9k, BW=285MiB/s (299MB/s)(16.7GiB/60003msec)
nvme-orig-7.fio: read: IOPS=72.3k, BW=282MiB/s (296MB/s)(16.5GiB/60004msec)
nvme-orig-8.fio: read: IOPS=72.4k, BW=283MiB/s (296MB/s)(16.6GiB/60003msec)
nvme-orig-9.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme (nvme-6.13) #
nvme (nvme-6.13) # grep BW nvme-new-no-reorder-*fio
nvme-new-no-reorder-10.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-1.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-2.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60003msec)
nvme-new-no-reorder-3.fio: read: IOPS=71.7k, BW=280MiB/s (294MB/s)(16.4GiB/60005msec)
nvme-new-no-reorder-4.fio: read: IOPS=72.5k, BW=283MiB/s (297MB/s)(16.6GiB/60004msec)
nvme-new-no-reorder-5.fio: read: IOPS=72.6k, BW=284MiB/s (298MB/s)(16.6GiB/60003msec)
nvme-new-no-reorder-6.fio: read: IOPS=73.3k, BW=286MiB/s (300MB/s)(16.8GiB/60003msec)
nvme-new-no-reorder-7.fio: read: IOPS=72.8k, BW=284MiB/s (298MB/s)(16.7GiB/60003msec)
nvme-new-no-reorder-8.fio: read: IOPS=73.2k, BW=286MiB/s (300MB/s)(16.7GiB/60004msec)
nvme-new-no-reorder-9.fio: read: IOPS=72.2k, BW=282MiB/s (296MB/s)(16.5GiB/60005msec)
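
For anyone who wants to recheck the averages from the per-run lines, a minimal sketch in Python follows. It is not the script used to produce the numbers in this mail; the names LINE_RE, average, ORIG and NEW are made up for illustration, and the regex assumes the "IOPS=<n>k, BW=<n>MiB/s" form seen in the grep output.

```python
import re
from statistics import mean

# Assumed shape of a fio summary line: "... IOPS=72.9k, BW=285MiB/s ..."
LINE_RE = re.compile(r"IOPS=([\d.]+)k, BW=([\d.]+)MiB/s")

def average(lines):
    """Return (mean IOPS, mean bandwidth in MiB/s) over fio summary lines."""
    pairs = [LINE_RE.search(line).groups() for line in lines]
    iops = mean(float(i) * 1000 for i, _ in pairs)
    bw = mean(float(b) for _, b in pairs)
    return iops, bw

# Per-run values copied from the grep output (runs 10, 1, 2, ..., 9).
ORIG = [
    "IOPS=72.9k, BW=285MiB/s", "IOPS=72.7k, BW=284MiB/s",
    "IOPS=73.0k, BW=285MiB/s", "IOPS=73.3k, BW=286MiB/s",
    "IOPS=72.5k, BW=283MiB/s", "IOPS=72.4k, BW=283MiB/s",
    "IOPS=72.9k, BW=285MiB/s", "IOPS=72.3k, BW=282MiB/s",
    "IOPS=72.4k, BW=283MiB/s", "IOPS=72.5k, BW=283MiB/s",
]
NEW = [
    "IOPS=72.5k, BW=283MiB/s", "IOPS=72.5k, BW=283MiB/s",
    "IOPS=72.5k, BW=283MiB/s", "IOPS=71.7k, BW=280MiB/s",
    "IOPS=72.5k, BW=283MiB/s", "IOPS=72.6k, BW=284MiB/s",
    "IOPS=73.3k, BW=286MiB/s", "IOPS=72.8k, BW=284MiB/s",
    "IOPS=73.2k, BW=286MiB/s", "IOPS=72.2k, BW=282MiB/s",
]

for name, runs in (("nvme-orig", ORIG), ("nvme-new-no-reorder", NEW)):
    iops, bw = average(runs)
    print(f"{name}: Average IOPS: {iops:,.0f}  Average BW: {bw:.1f} MiB/s")
```

Run over the twenty lines above, this matches the reported averages; the gap between the two kernels is roughly 0.15% in IOPS, which is smaller than the spread between individual runs of the same kernel.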