Hi all,

there have been several attempts to implement a latency-based I/O
scheduler for native nvme multipath, all of which had their issues.

So time to start afresh, this time using the QoS framework already
present in the block layer. It consists of two parts:
- a new 'blk-nodelat' QoS module, which is just a simple per-node
  latency tracker
- a 'latency' nvme I/O policy
(a rough, illustrative sketch of the QoS hook is appended after the
diffstat below)

Using the 'tiobench' fio script I'm getting:

 WRITE: bw=531MiB/s (556MB/s), 33.2MiB/s-52.4MiB/s (34.8MB/s-54.9MB/s), io=4096MiB (4295MB), run=4888-7718msec
 WRITE: bw=539MiB/s (566MB/s), 33.7MiB/s-50.9MiB/s (35.3MB/s-53.3MB/s), io=4096MiB (4295MB), run=5033-7594msec
  READ: bw=898MiB/s (942MB/s), 56.1MiB/s-75.4MiB/s (58.9MB/s-79.0MB/s), io=4096MiB (4295MB), run=3397-4560msec
  READ: bw=1023MiB/s (1072MB/s), 63.9MiB/s-75.1MiB/s (67.0MB/s-78.8MB/s), io=4096MiB (4295MB), run=3408-4005msec

for 'round-robin' and

 WRITE: bw=574MiB/s (601MB/s), 35.8MiB/s-45.5MiB/s (37.6MB/s-47.7MB/s), io=4096MiB (4295MB), run=5629-7142msec
 WRITE: bw=639MiB/s (670MB/s), 39.9MiB/s-47.5MiB/s (41.9MB/s-49.8MB/s), io=4096MiB (4295MB), run=5388-6408msec
  READ: bw=1024MiB/s (1074MB/s), 64.0MiB/s-73.7MiB/s (67.1MB/s-77.2MB/s), io=4096MiB (4295MB), run=3475-4000msec
  READ: bw=1013MiB/s (1063MB/s), 63.3MiB/s-72.6MiB/s (66.4MB/s-76.2MB/s), io=4096MiB (4295MB), run=3524-4042msec

for 'latency' with 'decay' set to 10.

That's on a 32G FC testbed running against a brd target, with fio
running with 16 threads.

As usual, comments and reviews are welcome.

Hannes Reinecke (2):
  block: track per-node I/O latency
  nvme: add 'latency' iopolicy

 block/Kconfig                 |   7 +
 block/Makefile                |   1 +
 block/blk-mq-debugfs.c        |   2 +
 block/blk-nodelat.c           | 368 ++++++++++++++++++++++++++++++++++
 block/blk-rq-qos.h            |   6 +
 drivers/nvme/host/multipath.c |  46 ++++-
 drivers/nvme/host/nvme.h      |   2 +
 include/linux/blk-mq.h        |  11 +
 8 files changed, 439 insertions(+), 4 deletions(-)
 create mode 100644 block/blk-nodelat.c

-- 
2.35.3
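
For anyone who hasn't looked at the rq_qos framework before, here is a
rough sketch of how a per-node latency tracker can hook into it: register
a 'done' callback, take the completion latency from io_start_time_ns, and
fold it into a decaying per-NUMA-node average that a path selector can
later compare. This is illustrative only, not the code from blk-nodelat;
the struct layout, the per-node accounting, and the decay formula are
assumptions, and registration via rq_qos_add() is omitted because its
signature differs between kernel releases.

/*
 * Illustrative sketch only -- not the actual blk-nodelat patch.
 */
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/atomic.h>
#include <linux/ktime.h>
#include <linux/math64.h>
#include <linux/numa.h>
#include <linux/topology.h>
#include <linux/slab.h>
#include "blk-rq-qos.h"

struct nodelat {
	struct rq_qos rqos;
	/* hypothetical per-node state; a real module might allocate this */
	atomic64_t avg_ns[MAX_NUMNODES];
	unsigned int decay;	/* weight of new samples, cf. the 'decay' knob */
};

static inline struct nodelat *rqos_to_nl(struct rq_qos *rqos)
{
	return container_of(rqos, struct nodelat, rqos);
}

static void nodelat_done(struct rq_qos *rqos, struct request *rq)
{
	struct nodelat *nl = rqos_to_nl(rqos);
	int node = numa_node_id();	/* assumption: account to the completing node */
	u64 now = ktime_get_ns();
	u64 lat, old, new;

	/* io_start_time_ns is only filled in when the queue collects stats */
	if (!rq->io_start_time_ns || now < rq->io_start_time_ns)
		return;
	lat = now - rq->io_start_time_ns;

	/* simple exponential decay: new = old - old/decay + lat/decay */
	old = atomic64_read(&nl->avg_ns[node]);
	new = old - div_u64(old, nl->decay) + div_u64(lat, nl->decay);
	atomic64_set(&nl->avg_ns[node], new);
}

static void nodelat_exit(struct rq_qos *rqos)
{
	kfree(rqos_to_nl(rqos));
}

static const struct rq_qos_ops nodelat_ops = {
	.done	= nodelat_done,
	.exit	= nodelat_exit,
};

The 'latency' iopolicy would then, roughly speaking, compare the tracked
averages of the available paths for the submitting node and pick the one
with the lowest value.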