Hello Jens,

As you know, all existing single-queue block drivers have to be
converted to blk-mq before the single-queue block layer can be removed.
This patch series therefore converts the skd (sTec s1120) driver to
blk-mq. As the following performance numbers show, this patch series
does not significantly affect the performance of the skd driver: the
IOPS numbers differ by at most about 5% and the average latencies by
about 3%.

======================================================================
sTec Measurements
===================

Kernel module configuration
...........................

$ cat /etc/modprobe.d/skd.conf
options skd skd_max_queue_depth=200 skd_isr_type=1

blk-sq driver
.............

Kernel: 4.11.10-300.fc26.x86_64

$ (cd /sys/block/skd*/queue && grep -aH '' add_random hw_sector_size max_segments nr_requests rotational rq_affinity scheduler write_cache)
add_random:0
hw_sector_size:512
max_segments:256
nr_requests:128
rotational:0
rq_affinity:2
scheduler:[noop] deadline cfq
write_cache:write back
$ ~bart/software/tools/measure-latency /dev/skd* 512 |& tee measurements.txt
I/O pattern: randread
lat (usec): min=16, max=550, avg=88.33, stdev=14.85
I/O pattern: randwrite
lat (usec): min=20, max=5096, avg=26.35, stdev=56.03
$ for opt in "" "-w"; do for s in 512 4096 65536; do \
  ~bart/software/tools/max-iops $opt -b$s -j1 /dev/skd*; done; done |& tee measurements.txt
read: IOPS=103k, BW=50.1MiB/s (52.6MB/s)(3006MiB/60002msec)
read: IOPS=81.4k, BW=318MiB/s (333MB/s)(18.7GiB/60003msec)
read: IOPS=15.7k, BW=978MiB/s (1026MB/s)(57.4GiB/60015msec)
write: IOPS=62.4k, BW=30.5MiB/s (31.1MB/s)(1826MiB/60004msec)
write: IOPS=68.8k, BW=266MiB/s (279MB/s)(15.6GiB/60004msec)
write: IOPS=13.9k, BW=818MiB/s (858MB/s)(47.1GiB/60012msec)

blk-mq driver
.............
Kernel: 4.13.0-rc2+

$ uname -r
4.13.0-rc2+
$ (cd /sys/block/skd*/queue && grep -aH '' add_random hw_sector_size max_segments nr_requests rotational rq_affinity scheduler write_cache)
add_random:0
hw_sector_size:512
max_segments:256
nr_requests:100
rotational:0
rq_affinity:2
scheduler:[none]
write_cache:write back
$ ~bart/software/tools/measure-latency /dev/skd* 512 |& tee measurements.txt
I/O pattern: randread
lat (usec): min=18, max=297, avg=91.02, stdev=13.16
I/O pattern: randwrite
lat (usec): min=20, max=4680, avg=26.96, stdev=54.80
$ for opt in "" "-w"; do for s in 512 4096 65536; do \
  ~bart/software/tools/max-iops $opt -b$s -j1 /dev/skd*; done; done |& tee measurements.txt
read: IOPS=101k, BW=49.4MiB/s (51.8MB/s)(2959MiB/60002msec)
read: IOPS=83.3k, BW=325MiB/s (341MB/s)(19.6GiB/60003msec)
read: IOPS=15.7k, BW=977MiB/s (1024MB/s)(57.3GiB/60019msec)
write: IOPS=63.2k, BW=30.8MiB/s (32.3MB/s)(1846MiB/60003msec)
write: IOPS=70.3k, BW=274MiB/s (288MB/s)(16.9GiB/60003msec)
write: IOPS=13.2k, BW=823MiB/s (863MB/s)(48.3GiB/60012msec)

======================================================================

Please consider this patch series for kernel v4.14.

Thanks,

Bart.
Bart Van Assche (55):
  block: Relax a check in blk_start_queue()
  skd: Avoid that module unloading triggers a use-after-free
  skd: Submit requests to firmware before triggering the doorbell
  skd: Switch to GPLv2
  skd: Update maintainer information
  skd: Remove unneeded #include directives
  skd: Remove ESXi code
  skd: Remove unnecessary blank lines
  skd: Avoid that gcc 7 warns about fall-through when building with W=1
  skd: Fix spelling in a source code comment
  skd: Fix a function name in a comment
  skd: Remove set-but-not-used local variables
  skd: Remove a set-but-not-used variable from struct skd_device
  skd: Remove useless barrier() calls
  skd: Switch from the pr_*() to the dev_*() logging functions
  skd: Fix endianness annotations
  skd: Document locking assumptions
  skd: Introduce the symbolic constant SKD_MAX_REQ_PER_MSG
  skd: Introduce SKD_SKCOMP_SIZE
  skd: Fix size argument in skd_free_skcomp()
  skd: Reorder the code in skd_process_request()
  skd: Simplify the code for deciding whether or not to send a FIT msg
  skd: Simplify the code for allocating DMA message buffers
  skd: Use a structure instead of hardcoding structure offsets
  skd: Check structure sizes at build time
  skd: Use __packed only when needed
  skd: Make the skd_isr() code more brief
  skd: Use ARRAY_SIZE() where appropriate
  skd: Simplify the code for handling data direction
  skd: Remove superfluous initializations from skd_isr_completion_posted()
  skd: Drop second argument of skd_recover_requests()
  skd: Use for_each_sg()
  skd: Remove a redundant init_timer() call
  skd: Remove superfluous occurrences of the 'volatile' keyword
  skd: Use kcalloc() instead of kzalloc() with multiply
  skd: Use symbolic names for SCSI opcodes
  skd: Move a function definition
  skd: Rework request failing code path
  skd: Convert explicit skd_request_fn() calls
  skd: Remove SG IO support
  skd: Remove dead code
  skd: Initialize skd_special_context.req.n_sg to one
  skd: Enable request tags for the block layer queue
  skd: Convert several per-device scalar variables into atomics
  skd: Introduce skd_process_request()
  skd: Split skd_recover_requests()
  skd: Move skd_free_sg_list() up
  skd: Coalesce struct request and struct skd_request_context
  skd: Convert to blk-mq
  skd: Switch to block layer timeout mechanism
  skd: Remove skd_device.in_flight
  skd: Reduce memory usage
  skd: Remove several local variables
  skd: Optimize locking
  skd: Bump driver version

 MAINTAINERS               |    6 +
 block/blk-core.c          |    2 +-
 drivers/block/skd_main.c  | 3196 ++++++++++++---------------------------------
 drivers/block/skd_s1120.h |   38 +-
 4 files changed, 846 insertions(+), 2396 deletions(-)

-- 
2.14.0