On Tue, Oct 24, 2017 at 10:40 AM, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:

> Here is V12 of the hardware command queue patches without the software
> command queue patches, now using blk-mq and now with blk-mq support for
> non-CQE I/O.

Since I had my test setup going I gave this a spin with the same set of
tests that I used before/after my MQ patches.

It is using the same setup and same eMMC, but I had to rebase onto Ulf's
very latest next branch to apply your patches.

I default-enabled multiqueue.

Results:

sync
echo 3 > /proc/sys/vm/drop_caches
sync
time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 24.251922 seconds, 42.2MB/s
real    0m 24.25s
user    0m 0.03s
sys     0m 3.80s

mount /dev/mmcblk3p1 /mnt/
cd /mnt/
sync
echo 3 > /proc/sys/vm/drop_caches
sync
time find . > /dev/null
real    0m 3.24s
user    0m 0.22s
sys     0m 1.23s

sync
echo 3 > /proc/sys/vm/drop_caches
sync
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

                                                  random  random
      kB  reclen   write rewrite    read  reread    read   write
   20480       4    1615    1571    6612    6714    6494     531
   20480       8    2143    2295   11559   11563   11499    1164
   20480      16    3894    4202   17826   17823   17755    1369
   20480      32    5816    7489   23741   23759   23709    3016
   20480      64    7393    9167   27532   27526   27502    3591
   20480     128    7328    8097   29184   29161   29159    5592
   20480     256    7194    8752   29424   29434   29424    6700
   20480     512    8984    9930   29903   29911   29909    7420
   20480    1024    7072    7446   27684   27685   27681    7444
   20480    2048    6840    8199   27398   27420   27418    6766
   20480    4096    8137    6805   28091   28089   28093    8209
   20480    8192    7255    7485   28386   28384   28383    7479
   20480   16384    7078    7448   28584   28585   28585    7447

In short: no performance regressions. Performance-wise this is on par
with my own patch set for MQ.

As you know my pet peeve is "enable MQ by default" and I see no reason
from a performance perspective not to enable MQ by default on this
patch set or mine for that matter.
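
For anyone comparing the dd figure above against other runs, here is a small sketch of the throughput arithmetic. It only uses the byte count and elapsed time from the dd output; the variable names are illustrative. It suggests that the "42.2MB/s" reported by dd is in binary units (MiB/s), so the decimal figure is slightly higher.

```python
# Figures copied from the dd output above.
bytes_copied = 1073741824   # 1024 blocks of 1 MiB
elapsed_s = 24.251922       # seconds reported by dd

# Binary megabytes (MiB = 2**20 bytes) per second.
mib_per_s = bytes_copied / (1024 * 1024) / elapsed_s

# Decimal megabytes (MB = 10**6 bytes) per second.
mb_per_s = bytes_copied / 1_000_000 / elapsed_s

print(f"{mib_per_s:.1f} MiB/s, {mb_per_s:.1f} MB/s")
```

The iozone columns are in kB/s, so the same care with units applies when setting them side by side with the dd number.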

> While we should look at changing blk-mq to give better workqueue performance,
> a bigger gain is likely to be made by adding a new host API to enable the
> next already-prepared request to be issued directly from within ->done()
> callback of the current request.

My patch series switches the stack around to make it possible to do
this.

But it doesn't go the whole way to complete the requests from
interrupt context.

Since we have to send commands for retune etc., request finalization
cannot easily be done from interrupt context.

But I am thinking about testing to hack it using some ugly approaches
... like assuming we don't need any retune etc. and just say all is fine
and optimistically complete the request directly in the interrupt
handler if all was OK, and wait for errors to happen before retuning.

Yours,
Linus Walleij