On Sat, Feb 11, 2017 at 2:03 PM, Avri Altman <Avri.Altman@xxxxxxxxxxx> wrote:

>> The iozone results seem a bit inconsistent; all values seem to be
>> noisy and do not say much. I don't know why really, maybe the test is
>> simply not relevant. The tests don't seem to be significantly affected
>> by any of the patches, so let's focus on the dd and find tests.
>
> Maybe use a more selective testing mode instead of -az.
> Also maybe you want to clear the cache between the sequential and random tests:
> #sync
> #echo 3 > /proc/sys/vm/drop_caches
> #sync
> It helps to obtain more robust results.

OK I'll try that. I actually cold booted the system between each test
to avoid cache effects.

>> What immediately jumps out at you is that linear read/writes perform
>> just as nicely or actually better with MQ than with the old block layer.
>
> How come 22.7MB/s before vs. 22.1MB/s after is better? Or did I
> misunderstand the output?
> Also, as dd is probably using the buffer cache, unlike the iozone test
> in which you properly used -I for direct mode to isolate the blk-mq
> effect - does it really say much?

Sorry, I guess I was a bit too enthusiastic there. The difference is
within the error margin; it is just based on a single test. I guess I
should re-run them with a few iterations: drop caches, iterate, drop
caches, iterate, and get some more stable figures.

We need to understand what is meant by "better" too: quicker compared
to wall clock time (real), user or sys.

So for the dd command:

                    real    user   sys
Before patches:     45.13   0.02   7.60
Move asynch pp      52.17   0.01   6.96
Issue in parallel   49.31   0.00   7.11
Multiqueue          46.25   0.03   6.42

For these pure kernel patches only the last figure (sys) is really
relevant, IIUC. The other figures are just system noise, but the
eventual throughput figure from dd still includes the time spent on
other processes in the system etc, so that value is not relevant. But I
guess Paolo may need to beat me up a bit here: what the user perceives
in the end is of course the most relevant for any human...

Nevertheless, if we just look at sys, then MQ is already winning this
test. I just think there is too little tested here. I think 1GiB is
maybe too little. Maybe I need to read the entire card a few times or
something?

Since dd is just reading blocks sequentially from mmcblk0 on a cold
booted system, I think the buffer cache is empty except for maybe the
partition table blocks. But I dunno. I will use your trick the next
time to drop caches.

Yours,
Linus Walleij
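
[Editor's note: a minimal sketch of the re-run procedure discussed above,
dropping caches between iterations and timing each dd pass. The device
node, block size and count (a 1 GiB sequential read from /dev/mmcblk0)
are assumptions based on the figures quoted in the thread; the exact dd
invocation used is not shown there.]

# Run the benchmark a few times, dropping the page cache before each pass,
# so the reported real/user/sys figures can be averaged across iterations.
for i in 1 2 3; do
        sync
        echo 3 > /proc/sys/vm/drop_caches
        sync
        time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
done

[If the goal is to isolate the block-layer change from page-cache effects,
adding iflag=direct to the dd command would bypass the cache entirely,
similar in spirit to iozone's -I direct mode mentioned above.]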