On 02/11/16 11:30, Ritesh Harjani wrote:
> Hi Adrian,
>
> Reply got delayed due to the festive season holidays here.
>
> On 10/28/2016 6:18 PM, Adrian Hunter wrote:
>> On 28/10/16 14:48, Ritesh Harjani wrote:
>>> Hi Adrian,
>>>
>>> Thanks for the re-base. I am trying to apply this patch series and
>>> validate it. I am seeing some tuning-related errors; they could be due
>>> to my setup/device. I will be working on this.
>>>
>>> Meanwhile, the perf readout below shows some discrepancy (I could be
>>> missing something). In your last test, sequential and random reads give
>>> almost the same throughput, and even the percentage increase is similar
>>> (legacy vs SW CMDQ). Could this be due to the benchmark itself?
>>
>> Normally sequential reading would result in large transfers due to
>> readahead. But the test is using Direct I/O (-I option) to prevent VFS
>> caching distorting the results. So the sequential read case ends up being
>> more like a random read case anyway.
>>
>>> Do you think we should get some other benchmarks tested as well?
>>> (maybe tio?)
>>>
>>> Also, could you please add some analysis on where we should expect to
>>> see score improvements with SW CMDQ, and where we may see some decreases
>>> due to SW CMDQ?
>>
>> I would expect random reads to be improved, but there is the overhead of
>> the extra cmdq commands, so a small decrease in performance would be
>> expected otherwise.
>>
>>> You did mention in the original cover letter that we should mostly see
>>> improvement in random read scores, but below we are seeing similar or
>>> higher improvement in sequential reads (4 threads), which is a bit
>>> surprising.
>>
>> As explained above: the test is called "sequential", but the limited
>> record size combined with Direct I/O makes it more like random I/O,
>> especially when it is mixed from multiple threads.
>
> In that case, if we look at sequential read vs random read for a single
> thread of SW CMDQ, both show the same data.
> Both SR and RR should not be the same ideally, no?

Yes, with these iozone tests, SR and RR seem the same.

>>> On 10/24/2016 2:07 PM, Adrian Hunter wrote:
>>>> Hi
>>>>
>>>> Here is an updated version of the Software Command Queuing patches,
>>>> re-based on next, with patches 1-5 dropped because they have been
>>>> applied, and 2 fixes that have been rolled in (refer Changes in V5
>>>> below).
>>>>
>>>> Performance results:
>>>>
>>>> Results can vary from run to run, but here are some results showing 1,
>>>> 2 or 4 processes with 4k and 32k record sizes. They show up to 40%
>>>> improvement in read performance when there are multiple processes.
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 1 -F /mnt/mmc/iozone1.tmp
>>>>
>>>> Children see throughput for 1 initial writers  =  27909.87 kB/sec   24204.14 kB/sec  -13.28 %
>>>> Children see throughput for 1 rewriters        =  28839.28 kB/sec   25531.92 kB/sec  -11.47 %
>>>> Children see throughput for 1 readers          =  25889.65 kB/sec   24883.23 kB/sec   -3.89 %
>>>> Children see throughput for 1 re-readers       =  25558.23 kB/sec   24679.89 kB/sec   -3.44 %
>>>> Children see throughput for 1 random readers   =  25571.48 kB/sec   24689.52 kB/sec   -3.45 %
>>>> Children see throughput for 1 mixed workload   =  25758.59 kB/sec   24487.52 kB/sec   -4.93 %
>>>> Children see throughput for 1 random writers   =  24787.51 kB/sec   19368.99 kB/sec  -21.86 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 1 -F /mnt/mmc/iozone1.tmp
>>>>
>>>> Children see throughput for 1 initial writers  =  91344.61 kB/sec  102008.56 kB/sec   11.67 %
>>>> Children see throughput for 1 rewriters        =  87932.36 kB/sec   96630.44 kB/sec    9.89 %
>>>> Children see throughput for 1 readers          = 134879.82 kB/sec  110292.79 kB/sec  -18.23 %
>>>> Children see throughput for 1 re-readers       = 147632.13 kB/sec  109053.33 kB/sec  -26.13 %
>>>> Children see throughput for 1 random readers   =  93547.37 kB/sec  112225.50 kB/sec   19.97 %
>>>> Children see throughput for 1 mixed workload   =  93560.04 kB/sec  110515.21 kB/sec   18.12 %
>>>> Children see throughput for 1 random writers   =  92841.84 kB/sec   81153.81 kB/sec  -12.59 %
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 2 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp
>>>>
>>>> Children see throughput for 2 initial writers  =  31145.43 kB/sec   33771.25 kB/sec    8.43 %
>>>> Children see throughput for 2 rewriters        =  30592.57 kB/sec   35916.46 kB/sec   17.40 %
>>>> Children see throughput for 2 readers          =  31669.83 kB/sec   37460.13 kB/sec   18.28 %
>>>> Children see throughput for 2 re-readers       =  32079.94 kB/sec   37373.33 kB/sec   16.50 %
>>>> Children see throughput for 2 random readers   =  27731.19 kB/sec   37601.65 kB/sec   35.59 %
>>>> Children see throughput for 2 mixed workload   =  13927.50 kB/sec   14617.06 kB/sec    4.95 %
>>>> Children see throughput for 2 random writers   =  31250.00 kB/sec   33106.72 kB/sec    5.94 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 2 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp
>>>>
>>>> Children see throughput for 2 initial writers  = 123255.84 kB/sec  131252.22 kB/sec    6.49 %
>>>> Children see throughput for 2 rewriters        = 115234.91 kB/sec  107225.74 kB/sec   -6.95 %
>>>> Children see throughput for 2 readers          = 128921.86 kB/sec  148562.71 kB/sec   15.23 %
>>>> Children see throughput for 2 re-readers       = 127815.24 kB/sec  149304.32 kB/sec   16.81 %
>>>> Children see throughput for 2 random readers   = 125600.46 kB/sec  148406.56 kB/sec   18.16 %
>>>> Children see throughput for 2 mixed workload   =  44006.94 kB/sec   50937.36 kB/sec   15.75 %
>>>> Children see throughput for 2 random writers   = 120623.95 kB/sec  103969.05 kB/sec  -13.81 %
>>>>
>>>> iozone -s 8192k -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp /mnt/mmc/iozone3.tmp /mnt/mmc/iozone4.tmp
>>>>
>>>> Children see throughput for 4 initial writers  =  24100.96 kB/sec   33336.58 kB/sec   38.32 %
>>>> Children see throughput for 4 rewriters        =  31650.20 kB/sec   33091.53 kB/sec    4.55 %
>>>> Children see throughput for 4 readers          =  33276.92 kB/sec   41799.89 kB/sec   25.61 %
>>>> Children see throughput for 4 re-readers       =  31786.96 kB/sec   41501.74 kB/sec   30.56 %
>>>> Children see throughput for 4 random readers   =  31991.65 kB/sec   40973.93 kB/sec   28.08 %
>>>> Children see throughput for 4 mixed workload   =  15804.80 kB/sec   13581.32 kB/sec  -14.07 %
>>>> Children see throughput for 4 random writers   =  31231.42 kB/sec   34537.03 kB/sec   10.58 %
>>>>
>>>> iozone -s 8192k -r 32k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /mnt/mmc/iozone1.tmp
>>>> /mnt/mmc/iozone2.tmp /mnt/mmc/iozone3.tmp /mnt/mmc/iozone4.tmp
>>>>
>>>> Children see throughput for 4 initial writers  = 116567.42 kB/sec  119280.35 kB/sec    2.33 %
>>>> Children see throughput for 4 rewriters        = 115010.96 kB/sec  120864.34 kB/sec    5.09 %
>>>> Children see throughput for 4 readers          = 130700.29 kB/sec  177834.21 kB/sec   36.06 %
>>>
>>> Do you think sequential read will increase more than random read? It
>>> should mostly benefit random reads, right? Any idea why it's behaving
>>> differently here?
>>
>> Explained above.
>>
>>>> Children see throughput for 4 re-readers       = 125392.58 kB/sec  175975.28 kB/sec   40.34 %
>>>> Children see throughput for 4 random readers   = 132194.57 kB/sec  176630.46 kB/sec   33.61 %
>>>> Children see throughput for 4 mixed workload   =  56464.98 kB/sec   54140.61 kB/sec   -4.12 %
>>>> Children see throughput for 4 random writers   = 109128.36 kB/sec   85359.80 kB/sec  -21.78 %
>>>
>>> Similarly, we don't expect random write scores to decrease here. Do you
>>> know why this could be the case?
>>
>> There is a lot of variation from one run to another.
>
> These run-to-run variations are due to what, any idea?

Not really, but there is no knowing how much internal work the eMMC is
doing: wear-leveling, erasing blocks, cache flushing, etc. Also, "random"
writes that hit the same erase block can be combined when flushing the
cache, and random writes to the same sectors can also be combined, so the
potential for variation is very large.

> A 21% variation is huge.
>
> Actually, can we try and get some data which shows the difference between
> SeqR and RandR? Also, RandW should not decrease this much. Maybe by
> changing iozone parameters?
>
> How about the data below?

Yes, that would be interesting.

> For sequential -
> ./iozone -s 64m -r 32m -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /data/mmc/iozone1.tmp
> /data/mmc/iozone2.tmp /data/mmc/iozone3.tmp /data/mmc/iozone4.tmp
>
> For random -
> ./iozone -s 64m -r 4k -i 0 -i 1 -i 2 -i 8 -I -t 4 -F /data/mmc/iozone1.tmp
> /data/mmc/iozone2.tmp /data/mmc/iozone3.tmp /data/mmc/iozone4.tmp
>
> Regards
> Ritesh

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
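[Editorial note, not part of the original thread: the percentage columns in
the tables quoted above are plain relative changes from the legacy kB/sec
figure to the SW CMDQ kB/sec figure. A minimal sketch, with a hypothetical
`pct_change` helper (the helper name and this script are assumptions, not
part of iozone or the patch set):]

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): relative change, in percent,
# from legacy throughput A (kB/sec) to SW CMDQ throughput B (kB/sec),
# rounded to two decimals as in the tables above.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}

# The "4 readers, 32k record" row quoted above:
pct_change 130700.29 177834.21    # prints 36.06
```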