On Mon, Jun 30, 2014 at 05:47:59PM +0100, Mel Gorman wrote:
> Changelog since V3
> o Push down kwapd changes to cover the balance gap
> o Drop drop page distribution patch
>
> Changelog since V2
> o Simply fair zone policy cost reduction
> o Drop CFQ patch
>
> Changelog since v1
> o Rebase to v3.16-rc2
> o Move CFQ patch to end of series where it can be rejected easier if necessary
> o Introduce page-reclaim related patch related to kswapd/fairzone interactions
> o Rework fast zone policy patch
>
> IO performance since 3.0 has been a mixed bag. In many respects we are
> better and in some we are worse and one of those places is sequential
> read throughput. This is visible in a number of benchmarks but I looked
> at tiobench the closest. This is using ext3 on a mid-range desktop and
> the series applied.
>
>                                     3.16.0-rc2                 3.0.0            3.16.0-rc2
>                                        vanilla               vanilla         fairzone-v4r5
> Min    SeqRead-MB/sec-1       120.92 (  0.00%)      133.65 ( 10.53%)      140.68 ( 16.34%)
> Min    SeqRead-MB/sec-2       100.25 (  0.00%)      121.74 ( 21.44%)      118.13 ( 17.84%)
> Min    SeqRead-MB/sec-4        96.27 (  0.00%)      113.48 ( 17.88%)      109.84 ( 14.10%)
> Min    SeqRead-MB/sec-8        83.55 (  0.00%)       97.87 ( 17.14%)       89.62 (  7.27%)
> Min    SeqRead-MB/sec-16       66.77 (  0.00%)       82.59 ( 23.69%)       70.49 (  5.57%)
>
> Overall system CPU usage is reduced
>
>                 3.16.0-rc2         3.0.0    3.16.0-rc2
>                    vanilla       vanilla   fairzone-v4
> User                390.13        251.45        396.13
> System              404.41        295.13        389.61
> Elapsed            5412.45       5072.42       5163.49
>
> This series does not fully restore throughput performance to 3.0 levels
> but it brings it close for lower thread counts. Higher thread counts are
> known to be worse than 3.0 due to CFQ changes but there is no appetite
> for changing the defaults there.

I ran tiobench locally and here are the results:

tiobench MB/sec
                                        3.16-rc1              3.16-rc1
                                                           seqreadv4r8
Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)
Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)
Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)
Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)
Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)
Mean   RandRead-MB/sec-1          1.14 (  0.00%)        1.11 ( -2.35%)
Mean   RandRead-MB/sec-2          1.30 (  0.00%)        1.25 ( -3.85%)
Mean   RandRead-MB/sec-4          1.50 (  0.00%)        1.46 ( -2.23%)
Mean   RandRead-MB/sec-8          1.72 (  0.00%)        1.60 ( -6.96%)
Mean   RandRead-MB/sec-16         1.72 (  0.00%)        1.69 ( -2.13%)

Seqread throughput is up, randread takes a small hit. But allocation
latency is badly screwed at higher concurrency levels:

tiobench Maximum Latency
                                            3.16-rc1              3.16-rc1
                                                               seqreadv4r8
Mean   SeqRead-MaxLatency-1          77.23 (  0.00%)       57.69 ( 25.30%)
Mean   SeqRead-MaxLatency-2         228.80 (  0.00%)      218.50 (  4.50%)
Mean   SeqRead-MaxLatency-4         329.58 (  0.00%)      325.93 (  1.11%)
Mean   SeqRead-MaxLatency-8         485.13 (  0.00%)      475.35 (  2.02%)
Mean   SeqRead-MaxLatency-16        599.10 (  0.00%)      637.89 ( -6.47%)
Mean   RandRead-MaxLatency-1         66.98 (  0.00%)       18.21 ( 72.81%)
Mean   RandRead-MaxLatency-2        132.88 (  0.00%)      119.61 (  9.98%)
Mean   RandRead-MaxLatency-4        222.95 (  0.00%)      213.82 (  4.10%)
Mean   RandRead-MaxLatency-8        982.99 (  0.00%)     1009.71 ( -2.72%)
Mean   RandRead-MaxLatency-16       515.24 (  0.00%)    1883.82 (-265.62%)
Mean   SeqWrite-MaxLatency-1        239.78 (  0.00%)      233.61 (  2.57%)
Mean   SeqWrite-MaxLatency-2        517.85 (  0.00%)      413.39 ( 20.17%)
Mean   SeqWrite-MaxLatency-4        249.10 (  0.00%)      416.33 (-67.14%)
Mean   SeqWrite-MaxLatency-8        629.31 (  0.00%)      851.62 (-35.33%)
Mean   SeqWrite-MaxLatency-16       987.05 (  0.00%)     1080.92 ( -9.51%)
Mean   RandWrite-MaxLatency-1         0.01 (  0.00%)        0.01 (  0.00%)
Mean   RandWrite-MaxLatency-2         0.02 (  0.00%)        0.02 (  0.00%)
Mean   RandWrite-MaxLatency-4         0.02 (  0.00%)        0.02 (  0.00%)
Mean   RandWrite-MaxLatency-8         1.83 (  0.00%)        1.96 ( -6.73%)
Mean   RandWrite-MaxLatency-16        1.52 (  0.00%)        1.33 ( 12.72%)

Zone fairness is completely gone. The overall allocation distribution
on this system goes from 40%/60% to 10%/90%, and during the workload
the DMA32 zone is not used *at all*:

                            3.16-rc1      3.16-rc1
                                       seqreadv4r8
Zone normal velocity       11358.492     17996.733
Zone dma32 velocity         8213.852         0.000

Both negative effects stem from kswapd suddenly ignoring the classzone
index while the page allocator respects it: the page allocator will
keep the low wmark + lowmem reserves in DMA32 free, but kswapd won't
reclaim in there until it drops down to the high watermark. The low
watermark + lowmem reserve is usually bigger than the high watermark,
so you effectively disable kswapd service in DMA32 for user requests.
The zone is then no longer used until it fills with enough kernel pages
to trigger kswapd, or the workload goes into direct reclaim.

The classzone change is a non-sensical change IMO, and there is no
useful description of it to be found in the changelog. But for the
given tests it appears to be the only change in the entire series to
make a measurable difference; reverting it gets me back to baseline:

tiobench MB/sec
                                        3.16-rc1              3.16-rc1              3.16-rc1
                                                           seqreadv4r8  seqreadv4r8classzone
Mean   SeqRead-MB/sec-1         129.66 (  0.00%)      156.16 ( 20.44%)      129.72 (  0.05%)
Mean   SeqRead-MB/sec-2         115.74 (  0.00%)      138.50 ( 19.66%)      115.61 ( -0.11%)
Mean   SeqRead-MB/sec-4         110.21 (  0.00%)      127.08 ( 15.31%)      110.15 ( -0.06%)
Mean   SeqRead-MB/sec-8         101.70 (  0.00%)      108.47 (  6.65%)      102.15 (  0.44%)
Mean   SeqRead-MB/sec-16         86.45 (  0.00%)       91.57 (  5.92%)       86.63 (  0.20%)

                3.16-rc1      3.16-rc1              3.16-rc1
                           seqreadv4r8  seqreadv4r8classzone
User              272.45        277.17                272.23
System            197.89        186.30                193.73
Elapsed          4589.17       4356.23               4584.57

                            3.16-rc1      3.16-rc1              3.16-rc1
                                       seqreadv4r8  seqreadv4r8classzone
Zone normal velocity       11358.492     17996.733             12695.547
Zone dma32 velocity         8213.852         0.000              6891.421

Please stop making multiple logical changes in a single patch/testing unit.
This will make it easier to verify them, and hopefully make it also
more obvious if individual changes are underdocumented. As it stands,
it's hard to impossible to verify the implementation when the
intentions are not fully documented. Performance results can only do
so much. They are meant to corroborate the model, not replace it.

And again, if you change the way zone fairness works, please always
include the zone velocity numbers or allocation numbers to show that
your throughput improvements don't just come from completely wrecking
fairness - or in this case from disabling an entire zone.
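
To make the watermark mismatch described above concrete, here is a
minimal sketch of the two checks as a toy model. It is not kernel code:
the ZoneModel class, its field names, the helper functions, and the
page counts in the usage example are all hypothetical and chosen only
for illustration. It assumes the allocator requires free pages above
low wmark + lowmem reserve, while kswapd only reclaims once free pages
are at or below the high watermark.

```python
from dataclasses import dataclass


@dataclass
class ZoneModel:
    """Hypothetical, simplified zone; every field is a page count."""
    free_pages: int
    wmark_low: int
    wmark_high: int
    lowmem_reserve: int  # reserve applied to requests that could use a higher zone


def allocator_can_use(zone: ZoneModel) -> bool:
    # Allocator side: respects the classzone index, so the lowmem
    # reserve is added on top of the low watermark.
    return zone.free_pages > zone.wmark_low + zone.lowmem_reserve


def kswapd_would_reclaim(zone: ZoneModel) -> bool:
    # kswapd side in this model: classzone ignored, so only the high
    # watermark decides whether the zone needs reclaim.
    return zone.free_pages <= zone.wmark_high


def zone_is_stranded(zone: ZoneModel) -> bool:
    # Stranded: the allocator skips the zone for user requests, yet
    # kswapd considers it balanced and frees nothing in it.
    return not allocator_can_use(zone) and not kswapd_would_reclaim(zone)


if __name__ == "__main__":
    # Made-up numbers where low wmark + reserve exceeds the high wmark.
    dma32 = ZoneModel(free_pages=6000, wmark_low=1500,
                      wmark_high=2000, lowmem_reserve=8000)
    print("allocator can use zone:", allocator_can_use(dma32))
    print("kswapd would reclaim:  ", kswapd_would_reclaim(dma32))
    print("zone is stranded:      ", zone_is_stranded(dma32))
```

In this model, any free page count between the high watermark and
low wmark + lowmem reserve leaves the zone stranded, which matches the
window in which DMA32 goes unused for user requests.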