On Wed, 15 Sep 2010 13:27:43 +0100 Mel Gorman <mel@xxxxxxxxx> wrote:

> This is v2 of a series to reduce some of the latencies seen in page reclaim
> and to improve the efficiency a bit.

epic changelog!

> ...
>
> The tests run were as follows
>
> kernbench
>	compile-based benchmark. Smoke test performance
>
> sysbench
>	OLTP read-only benchmark. Will be re-run in the future as read-write
>
> micro-mapped-file-stream
>	This is a micro-benchmark from Johannes Weiner that accesses a
>	large sparse-file through mmap(). It was configured to run in only
>	single-CPU mode but can be indicative of how well page reclaim
>	identifies suitable pages.
>
> stress-highalloc
>	Tries to allocate huge pages under heavy load.
>
> kernbench, iozone and sysbench did not report any performance regression
> on any machine. sysbench did pressure the system lightly and there was
> reclaim activity but there were no differences of major interest between
> the kernels.
>
> X86-64 micro-mapped-file-stream
>
>                                 traceonly-v2r2         lowlumpy-v2r3      waitcongest-v2r3    waitwriteback-v2r4
> pgalloc_dma              1639.00 (   0.00%)     667.00 (-145.73%)    1167.00 ( -40.45%)     578.00 (-183.56%)
> pgalloc_dma32         2842410.00 (   0.00%) 2842626.00 (   0.01%) 2843043.00 (   0.02%) 2843014.00 (   0.02%)
> pgalloc_normal              0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgsteal_dma               729.00 (   0.00%)      85.00 (-757.65%)     609.00 ( -19.70%)     125.00 (-483.20%)
> pgsteal_dma32         2338721.00 (   0.00%) 2447354.00 (   4.44%) 2429536.00 (   3.74%) 2436772.00 (   4.02%)
> pgsteal_normal              0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgscan_kswapd_dma        1469.00 (   0.00%)     532.00 (-176.13%)    1078.00 ( -36.27%)     220.00 (-567.73%)
> pgscan_kswapd_dma32   4597713.00 (   0.00%) 4503597.00 (  -2.09%) 4295673.00 (  -7.03%) 3891686.00 ( -18.14%)
> pgscan_kswapd_normal        0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgscan_direct_dma          71.00 (   0.00%)     134.00 (  47.01%)     243.00 (  70.78%)     352.00 (  79.83%)
> pgscan_direct_dma32    305820.00 (   0.00%)  280204.00 (  -9.14%)  600518.00 (  49.07%)  957485.00 (  68.06%)
> pgscan_direct_normal        0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pageoutrun              16296.00 (   0.00%)   21254.00 (  23.33%)   18447.00 (  11.66%)   20067.00 (  18.79%)
> allocstall                443.00 (   0.00%)     273.00 ( -62.27%)     513.00 (  13.65%)    1568.00 (  71.75%)
>
> These are based on the raw figures taken from /proc/vmstat. It's a rough
> measure of reclaim activity. Note that allocstall counts are higher because
> we are entering direct reclaim more often as a result of not sleeping in
> congestion. In itself, it's not necessarily a bad thing. It's easier to
> get a view of what happened from the vmscan tracepoint report.
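For reference, deltas like the table above can be collected with nothing
more than two snapshots of /proc/vmstat around the test run. A minimal
sketch of the idea (this is not the actual mmtests reporting script; the
counter list and the "vmstat-delta.py" name are just illustrative):

#!/usr/bin/env python3
# Sketch only: snapshot /proc/vmstat before and after a workload and
# print the deltas for the reclaim counters shown in the table above.
import subprocess
import sys

COUNTERS = [
    "pgalloc_dma", "pgalloc_dma32", "pgalloc_normal",
    "pgsteal_dma", "pgsteal_dma32", "pgsteal_normal",
    "pgscan_kswapd_dma", "pgscan_kswapd_dma32", "pgscan_kswapd_normal",
    "pgscan_direct_dma", "pgscan_direct_dma32", "pgscan_direct_normal",
    "pageoutrun", "allocstall",
]

def snapshot():
    # /proc/vmstat is one "name value" pair per line
    with open("/proc/vmstat") as f:
        return dict((name, int(val)) for name, val in
                    (line.split() for line in f))

if len(sys.argv) < 2:
    sys.exit("usage: vmstat-delta.py <workload> [args...]")

before = snapshot()
subprocess.call(sys.argv[1:])       # run the workload under test
after = snapshot()

for name in COUNTERS:
    if name in before:              # zones present vary by arch/config
        print("%-22s %12d" % (name, after[name] - before[name]))

Percentages relative to a baseline kernel, as in the report, are then a
separate comparison step across runs.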
> FTrace Reclaim Statistics: vmscan
>                                      traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Direct reclaims                                 443        273        513       1568
> Direct reclaim pages scanned                 305968     280402     600825     957933
> Direct reclaim pages reclaimed                43503      19005      30327     117191
> Direct reclaim write file async I/O               0          0          0          0
> Direct reclaim write anon async I/O               0          3          4         12
> Direct reclaim write file sync I/O                0          0          0          0
> Direct reclaim write anon sync I/O                0          0          0          0
> Wake kswapd requests                         187649     132338     191695     267701
> Kswapd wakeups                                    3          1          4          1
> Kswapd pages scanned                        4599269    4454162    4296815    3891906
> Kswapd pages reclaimed                      2295947    2428434    2399818    2319706
> Kswapd reclaim write file async I/O               1          0          1          1
> Kswapd reclaim write anon async I/O              59        187         41        222
> Kswapd reclaim write file sync I/O                0          0          0          0
> Kswapd reclaim write anon sync I/O                0          0          0          0
> Time stalled direct reclaim (seconds)          4.34       2.52       6.63       2.96
> Time kswapd awake (seconds)                   11.15      10.25      11.01      10.19
>
> Total pages scanned                         4905237    4734564    4897640    4849839
> Total pages reclaimed                       2339450    2447439    2430145    2436897
> %age total pages scanned/reclaimed           47.69%     51.69%     49.62%     50.25%
> %age total pages scanned/written              0.00%      0.00%      0.00%      0.00%
> %age file pages scanned/written               0.00%      0.00%      0.00%      0.00%
> Percentage Time Spent Direct Reclaim         29.23%     19.02%     38.48%     20.25%
> Percentage Time kswapd Awake                 78.58%     78.85%     76.83%     79.86%
>
> What is interesting here for nocongest in particular is that while direct
> reclaim scans more pages, the overall number of pages scanned remains the
> same and the ratio of pages scanned to pages reclaimed is more or less
> the same. In other words, while we are sleeping less, reclaim is not
> doing more work and as direct reclaim and kswapd are awake for less time,
> it would appear to be doing less work.

Yes, I think the reclaimed/scanned ratio (what I call "reclaim
efficiency") is a key metric.

50% is low!  What's the testcase here?  micro-mapped-file-stream?
(A quick check of how these percentages fall out is below, after the
duration figures.)

It's strange that the "total pages reclaimed" increased a little.  Just a
measurement glitch?

> FTrace Reclaim Statistics: congestion_wait
>                                      traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Direct number congest     waited                 87        196         64          0
> Direct time   congest     waited             4604ms     4732ms     5420ms        0ms
> Direct full   congest     waited                 72        145         53          0
> Direct number conditional waited                  0          0        324       1315
> Direct time   conditional waited                0ms        0ms        0ms        0ms
> Direct full   conditional waited                  0          0          0          0
> KSwapd number congest     waited                 20         10         15          7
> KSwapd time   congest     waited             1264ms      536ms      884ms      284ms
> KSwapd full   congest     waited                 10          4          6          2
> KSwapd number conditional waited                  0          0          0          0
> KSwapd time   conditional waited                0ms        0ms        0ms        0ms
> KSwapd full   conditional waited                  0          0          0          0
>
> The vanilla kernel spent 8 seconds asleep in direct reclaim and no time at
> all asleep with the patches.
>
> MMTests Statistics: duration
> User/Sys Time Running Test (seconds)          10.51      10.73      10.6       11.66
> Total Elapsed Time (seconds)                  14.19      13.00      14.33      12.76

Is that user time plus system time?  If so, why didn't user+sys equal
elapsed in the we-never-slept-in-congestion-wait() case?  Because the
test's CPU got stolen by kswapd perhaps?

> Overall, the tests completed faster. It is interesting to note that
> backing off further when a zone is congested and not just a BDI was more
> efficient overall.
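Coming back to the efficiency metric: the ~50% figures can be reproduced
directly from the "Total pages" lines of the vmscan report. Note the
summary line is labelled "scanned/reclaimed" but the values match
reclaimed as a percentage of scanned (direct + kswapd combined) — a
sanity check of the arithmetic:

# Reclaim efficiency = pages reclaimed as a percentage of pages scanned,
# from the "Total pages" lines of the vmscan report above.
totals = {
    "traceonly-v2r2":     (4905237, 2339450),
    "lowlumpy-v2r3":      (4734564, 2447439),
    "waitcongest-v2r3":   (4897640, 2430145),
    "waitwriteback-v2r4": (4849839, 2436897),
}
for kernel, (scanned, reclaimed) in totals.items():
    print("%-20s %6.2f%%" % (kernel, 100.0 * reclaimed / scanned))
# -> 47.69%, 51.69%, 49.62% and 50.25% respectively, matching the report.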
> PPC64 micro-mapped-file-stream
>                                 traceonly-v2r2         lowlumpy-v2r3      waitcongest-v2r3    waitwriteback-v2r4
> pgalloc_dma           3024660.00 (   0.00%) 3027185.00 (   0.08%) 3025845.00 (   0.04%) 3026281.00 (   0.05%)
> pgalloc_normal              0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgsteal_dma           2508073.00 (   0.00%) 2565351.00 (   2.23%) 2463577.00 (  -1.81%) 2532263.00 (   0.96%)
> pgsteal_normal              0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgscan_kswapd_dma     4601307.00 (   0.00%) 4128076.00 ( -11.46%) 3912317.00 ( -17.61%) 3377165.00 ( -36.25%)
> pgscan_kswapd_normal        0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pgscan_direct_dma      629825.00 (   0.00%)  971622.00 (  35.18%) 1063938.00 (  40.80%) 1711935.00 (  63.21%)
> pgscan_direct_normal        0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)       0.00 (   0.00%)
> pageoutrun              27776.00 (   0.00%)   20458.00 ( -35.77%)   18763.00 ( -48.04%)   18157.00 ( -52.98%)
> allocstall                977.00 (   0.00%)    2751.00 (  64.49%)    2098.00 (  53.43%)    5136.00 (  80.98%)
>
> ...
>
> X86-64 STRESS-HIGHALLOC
>                    traceonly-v2r2   lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Pass 1             82.00 ( 0.00%)  84.00 ( 2.00%)  85.00 ( 3.00%)  85.00 ( 3.00%)
> Pass 2             90.00 ( 0.00%)  87.00 (-3.00%)  88.00 (-2.00%)  89.00 (-1.00%)
> At Rest            92.00 ( 0.00%)  90.00 (-2.00%)  90.00 (-2.00%)  91.00 (-1.00%)
>
> Success figures across the board are broadly similar.
>
>                                      traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Direct reclaims                                1045        944        886        887
> Direct reclaim pages scanned                 135091     119604     109382     101019
> Direct reclaim pages reclaimed                88599      47535      47863      46671
> Direct reclaim write file async I/O             494        283        465        280
> Direct reclaim write anon async I/O           29357      13710      16656      13462
> Direct reclaim write file sync I/O              154          2          2          3
> Direct reclaim write anon sync I/O            14594        571        509        561
> Wake kswapd requests                           7491        933        872        892
> Kswapd wakeups                                  814        778        731        780
> Kswapd pages scanned                        7290822   15341158   11916436   13703442
> Kswapd pages reclaimed                      3587336    3142496    3094392    3187151
> Kswapd reclaim write file async I/O           91975      32317      28022      29628
> Kswapd reclaim write anon async I/O         1992022     789307     829745     849769
> Kswapd reclaim write file sync I/O                0          0          0          0
> Kswapd reclaim write anon sync I/O                0          0          0          0
> Time stalled direct reclaim (seconds)       4588.93    2467.16    2495.41    2547.07
> Time kswapd awake (seconds)                 2497.66    1020.16    1098.06    1176.82
>
> Total pages scanned                         7425913   15460762   12025818   13804461
> Total pages reclaimed                       3675935    3190031    3142255    3233822
> %age total pages scanned/reclaimed           49.50%     20.63%     26.13%     23.43%
> %age total pages scanned/written             28.66%      5.41%      7.28%      6.47%
> %age file pages scanned/written               1.25%      0.21%      0.24%      0.22%
> Percentage Time Spent Direct Reclaim         57.33%     42.15%     42.41%     42.99%
> Percentage Time kswapd Awake                 43.56%     27.87%     29.76%     31.25%
>
> Scanned/reclaimed ratios again look good with big improvements in
> efficiency. The scanned/written ratios also look much improved. With a
> better scanned/written ratio, there is an expectation that IO would be
> more efficient and indeed, the time spent in direct reclaim is much
> reduced by the full series and kswapd spends a little less time awake.

Wait.  The reclaim efficiency got *worse*, didn't it?  To reclaim 3,xxx,xxx
pages, the number of pages we had to scan went from 7,xxx,xxx up to
13,xxx,xxx?
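Putting numbers on both observations: from the table above, the
reclaimed/scanned efficiency does roughly halve across the series, while
the write ratio Mel highlights improves at the same time. The "written"
totals here are just the sum of the six direct/kswapd file/anon
async+sync write counters:

# STRESS-HIGHALLOC: efficiency (reclaimed/scanned) versus the fraction
# of scanned pages that had to be written back during reclaim.
runs = {
    #                      scanned   reclaimed  written
    "traceonly-v2r2":     ( 7425913, 3675935, 2128596),
    "lowlumpy-v2r3":      (15460762, 3190031,  836190),
    "waitcongest-v2r3":   (12025818, 3142255,  875399),
    "waitwriteback-v2r4": (13804461, 3233822,  893703),
}
for kernel, (scanned, reclaimed, written) in runs.items():
    print("%-20s efficiency %5.2f%%  scanned/written %5.2f%%" %
          (kernel, 100.0 * reclaimed / scanned, 100.0 * written / scanned))
# Efficiency drops from 49.50% to ~21-26%, while writeback from reclaim
# drops from 28.66% of scanned pages to ~5-7%, matching the report.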
>
> ...
>
> I think this series is ready for much wider testing. The lowlumpy patches
> in particular should be relatively uncontroversial. While their largest
> impact can be seen in the high order stress tests, they would also have
> an impact if SLUB was configured (these tests are based on slab) and
> stalls in lumpy reclaim could be partially responsible for some desktop
> stalling reports.

slub sucks :(

Is this patchset likely to have any impact on the "hey my net driver
couldn't do an order 3 allocation" reports?  I guess not.

> ...