On Fri, Dec 13, 2013 at 02:38:32PM -0800, H. Peter Anvin wrote: > On 12/13/2013 01:16 PM, Linus Torvalds wrote: > > On Fri, Dec 13, 2013 at 12:01 PM, Mel Gorman <mgorman@xxxxxxx> wrote: > >> > >> ebizzy > >> 3.13.0-rc3 3.4.69 3.13.0-rc3 3.13.0-rc3 > >> thread vanilla vanilla altershift-v2r1 nowalk-v2r7 > >> Mean 1 7377.91 ( 0.00%) 6812.38 ( -7.67%) 7784.45 ( 5.51%) 7804.08 ( 5.78%) > >> Mean 2 8262.07 ( 0.00%) 8276.75 ( 0.18%) 9437.49 ( 14.23%) 9450.88 ( 14.39%) > >> Mean 3 7895.00 ( 0.00%) 8002.84 ( 1.37%) 8875.38 ( 12.42%) 8914.60 ( 12.91%) > >> Mean 4 7658.74 ( 0.00%) 7824.83 ( 2.17%) 8509.10 ( 11.10%) 8399.43 ( 9.67%) > >> Mean 5 7275.37 ( 0.00%) 7678.74 ( 5.54%) 8208.94 ( 12.83%) 8197.86 ( 12.68%) > >> Mean 6 6875.50 ( 0.00%) 7597.18 ( 10.50%) 7755.66 ( 12.80%) 7807.51 ( 13.56%) > >> Mean 7 6722.48 ( 0.00%) 7584.75 ( 12.83%) 7456.93 ( 10.93%) 7480.74 ( 11.28%) > >> Mean 8 6559.55 ( 0.00%) 7591.51 ( 15.73%) 6879.01 ( 4.87%) 6881.86 ( 4.91%) > > > > Hmm. Do you have any idea why 3.4.69 still seems to do better at > > higher thread counts? > > > > No complaints about this patch-series, just wondering.. > > > > It would be really great to get some performance numbers on something > other than ebizzy, though... > What do you suggest? I'd be interested in hearing what sort of tests originally motivated the series. I picked a few different tests to see what fell out. All of this was driven from mmtests so I can do a release and point to the config files used if anyone wants to try reproducing it. First was Alex's microbenchmark from https://lkml.org/lkml/2012/5/17/59 and ran it for a range of thread numbers, 320 iterations per thread with random number of entires to flush. Results are from two machines 4 core: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz 8 core: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Single socket in both cases, both ivybridge. Neither are high end but my budget does not cover having high-end machines in my local test grid which is bad but unavoidable. On a 4 core machine tlbflush 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 11.17 ( 0.00%) 10.52 ( 5.82%) 5.15 ( 53.93%) Mean 2 11.70 ( 0.00%) 10.77 ( 7.99%) 10.30 ( 11.94%) Mean 3 24.07 ( 0.00%) 22.42 ( 6.87%) 10.89 ( 54.74%) Mean 4 40.48 ( 0.00%) 39.72 ( 1.88%) 19.51 ( 51.81%) Range 1 7.00 ( 0.00%) 7.00 ( 0.00%) 5.00 ( 28.57%) Range 2 44.00 ( 0.00%) 20.00 ( 54.55%) 23.00 ( 47.73%) Range 3 13.00 ( 0.00%) 16.00 (-23.08%) 8.00 ( 38.46%) Range 4 26.00 ( 0.00%) 32.00 (-23.08%) 11.00 ( 57.69%) Stddev 1 1.49 ( 0.00%) 1.45 ( -2.83%) 0.52 (-65.22%) Stddev 2 3.51 ( 0.00%) 2.20 (-37.20%) 7.46 (112.74%) Stddev 3 1.84 ( 0.00%) 2.43 ( 32.46%) 1.34 (-26.96%) Stddev 4 3.44 ( 0.00%) 4.61 ( 34.14%) 1.51 (-56.13%) 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla User 197.37 181.76 99.69 System 161.92 161.54 126.49 Elapsed 2741.19 2793.41 2749.12 Showing small gains on that machine but the variations are high enough that we cannot be certain it's a real gain. The random number of entries selection is what makes this noisy but picking a single number would bias the test for the characteristics of a single machine. Note that 3.4 is still just a lot better. This was an 8-core machine tlbflush 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 7.98 ( 0.00%) 8.54 ( -7.01%) 5.16 ( 35.36%) Mean 2 7.82 ( 0.00%) 8.35 ( -6.84%) 5.81 ( 25.71%) Mean 3 6.59 ( 0.00%) 7.80 (-18.36%) 5.58 ( 15.37%) Mean 5 13.28 ( 0.00%) 12.85 ( 3.20%) 8.88 ( 33.15%) Mean 8 32.50 ( 0.00%) 32.52 ( -0.04%) 19.92 ( 38.71%) Range 1 7.00 ( 0.00%) 6.00 ( 14.29%) 3.00 ( 57.14%) Range 2 8.00 ( 0.00%) 7.00 ( 12.50%) 18.00 (-125.00%) Range 3 6.00 ( 0.00%) 7.00 (-16.67%) 7.00 (-16.67%) Range 5 11.00 ( 0.00%) 20.00 (-81.82%) 9.00 ( 18.18%) Range 8 35.00 ( 0.00%) 33.00 ( 5.71%) 8.00 ( 77.14%) Stddev 1 1.31 ( 0.00%) 1.52 ( 15.75%) 0.48 (-63.66%) Stddev 2 1.55 ( 0.00%) 1.52 ( -1.54%) 3.06 ( 98.14%) Stddev 3 1.27 ( 0.00%) 1.61 ( 26.07%) 1.53 ( 20.16%) Stddev 5 2.99 ( 0.00%) 2.63 (-11.97%) 2.56 (-14.38%) Stddev 8 8.29 ( 0.00%) 6.51 (-21.46%) 1.23 (-85.15%) 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla User 316.01 341.55 205.00 System 249.25 273.16 203.79 Elapsed 3382.56 4398.20 3682.31 This is showing a mix of gains and losses with higher CPU usage to boot. The figures are again within variations so difficult to be conclusive about it. The system CPU usage is higher The following is netperf running UDP_STREAM and TCP_STREAM on loopback on the 4-core machine netperf-udp 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Tput 64 179.14 ( 0.00%) 177.82 ( -0.74%) 207.16 ( 15.64%) Tput 128 354.67 ( 0.00%) 350.04 ( -1.31%) 416.47 ( 17.42%) Tput 256 712.01 ( 0.00%) 697.31 ( -2.06%) 828.11 ( 16.31%) Tput 1024 2770.59 ( 0.00%) 2717.55 ( -1.91%) 3229.38 ( 16.56%) Tput 2048 5328.83 ( 0.00%) 5255.81 ( -1.37%) 6183.69 ( 16.04%) Tput 3312 8249.24 ( 0.00%) 8170.62 ( -0.95%) 9491.63 ( 15.06%) Tput 4096 9865.98 ( 0.00%) 9760.41 ( -1.07%) 11348.02 ( 15.02%) Tput 8192 17263.69 ( 0.00%) 17261.15 ( -0.01%) 19917.01 ( 15.37%) Tput 16384 27274.61 ( 0.00%) 27283.01 ( 0.03%) 30785.56 ( 12.87%) netperf-tcp 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Tput 64 1612.82 ( 0.00%) 1622.31 ( 0.59%) 1584.68 ( -1.74%) Tput 128 3043.06 ( 0.00%) 3024.19 ( -0.62%) 2926.80 ( -3.82%) Tput 256 5755.06 ( 0.00%) 5747.26 ( -0.14%) 5328.57 ( -7.41%) Tput 1024 17662.03 ( 0.00%) 17778.94 ( 0.66%) 11963.09 (-32.27%) Tput 2048 25382.69 ( 0.00%) 25464.23 ( 0.32%) 15043.90 (-40.73%) Tput 3312 29990.79 ( 0.00%) 30135.56 ( 0.48%) 15731.78 (-47.54%) Tput 4096 31612.33 ( 0.00%) 31775.74 ( 0.52%) 17626.10 (-44.24%) Tput 8192 35366.99 ( 0.00%) 35425.15 ( 0.16%) 21060.61 (-40.45%) Tput 16384 38547.25 ( 0.00%) 38441.09 ( -0.28%) 27925.43 (-27.56%) Very marginal there. Something nuts happened with UDP and TCP processing between 3.4 and 3.13 but this particular series' impact is marginal 8 core machine netperf-udp 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Tput 64 328.25 ( 0.00%) 331.05 ( 0.85%) 383.97 ( 16.97%) Tput 128 664.31 ( 0.00%) 659.58 ( -0.71%) 762.59 ( 14.79%) Tput 256 1305.82 ( 0.00%) 1309.65 ( 0.29%) 1508.27 ( 15.50%) Tput 1024 5110.17 ( 0.00%) 5081.82 ( -0.55%) 5775.96 ( 13.03%) Tput 2048 9839.14 ( 0.00%) 10074.00 ( 2.39%) 11010.10 ( 11.90%) Tput 3312 14787.70 ( 0.00%) 14850.59 ( 0.43%) 16821.29 ( 13.75%) Tput 4096 17583.14 ( 0.00%) 17936.17 ( 2.01%) 20246.74 ( 15.15%) Tput 8192 30165.48 ( 0.00%) 30386.78 ( 0.73%) 31904.81 ( 5.77%) Tput 16384 48345.93 ( 0.00%) 48127.68 ( -0.45%) 48850.30 ( 1.04%) netperf-tcp 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Tput 64 3064.32 ( 0.00%) 3149.22 ( 2.77%) 2701.19 (-11.85%) Tput 128 5777.71 ( 0.00%) 5899.85 ( 2.11%) 4931.78 (-14.64%) Tput 256 10330.00 ( 0.00%) 10567.97 ( 2.30%) 8388.28 (-18.80%) Tput 1024 30744.90 ( 0.00%) 31084.37 ( 1.10%) 17496.95 (-43.09%) Tput 2048 43064.86 ( 0.00%) 42916.90 ( -0.34%) 22227.42 (-48.39%) Tput 3312 50473.85 ( 0.00%) 50388.37 ( -0.17%) 25154.14 (-50.16%) Tput 4096 53909.70 ( 0.00%) 53965.40 ( 0.10%) 27328.49 (-49.31%) Tput 8192 63303.83 ( 0.00%) 63152.88 ( -0.24%) 32078.71 (-49.33%) Tput 16384 68632.11 ( 0.00%) 68063.05 ( -0.83%) 39758.01 (-42.07%) Looks a bit more solid. I didn't post the figures but the elapsed times are also lower implying that netperf is using fewer iterations to measure results it is confident of Next is a kernel build benchmark. I'd be very surprised if it was hitting the relevant paths but I think people expect to see this benchmark so.... 4 core machine kernbench 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla User min 714.10 ( 0.00%) 714.51 ( -0.06%) 706.83 ( 1.02%) User mean 715.04 ( 0.00%) 714.75 ( 0.04%) 707.64 ( 1.04%) User stddev 0.67 ( 0.00%) 0.25 ( 62.98%) 0.69 ( -3.40%) User max 716.12 ( 0.00%) 715.22 ( 0.13%) 708.56 ( 1.06%) User range 2.02 ( 0.00%) 0.71 ( 64.85%) 1.73 ( 14.36%) System min 32.89 ( 0.00%) 32.50 ( 1.19%) 39.17 (-19.09%) System mean 33.25 ( 0.00%) 32.75 ( 1.53%) 39.51 (-18.82%) System stddev 0.25 ( 0.00%) 0.22 ( 14.73%) 0.28 (-11.29%) System max 33.60 ( 0.00%) 33.12 ( 1.43%) 39.83 (-18.54%) System range 0.71 ( 0.00%) 0.62 ( 12.68%) 0.66 ( 7.04%) Elapsed min 195.70 ( 0.00%) 195.88 ( -0.09%) 195.84 ( -0.07%) Elapsed mean 196.09 ( 0.00%) 195.97 ( 0.06%) 196.14 ( -0.03%) Elapsed stddev 0.25 ( 0.00%) 0.06 ( 74.74%) 0.16 ( 33.94%) Elapsed max 196.41 ( 0.00%) 196.07 ( 0.17%) 196.33 ( 0.04%) Elapsed range 0.71 ( 0.00%) 0.19 ( 73.24%) 0.49 ( 30.99%) CPU min 381.00 ( 0.00%) 381.00 ( 0.00%) 380.00 ( 0.26%) CPU mean 381.00 ( 0.00%) 381.00 ( 0.00%) 380.40 ( 0.16%) CPU stddev 0.00 ( 0.00%) 0.00 ( 0.00%) 0.49 (-99.00%) CPU max 381.00 ( 0.00%) 381.00 ( 0.00%) 381.00 ( 0.00%) CPU range 0.00 ( 0.00%) 0.00 ( 0.00%) 1.00 (-99.00%) 8 core machine kernbench 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla User min 632.94 ( 0.00%) 632.71 ( 0.04%) 681.00 ( -7.59%) User mean 633.25 ( 0.00%) 633.41 ( -0.02%) 681.34 ( -7.59%) User stddev 0.24 ( 0.00%) 0.55 (-124.00%) 0.34 (-39.88%) User max 633.55 ( 0.00%) 634.14 ( -0.09%) 681.99 ( -7.65%) User range 0.61 ( 0.00%) 1.43 (-134.43%) 0.99 (-62.30%) System min 29.74 ( 0.00%) 29.76 ( -0.07%) 38.24 (-28.58%) System mean 30.12 ( 0.00%) 30.22 ( -0.32%) 38.55 (-27.99%) System stddev 0.22 ( 0.00%) 0.24 (-11.04%) 0.25 (-14.10%) System max 30.39 ( 0.00%) 30.48 ( -0.30%) 38.87 (-27.90%) System range 0.65 ( 0.00%) 0.72 (-10.77%) 0.63 ( 3.08%) Elapsed min 88.40 ( 0.00%) 88.47 ( -0.08%) 95.81 ( -8.38%) Elapsed mean 88.55 ( 0.00%) 88.72 ( -0.20%) 96.01 ( -8.43%) Elapsed stddev 0.10 ( 0.00%) 0.15 (-46.20%) 0.23 (-125.69%) Elapsed max 88.72 ( 0.00%) 88.88 ( -0.18%) 96.30 ( -8.54%) Elapsed range 0.32 ( 0.00%) 0.41 (-28.13%) 0.49 (-53.13%) CPU min 747.00 ( 0.00%) 746.00 ( 0.13%) 747.00 ( 0.00%) CPU mean 748.80 ( 0.00%) 747.60 ( 0.16%) 749.20 ( -0.05%) CPU stddev 0.98 ( 0.00%) 1.36 (-38.44%) 1.47 (-50.00%) CPU max 750.00 ( 0.00%) 750.00 ( 0.00%) 751.00 ( -0.13%) CPU range 3.00 ( 0.00%) 4.00 (-33.33%) 4.00 (-33.33%) Yup, nothing there worth getting excited about although slightly amusing to note that we've improved kernel build times since 3.4.69 if nothing else. We're all over the performance of that! This is a modified ebizzy benchmark to give a breakdown of per-thread performance. 4 core machine ebizzy total throughput (higher the better) 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 6366.88 ( 0.00%) 6741.00 ( 5.88%) 6658.32 ( 4.58%) Mean 2 6917.56 ( 0.00%) 7952.29 ( 14.96%) 8120.79 ( 17.39%) Mean 3 6231.78 ( 0.00%) 6846.08 ( 9.86%) 7174.98 ( 15.14%) Mean 4 5887.91 ( 0.00%) 6503.12 ( 10.45%) 6903.05 ( 17.24%) Mean 5 5680.77 ( 0.00%) 6185.83 ( 8.89%) 6549.15 ( 15.29%) Mean 6 5692.87 ( 0.00%) 6249.48 ( 9.78%) 6442.21 ( 13.16%) Mean 7 5846.76 ( 0.00%) 6344.94 ( 8.52%) 6279.13 ( 7.40%) Mean 8 5974.57 ( 0.00%) 6406.28 ( 7.23%) 6265.29 ( 4.87%) Range 1 174.00 ( 0.00%) 202.00 (-16.09%) 806.00 (-363.22%) Range 2 286.00 ( 0.00%) 979.00 (-242.31%) 1255.00 (-338.81%) Range 3 530.00 ( 0.00%) 583.00 (-10.00%) 626.00 (-18.11%) Range 4 592.00 ( 0.00%) 691.00 (-16.72%) 630.00 ( -6.42%) Range 5 567.00 ( 0.00%) 417.00 ( 26.46%) 584.00 ( -3.00%) Range 6 588.00 ( 0.00%) 353.00 ( 39.97%) 439.00 ( 25.34%) Range 7 477.00 ( 0.00%) 284.00 ( 40.46%) 343.00 ( 28.09%) Range 8 408.00 ( 0.00%) 182.00 ( 55.39%) 237.00 ( 41.91%) Stddev 1 31.59 ( 0.00%) 32.94 ( -4.27%) 154.26 (-388.34%) Stddev 2 56.95 ( 0.00%) 136.79 (-140.19%) 194.45 (-241.43%) Stddev 3 132.28 ( 0.00%) 101.02 ( 23.63%) 106.60 ( 19.41%) Stddev 4 140.93 ( 0.00%) 136.11 ( 3.42%) 138.26 ( 1.90%) Stddev 5 118.58 ( 0.00%) 86.74 ( 26.85%) 111.73 ( 5.77%) Stddev 6 109.64 ( 0.00%) 77.49 ( 29.32%) 95.52 ( 12.87%) Stddev 7 103.91 ( 0.00%) 51.44 ( 50.50%) 54.43 ( 47.62%) Stddev 8 67.79 ( 0.00%) 31.34 ( 53.76%) 53.08 ( 21.69%) 4 core machine ebizzy Thread spread (closer to 0, the more fair it is) 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.34 ( 0.00%) 0.30 ( 11.76%) 0.07 ( 79.41%) Mean 3 1.29 ( 0.00%) 0.92 ( 28.68%) 0.29 ( 77.52%) Mean 4 7.08 ( 0.00%) 42.38 (-498.59%) 0.22 ( 96.89%) Mean 5 193.54 ( 0.00%) 483.41 (-149.77%) 0.41 ( 99.79%) Mean 6 151.12 ( 0.00%) 198.22 (-31.17%) 0.42 ( 99.72%) Mean 7 115.38 ( 0.00%) 160.29 (-38.92%) 0.58 ( 99.50%) Mean 8 108.65 ( 0.00%) 138.96 (-27.90%) 0.44 ( 99.60%) Range 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Range 2 5.00 ( 0.00%) 6.00 (-20.00%) 2.00 ( 60.00%) Range 3 10.00 ( 0.00%) 17.00 (-70.00%) 9.00 ( 10.00%) Range 4 256.00 ( 0.00%) 1001.00 (-291.02%) 5.00 ( 98.05%) Range 5 456.00 ( 0.00%) 1226.00 (-168.86%) 6.00 ( 98.68%) Range 6 298.00 ( 0.00%) 294.00 ( 1.34%) 8.00 ( 97.32%) Range 7 192.00 ( 0.00%) 220.00 (-14.58%) 7.00 ( 96.35%) Range 8 171.00 ( 0.00%) 163.00 ( 4.68%) 8.00 ( 95.32%) Stddev 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Stddev 2 0.72 ( 0.00%) 0.85 ( 17.99%) 0.29 (-59.72%) Stddev 3 1.42 ( 0.00%) 1.90 ( 34.22%) 1.12 (-21.19%) Stddev 4 33.83 ( 0.00%) 127.26 (276.15%) 0.79 (-97.65%) Stddev 5 92.08 ( 0.00%) 225.01 (144.35%) 1.06 (-98.85%) Stddev 6 64.82 ( 0.00%) 69.43 ( 7.11%) 1.28 (-98.02%) Stddev 7 36.66 ( 0.00%) 49.19 ( 34.20%) 1.18 (-96.79%) Stddev 8 30.79 ( 0.00%) 36.23 ( 17.64%) 1.06 (-96.55%) Three things to note here. The spread goes to hell when there are more workload threads than cores. Second, the patch is actually making the spread and thread fairness worse. Third, the fact that there is spread at all is bad because 3.4.69 experienced no such problem 8 core machine ebizzy 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 7295.77 ( 0.00%) 7835.63 ( 7.40%) 6713.32 ( -7.98%) Mean 2 8252.58 ( 0.00%) 9554.63 ( 15.78%) 8334.43 ( 0.99%) Mean 3 8179.74 ( 0.00%) 9032.46 ( 10.42%) 8134.42 ( -0.55%) Mean 4 7862.45 ( 0.00%) 8688.01 ( 10.50%) 7966.27 ( 1.32%) Mean 5 7170.24 ( 0.00%) 8216.15 ( 14.59%) 7820.63 ( 9.07%) Mean 6 6835.10 ( 0.00%) 7866.95 ( 15.10%) 7773.30 ( 13.73%) Mean 7 6740.99 ( 0.00%) 7586.36 ( 12.54%) 7712.45 ( 14.41%) Mean 8 6494.01 ( 0.00%) 6849.82 ( 5.48%) 7705.62 ( 18.66%) Mean 12 6567.37 ( 0.00%) 6973.66 ( 6.19%) 7554.82 ( 15.04%) Mean 16 6630.26 ( 0.00%) 7042.52 ( 6.22%) 7331.04 ( 10.57%) Range 1 767.00 ( 0.00%) 194.00 ( 74.71%) 661.00 ( 13.82%) Range 2 178.00 ( 0.00%) 185.00 ( -3.93%) 592.00 (-232.58%) Range 3 175.00 ( 0.00%) 213.00 (-21.71%) 431.00 (-146.29%) Range 4 806.00 ( 0.00%) 924.00 (-14.64%) 542.00 ( 32.75%) Range 5 544.00 ( 0.00%) 438.00 ( 19.49%) 444.00 ( 18.38%) Range 6 399.00 ( 0.00%) 1111.00 (-178.45%) 528.00 (-32.33%) Range 7 629.00 ( 0.00%) 895.00 (-42.29%) 467.00 ( 25.76%) Range 8 400.00 ( 0.00%) 255.00 ( 36.25%) 435.00 ( -8.75%) Range 12 233.00 ( 0.00%) 108.00 ( 53.65%) 330.00 (-41.63%) Range 16 141.00 ( 0.00%) 134.00 ( 4.96%) 496.00 (-251.77%) Stddev 1 73.94 ( 0.00%) 52.33 ( 29.23%) 177.17 (-139.59%) Stddev 2 23.47 ( 0.00%) 42.08 (-79.24%) 88.91 (-278.74%) Stddev 3 36.48 ( 0.00%) 29.02 ( 20.45%) 101.07 (-177.05%) Stddev 4 158.37 ( 0.00%) 133.99 ( 15.40%) 130.52 ( 17.59%) Stddev 5 116.74 ( 0.00%) 76.76 ( 34.25%) 78.31 ( 32.92%) Stddev 6 66.34 ( 0.00%) 273.87 (-312.83%) 87.79 (-32.33%) Stddev 7 145.62 ( 0.00%) 174.99 (-20.16%) 90.52 ( 37.84%) Stddev 8 68.51 ( 0.00%) 47.58 ( 30.54%) 81.11 (-18.39%) Stddev 12 32.15 ( 0.00%) 20.18 ( 37.22%) 65.74 (-104.50%) Stddev 16 21.59 ( 0.00%) 20.29 ( 6.01%) 86.42 (-300.25%) Patch series shows the strongest performance gain here. Not surprising considering this was the machine and test that first motivated the series. 3.4.69 is still a lot better. ebizzy Thread spread 3.13.0-rc3 3.13.0-rc3 3.4.69 vanilla nowalk-v2r7 vanilla Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.40 ( 0.00%) 0.35 ( 12.50%) 0.13 ( 67.50%) Mean 3 23.73 ( 0.00%) 0.46 ( 98.06%) 0.26 ( 98.90%) Mean 4 12.79 ( 0.00%) 1.40 ( 89.05%) 0.67 ( 94.76%) Mean 5 13.08 ( 0.00%) 4.06 ( 68.96%) 0.36 ( 97.25%) Mean 6 23.21 ( 0.00%) 136.62 (-488.63%) 1.13 ( 95.13%) Mean 7 15.85 ( 0.00%) 203.46 (-1183.66%) 1.51 ( 90.47%) Mean 8 109.37 ( 0.00%) 47.75 ( 56.34%) 1.05 ( 99.04%) Mean 12 124.84 ( 0.00%) 120.55 ( 3.44%) 0.59 ( 99.53%) Mean 16 113.50 ( 0.00%) 109.60 ( 3.44%) 0.49 ( 99.57%) Range 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Range 2 3.00 ( 0.00%) 11.00 (-266.67%) 1.00 ( 66.67%) Range 3 80.00 ( 0.00%) 5.00 ( 93.75%) 1.00 ( 98.75%) Range 4 38.00 ( 0.00%) 5.00 ( 86.84%) 2.00 ( 94.74%) Range 5 37.00 ( 0.00%) 21.00 ( 43.24%) 1.00 ( 97.30%) Range 6 46.00 ( 0.00%) 927.00 (-1915.22%) 8.00 ( 82.61%) Range 7 28.00 ( 0.00%) 716.00 (-2457.14%) 36.00 (-28.57%) Range 8 325.00 ( 0.00%) 315.00 ( 3.08%) 26.00 ( 92.00%) Range 12 160.00 ( 0.00%) 151.00 ( 5.62%) 5.00 ( 96.88%) Range 16 108.00 ( 0.00%) 123.00 (-13.89%) 1.00 ( 99.07%) Stddev 1 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) Stddev 2 0.62 ( 0.00%) 1.18 ( 91.08%) 0.34 (-45.44%) Stddev 3 17.40 ( 0.00%) 0.81 (-95.37%) 0.44 (-97.48%) Stddev 4 8.52 ( 0.00%) 1.05 (-87.69%) 0.51 (-94.00%) Stddev 5 7.91 ( 0.00%) 3.94 (-50.20%) 0.48 (-93.93%) Stddev 6 7.11 ( 0.00%) 174.18 (2348.91%) 1.48 (-79.18%) Stddev 7 5.90 ( 0.00%) 139.48 (2263.45%) 4.12 (-30.24%) Stddev 8 80.95 ( 0.00%) 58.03 (-28.32%) 2.65 (-96.72%) Stddev 12 31.48 ( 0.00%) 33.78 ( 7.30%) 0.66 (-97.89%) Stddev 16 24.32 ( 0.00%) 26.22 ( 7.79%) 0.50 (-97.94%) Again, while overall performance is better, the spread of performance between threads is worse but the fact that there is spread at all is bad. So overall to me it looks like the series still stands. The clearest result was from ebizzy which is an adverse workload in this specific case because of the size of the TLBs involved. The performance of individual threads is a big concern but I can bisect for that separately and see what falls out. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>