On Wed, 2011-12-28 at 16:10 +0100, Mike Galbraith wrote:
> On Wed, 2011-12-28 at 14:32 +0100, Arjan van de Ven wrote:
> >
> > I think we need to just say no to this, and kill the nohz=off option
> > entirely.
> >
> > Seriously, are people still running with ticks for any legitimate
> > reasons? (and not just because they goofed their config file)
>
> Yup.  Realtime loads sometimes need it.  Even without contention
> problems, entering/leaving nohz is a latency source.  If every little
> bit counts, you may have the choice of letting the electric meter spin
> or not getting the job done at all.

Patch making tick skew a boot option below, and hard numbers below that.

Test setup: 60 isolated cores running a synchronized frame scheduler
model for 1 hour, scheduling worker-bees at three frequencies.  (The
testcase is meant to simulate a real frame rate scheduler "well enough",
and did pretty well at showing the cost of these particular collisions.)

First set of numbers is without tick skew, and nohz enabled.  Second set
is tick skewed, with nohz and rt push/pull turned off for the isolated
core set.  The tick skew alone is responsible for an order of magnitude
of jitter improvement.  I have hard numbers for nohz and cpupri_set() as
well, but the bottom line for me is that with nohz enabled, jitter comes
in at nearly double my 30us budget, so even with the tick skewed, nohz
is just not a viable option ATM.

From: Mike Galbraith <mgalbraith@xxxxxxx>

clockevents: Reinstate the per cpu tick skew

Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867

Historically, Linux has tried to make the regular timer tick on the
various CPUs not happen at the same time, to avoid contention on
xtime_lock.

Nowadays, with the tickless kernel, this contention no longer happens
since time keeping and updating are done differently.  In addition,
this skew is actually hurting power consumption in a measurable way on
many-core systems.
End quote

Contrary to the above, contention does still happen, and can be a
problem for realtime loads whether nohz is active or not, so give the
user the ability to decide whether power consumption or jitter is the
more important consideration.

Signed-off-by: Mike Galbraith <mgalbraith@xxxxxxx>
Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
---
 Documentation/kernel-parameters.txt |    3 +++
 kernel/time/tick-sched.c            |   19 +++++++++++++++++++
 2 files changed, 22 insertions(+)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2295,6 +2295,9 @@ bytes respectively. Such letter suffixes
 	simeth=		[IA-64]
 	simscsi=
 
+	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
+			xtime_lock contention on larger systems.
+
 	slram=		[HW,MTD]
 
 	slub_debug[=options[,slabs]]	[MM, SLUB]
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -759,6 +759,8 @@ static enum hrtimer_restart tick_sched_t
 	return HRTIMER_RESTART;
 }
 
+static int sched_skew_tick;
+
 /**
  * tick_setup_sched_timer - setup the tick emulation timer
  */
@@ -777,6 +779,14 @@ void tick_setup_sched_timer(void)
 	/* Get the next period (per cpu) */
 	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
 
+	/* Offset the tick to avert xtime_lock contention. */
+	if (sched_skew_tick) {
+		u64 offset = ktime_to_ns(tick_period) >> 1;
+		do_div(offset, num_possible_cpus());
+		offset *= smp_processor_id();
+		hrtimer_add_expires_ns(&ts->sched_timer, offset);
+	}
+
 	for (;;) {
 		hrtimer_forward(&ts->sched_timer, now, tick_period);
 		hrtimer_start_expires(&ts->sched_timer,
@@ -858,3 +868,12 @@ int tick_check_oneshot_change(int allow_
 	tick_nohz_switch_to_nohz();
 	return 0;
 }
+
+static int __init skew_tick(char *str)
+{
+	get_option(&str, &sched_skew_tick);
+
+	return 0;
+}
+early_param("skew_tick", skew_tick);
+

No skewed tick, nohz active:

FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames   Min     Max(Frame)       Avg     Sigma   LastTrans Fliers(Frames)
4   3456000  0.0159  51.51 (1751285)  1.0811  2.3215  0 (0)     940 (2496,2497,36625,36626,45649,..3438632)
5   3456000  0.0159  57.44 (1301949)  1.1164  2.3599  0 (0)     1010 (32353,32354,36625,36626,43681,..3434312)
6   3456000  0.0159  49.58 (546753)   1.0602  2.3222  0 (0)     1037 (32353,32354,36625,36626,41809,..3425240)
7   3456000  0.0159  52.20 (546753)   1.0681  2.3370  0 (0)     1035 (32353,32354,36625,36626,41809,..3432248)
8   3456000  0.0159  58.91 (1407504)  1.0592  2.0873  0 (0)     865 (11041,11042,15505,15506,25585,..3412208)
9   3456000  0.0159  54.61 (1407504)  1.0581  2.0775  0 (0)     850 (11041,11042,15505,15506,20234,..3411272)
10  3456000  0.0159  52.91 (1338694)  1.1259  2.0825  0 (0)     799 (11041,11042,15505,15506,16465,..3400640)
11  3456000  0.0159  50.56 (2470554)  1.1881  2.0364  0 (0)     334 (50714,113715,113716,166349,178780,..3421185)
12  3456000  0.0159  50.29 (2462200)  0.9961  2.0202  0 (0)     639 (9337,9338,11041,11042,15505,..3452529)
13  3456000  0.0159  56.52 (2470554)  1.1478  2.0602  0 (0)     400 (2545,2546,9121,9122,66434,..3440289)
14  3456000  0.0159  55.06 (34587)    1.2129  2.4890  0 (0)     444 (34587,34588,62571,62572,62619,..3440434)
15  3456000  0.0159  46.48 (583883)   1.2891  2.1824  0 (0)     306 (91563,95739,95740,141197,155741,..3406785)
16  3456000  0.0159  103.70 (2828662) 2.1077  4.0380  410 (2)   9435 (697,698,1105,1106,1153,..3455937)
17  3456000  0.0159  73.89 (2470553)  2.1598  3.7529  0 (0)     6180 (2473,2474,3985,3986,8569,..3438201)
18  3456000  0.0159  54.14 (1212190)  2.2391  3.7075  0 (0)     5485 (10274,10275,13970,13971,14379,..3455794)
19  3456000  0.0159  99.20 (810712)   2.3861  4.5793  0 (0)     19845 (674,675,2259,2260,3554,..3455915)
20  3456000  0.0159  71.30 (631597)   2.2565  4.3141  0 (0)     9365 (674,675,3555,7394,7395,..3455914)
21  3456000  0.0159  71.51 (1431073)  2.3127  4.4810  0 (0)     25073 (1154,2259,2260,4011,4012,..3455963)
22  3456000  0.0159  62.45 (215262)   2.1318  4.3088  0 (0)     23570 (2259,2260,4011,4012,4539,..3455963)
23  3456000  0.0159  61.50 (212190)   2.1307  4.3165  0 (0)     23605 (2259,2260,4539,4540,5019,..3455963)
24  2397600  0.0587  145.26 (2229318) 2.6808  6.2104  492 (14)  32977 (812,813,1145,1470,1471,..2397564)
25  2397600  0.0587  133.93 (250966)  2.6171  6.3300  492 (13)  35463 (812,813,1145,1146,1462,..2397564)
26  2397600  0.0587  140.25 (1405878) 2.7079  6.1603  492 (12)  32428 (806,812,813,1145,1146,..2397564)
27  2397600  0.0587  141.56 (1405879) 2.6893  6.1515  492 (14)  32089 (808,809,810,811,812,..2397564)
28  2397600  0.0587  146.57 (1405879) 2.7129  6.0797  492 (14)  31637 (800,801,812,813,827,..2397564)
29  2397600  0.0587  137.99 (2172039) 2.3360  5.9859  492 (14)  30551 (826,827,1157,1480,1481,..2397564)
30  2397600  0.0587  144.06 (948198)  2.2381  5.0413  496 (6)   19401 (826,827,832,833,1175,..2397566)
31  2397600  0.0587  141.92 (948198)  2.2509  5.0654  496 (4)   19353 (826,827,832,833,1175,..2397566)
32  2397600  0.0587  149.31 (2172038) 2.7842  6.8891  492 (10)  41301 (822,823,824,825,826,..2397564)
33  2397600  0.0587  142.99 (1975198) 2.6904  5.3538  181 (6)   21954 (511,512,846,847,1175,..2397582)
34  2397600  0.0587  167.07 (948199)  2.6350  5.6616  179 (4)   23602 (503,504,507,508,511,..2397582)
35  2397600  0.0587  79.81 (2152123)  2.5135  4.1781  0 (0)     5406 (1879,1881,1882,2876,2877,..2396956)
36  2397600  0.0587  112.24 (1184061) 2.7419  5.3774  0 (0)     21005 (1185,1186,1189,1190,1518,..2397263)
37  2397600  0.0587  78.86 (986867)   2.6678  5.1954  0 (0)     19350 (529,530,861,863,1189,..2397263)
38  2397600  0.0587  77.90 (1782680)  2.5881  4.8399  0 (0)     13516 (525,526,529,530,860,..2396938)
39  2397600  0.0587  78.02 (1642135)  2.4351  3.8095  0 (0)     3569 (898,2900,2901,3561,3566,..2397291)
40  2397600  0.0587  218.81 (891116)  2.7215  6.6456  392 (8)   38961 (714,715,726,727,1046,..2397450)
41  2397600  0.0587  141.56 (1975198) 2.6441  5.2995  181 (4)   22572 (846,847,1179,1180,1185,..2397249)
42  2397600  0.0587  77.07 (1782679)  2.3957  5.0119  0 (0)     17798 (529,530,860,861,862,..2397263)
43  2397600  0.0587  81.72 (1333323)  2.3469  4.5082  0 (0)     11172 (1205,1206,1207,1208,1865,..2396552)
44  1080000  0.0032  168.33 (988438)  2.7037  7.1729  381 (10)  20368 (650,651,662,663,809,..1056079)
45  1080000  0.0032  156.88 (935898)  2.6181  7.1047  0 (0)     19932 (767,768,809,810,866,..1022038)
46  1080000  0.0032  156.40 (935898)  2.2137  6.8080  0 (0)     18522 (684567,684568,695466,695467,699570,..975856)
47  1080000  0.0032  150.20 (905448)  2.6011  7.0525  0 (0)     19427 (2012,2013,510347,510348,617324,..980947)
48  1080000  0.0032  163.08 (1012102) 3.0856  8.6857  491 (49)  32197 (527,528,536,537,545,..1059883)
49  1080000  0.0032  151.87 (861738)  2.1150  6.2499  0 (0)     14993 (679920,679921,681762,681763,684567,..889561)
50  1080000  0.0032  143.53 (843639)  2.3864  6.2304  0 (0)     14372 (673311,673312,676716,676717,679680,..907048)
51  1080000  0.0032  148.53 (815289)  2.4022  6.1284  0 (0)     13945 (667971,667972,672835,673311,673312,..925077)
52  1080000  0.0032  149.49 (815289)  2.4059  6.0745  0 (0)     13932 (667971,667972,672834,672835,673311,..925077)
53  1080000  0.0032  149.49 (788680)  2.2976  5.4171  0 (0)     10821 (662766,662767,664794,664795,667971,..851374)
54  1080000  0.0032  146.63 (788680)  2.1600  5.5494  0 (0)     11435 (662766,662767,664794,664795,667971,..925077)
55  1080000  0.0032  145.91 (817180)  2.3747  5.9131  0 (0)     13198 (664794,664795,667971,667972,672834,..925077)
56  1080000  0.0032  140.91 (788680)  2.4499  5.8216  0 (0)     13403 (641917,658567,662767,664794,664795,..925077)
57  1080000  0.0032  141.38 (707776)  1.2948  3.8831  0 (0)     5041 (654816,654817,658320,658321,658566,..757666)
58  1080000  0.0032  149.73 (707776)  1.2131  3.6946  0 (0)     4076 (641916,641917,654136,654816,654817,..739225)
59  1080000  0.0032  51.02 (220341)   1.3073  3.1542  0 (0)     1869 (138187,145140,145141,147822,147823,..1021026)
60  1080000  0.0032  119.93 (313205)  1.6518  5.2116  0 (0)     9504 (3019,3020,12955,12956,25645,..1078275)
61  1080000  0.0032  149.25 (707776)  1.2933  3.5546  0 (0)     3393 (631761,631762,641916,641917,647521,..732562)
62  1080000  0.0032  126.60 (222973)  2.0194  5.6079  0 (0)     11357 (3019,3020,12955,12956,14420,..1078275)
63  1080000  0.0032  126.60 (222973)  2.0223  5.6224  0 (0)     11452 (3019,3020,12955,12956,14420,..1078275)

Same kernel, tick skew enabled, nohz and push/pull (100% pinned load...)
disabled for the isolated cpuset.  This is 10us or so better than 33-rt
can do on this box with nohz=off, i.e. that's roughly the jitter that
cpupri_set() induces (it _can_ double that, very rarely it seems).  So
with a couple of little tweaks, 3.0-rt performs better than 33-rt (and
can dynamically become "green" again when not running picky rt load)
despite being a little fatter.  'Course if I applied the same dinky
tweaks to 33-rt, the weight gain would show.

Anyway, the numbers..

FREQ=960 FRAMES=3456000 LOOP=50000 using CPUs 4 - 23
FREQ=666 FRAMES=2397600 LOOP=72072 using CPUs 24 - 43
FREQ=300 FRAMES=1080000 LOOP=160000 using CPUs 44 - 63
on your marks... get set... POW!
Cpu Frames   Min     Max(Frame)       Avg     Sigma   LastTrans Fliers(Frames)
4   3456000  0.0159  5.98 (1957035)   0.1275  0.2979  0 (0)
5   3456000  0.0159  6.21 (2641598)   0.2173  0.3444  0 (0)
6   3456000  0.0159  5.26 (1313825)   0.1599  0.2956  0 (0)
7   3456000  0.0159  5.98 (346106)    0.1632  0.2877  0 (0)
8   3456000  0.0159  5.50 (70893)     0.1437  0.3450  0 (0)
9   3456000  0.0159  5.98 (1550901)   0.1381  0.3502  0 (0)
10  3456000  0.0159  5.74 (106100)    0.1478  0.3313  0 (0)
11  3456000  0.0159  5.71 (3174550)   0.1413  0.3090  0 (0)
12  3456000  0.0159  5.02 (1506694)   0.1761  0.3098  0 (0)
13  3456000  0.0159  5.71 (3054611)   0.1768  0.3546  0 (0)
14  3456000  0.0159  5.02 (3148871)   0.1299  0.3062  0 (0)
15  3456000  0.0159  4.99 (2122036)   0.1521  0.3132  0 (0)
16  3456000  0.0159  6.42 (1728959)   0.1521  0.3905  0 (0)
17  3456000  0.0159  6.21 (854434)    0.1618  0.3652  0 (0)
18  3456000  0.0159  6.93 (2190440)   0.1418  0.3548  0 (0)
19  3456000  0.0159  6.90 (1614252)   0.2075  0.4128  0 (0)
20  3456000  0.0159  5.47 (136316)    0.2002  0.3977  0 (0)
21  3456000  0.0159  6.69 (1057262)   0.1435  0.3475  0 (0)
22  3456000  0.0159  6.66 (3123382)   0.1602  0.3585  0 (0)
23  3456000  0.0159  5.94 (2297025)   0.2283  0.3616  0 (0)
24  2397600  0.0587  6.38 (991357)    0.2580  0.3817  0 (0)
25  2397600  0.0587  6.73 (1162518)   0.2380  0.3730  0 (0)
26  2397600  0.0587  7.21 (733474)    0.2502  0.3590  0 (0)
27  2397600  0.0587  6.86 (1873716)   0.2280  0.3768  0 (0)
28  2397600  0.0587  7.21 (2296767)   0.2521  0.3884  0 (0)
29  2397600  0.0587  7.21 (616888)    0.4165  0.4887  0 (0)
30  2397600  0.0587  7.09 (458995)    0.4245  0.4577  0 (0)
31  2397600  0.0587  6.14 (1674893)   0.3974  0.4544  0 (0)
32  2397600  0.0587  7.45 (130233)    0.4440  0.5456  0 (0)
33  2397600  0.0587  7.09 (1453350)   0.2482  0.3813  0 (0)
34  2397600  0.0587  6.73 (2365066)   0.2886  0.3827  0 (0)
35  2397600  0.0587  6.14 (35955)     0.2556  0.3841  0 (0)
36  2397600  0.0587  6.62 (2145554)   0.2566  0.3933  0 (0)
37  2397600  0.0587  7.81 (130234)    0.5375  0.5129  0 (0)
38  2397600  0.0587  7.33 (130234)    0.4921  0.5255  0 (0)
39  2397600  0.0587  7.57 (130234)    0.4200  0.4901  0 (0)
40  2397600  0.0587  6.62 (2367859)   0.2962  0.4553  0 (0)
41  2397600  0.0587  6.26 (206979)    0.5036  0.5491  0 (0)
42  2397600  0.0587  6.38 (1302660)   0.5093  0.5469  0 (0)
43  2397600  0.0587  6.73 (1825681)   0.5511  0.5734  0 (0)
44  1079999  0.0032  7.39 (91927)     0.4603  0.5291  0 (0)
45  1079999  0.0032  6.92 (977865)    0.3143  0.4378  0 (0)
46  1079999  0.0032  5.96 (1002473)   0.2129  0.3999  0 (0)
47  1079999  0.0032  6.44 (981423)    0.4193  0.5293  0 (0)
48  1079999  0.0032  6.20 (375165)    0.2602  0.4201  0 (0)
49  1079999  0.0032  5.73 (886536)    0.4002  0.5174  0 (0)
50  1079999  0.0032  6.44 (547629)    0.3182  0.4507  0 (0)
51  1079999  0.0032  5.73 (143994)    0.4736  0.5952  0 (0)
52  1079999  0.0032  6.68 (1053525)   0.4753  0.5132  0 (0)
53  1079999  0.0032  6.44 (378576)    0.3686  0.4691  0 (0)
54  1079999  0.0032  6.92 (886639)    0.6017  0.5538  0 (0)
55  1079999  0.0032  6.68 (1055655)   0.4917  0.5232  0 (0)
56  1079999  0.0032  6.44 (293526)    0.2752  0.4340  0 (0)
57  1079999  0.0032  8.59 (913209)    1.1433  0.8550  0 (0)
58  1079999  0.0032  5.25 (259824)    0.2139  0.3702  0 (0)
59  1079999  0.0032  6.68 (245211)    0.2031  0.3665  0 (0)
60  1079999  0.0032  6.44 (895440)    0.4445  0.4867  0 (0)
61  1079999  0.0032  5.96 (896382)    0.2541  0.3923  0 (0)
62  1079999  0.0032  7.16 (895440)    0.5437  0.5162  0 (0)
63  1079999  0.0032  6.44 (895371)    0.5707  0.5135  0 (0)

So IMHO there is a valid case for keeping NO_HZ a config option for
folks who can never tolerate the price tag, but as for the nohz=off
option, methinks that could indeed go away, given it's easy to make an
on/off switch.  I made one for both nohz and push/pull; I just need to
move it into cpusets and make it pretty enough to live.

WRT $subject, it seems pretty clear that the RT kernel either wants tick
skew back.. or collision avoidance radar.. or something.

	-Mike