After tasks enter or leave a runqueue (wakeup/block) SCHED full_nohz dependency is checked (via sched_update_tick_dependency()). In case tick can be stopped on a CPU (see sched_can_stop_tick() for details), SCHED dependency for such CPU is cleared. However, this new information is not used right away to actually stop the tick. In CONFIG_PREEMPT systems booted with threadirqs option, sched clock tick is serviced by an actual task (ksoftirqd corresponding to the CPU where tick timer fired). So, in case a CPU was running a single task, servicing the timer involves exiting full nozh mode. Problem at this point is that we might lose chances to enter back into full nozh mode, since info about ksoftirqd thread going back to sleep is not used (as mentioned above). Fix it by calling tick_nozh_full_update_tick(_cpu)() right after SCHED dependency is cleared, so that tick can be promptly stopped. Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxxx> --- Hi, I noticed what seems to be the problem described in the changelog while running sysjitter [1] on a PREEMPT system setup for isolation and full nozh, i.e. ... skew_tick=1 nohz=on nohz_full=4-35 rcu_nocbs=4-35 threadirqs (36 CPUs box). Starting sysjitter with something like the following perf stat -C 4-35 -e irq_vectors:local_timer_entry taskset \ --cpu-list 4-35 ./sysjitter/sysjitter --runtime 30 200 with vanilla kernel was returning ~5k local_timer_entry events, while that number goes down to ~100 or so with the proposed patch applied. The following trace snippet also highlight the problematic situation: ... ksoftirqd/19-125 [019] 170.700754: softirq_entry: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.700755: softirq_exit: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.700756: softirq_entry: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.700757: softirq_exit: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.700759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120] sysjitter-2459 [019] 170.701740: local_timer_entry: vector=236 sysjitter-2459 [019] 170.701742: softirq_raise: vec=1 [action=TIMER] sysjitter-2459 [019] 170.701743: softirq_raise: vec=7 [action=SCHED] sysjitter-2459 [019] 170.701744: local_timer_exit: vector=236 sysjitter-2459 [019] 170.701747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019 sysjitter-2459 [019] 170.701748: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.701749: irq_work_entry: vector=246 sysjitter-2459 [019] 170.701750: irq_work_exit: vector=246 sysjitter-2459 [019] 170.701751: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.701753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120] ksoftirqd/19-125 [019] 170.701754: softirq_entry: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.701756: softirq_exit: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.701756: softirq_entry: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.701758: softirq_exit: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.701759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120] sysjitter-2459 [019] 170.702740: local_timer_entry: vector=236 sysjitter-2459 [019] 170.702742: softirq_raise: vec=1 [action=TIMER] sysjitter-2459 [019] 170.702743: softirq_raise: vec=7 [action=SCHED] sysjitter-2459 [019] 170.702744: local_timer_exit: vector=236 sysjitter-2459 [019] 170.702747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019 sysjitter-2459 [019] 170.702748: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.702749: irq_work_entry: vector=246 sysjitter-2459 [019] 170.702750: irq_work_exit: vector=246 sysjitter-2459 [019] 170.702751: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.702753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120] ksoftirqd/19-125 [019] 170.702755: softirq_entry: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.702756: softirq_exit: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.702757: softirq_entry: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.702758: softirq_exit: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.702760: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120] sysjitter-2459 [019] 170.703740: local_timer_entry: vector=236 sysjitter-2459 [019] 170.703742: softirq_raise: vec=1 [action=TIMER] sysjitter-2459 [019] 170.703743: softirq_raise: vec=7 [action=SCHED] sysjitter-2459 [019] 170.703745: local_timer_exit: vector=236 sysjitter-2459 [019] 170.703747: sched_wakeup: ksoftirqd/19:125 [120] success=1 CPU:019 sysjitter-2459 [019] 170.703748: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.703749: irq_work_entry: vector=246 sysjitter-2459 [019] 170.703750: irq_work_exit: vector=246 sysjitter-2459 [019] 170.703751: tick_stop: success=0 dependency=SCHED sysjitter-2459 [019] 170.703753: sched_switch: sysjitter:2459 [120] R ==> ksoftirqd/19:125 [120] ksoftirqd/19-125 [019] 170.703755: softirq_entry: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.703756: softirq_exit: vec=1 [action=TIMER] ksoftirqd/19-125 [019] 170.703757: softirq_entry: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.703758: softirq_exit: vec=7 [action=SCHED] ksoftirqd/19-125 [019] 170.703759: sched_switch: ksoftirqd/19:125 [120] S ==> sysjitter:2459 [120] sysjitter-2459 [019] 170.704500: call_function_single_entry: vector=251 sysjitter-2459 [019] 170.704502: call_function_single_exit: vector=251 sysjitter-2459 [019] 170.704504: tick_stop: success=1 dependency=NONE ... Luckily sometimes it happens (like in the above) that an interrupt fires on a CPU while ksoftirqd is not executing, SCHED dep is seen as cleared and the CPU enters full nohz. I've however observed pathological cases where CPUs aren't able to enter back full nohz for a long time. Hope my understanding is correct. Please find below a naive attempt to fix the problem. I'm pretty sure I'm missing details, e.g. I'm not sure it is safe to possibly stop the tick remotely. Consider the following as not more than another way of pointing the problem out. :-) Best, Juri 1 - https://github.com/alexeiz/sysjitter --- kernel/time/tick-sched.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3e2dc9b8858c..0d696e55d774 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -342,11 +342,16 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) } EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu); +static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu); + void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu); atomic_andnot(BIT(bit), &ts->tick_dep_mask); + + if (!in_irq()) + tick_nohz_full_update_tick_cpu(ts, cpu); } EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu); @@ -870,11 +875,9 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now) tick_nohz_restart(ts, now); } -static void tick_nohz_full_update_tick(struct tick_sched *ts) +static void tick_nohz_full_update_tick_cpu(struct tick_sched *ts, int cpu) { #ifdef CONFIG_NO_HZ_FULL - int cpu = smp_processor_id(); - if (!tick_nohz_full_cpu(cpu)) return; @@ -888,6 +891,15 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts) #endif } +static void tick_nohz_full_update_tick(struct tick_sched *ts) +{ +#ifdef CONFIG_NO_HZ_FULL + int cpu = smp_processor_id(); + + tick_nohz_full_update_tick_cpu(ts, cpu); +#endif +} + static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) { /* -- 2.25.3