nohz support (nohz-full and nohz-idle) is currently broken in the RT
kernel. That is, the tick is never deactivated, even when a core is idle
or when nohz_full= is passed.

The reason for this is that get_next_timer_interrupt() in the RT kernel
*always* returns "basem + TICK_NSEC", which translates to "there's a
timer firing in the next tick". This causes tick_nohz_stop_sched_tick()
to never deactivate the tick.

This patch is like Tylenol: it doesn't fix the problem, it just relieves
the symptoms by making tick_nohz_stop_sched_tick() succeed if:

  1. a core doesn't have any legacy timers pending, and
  2. there's no hrtimer firing in the next tick.

Also, note that this issue has another side effect: it causes the
ktimersoftd thread to always take 1%-2% of CPU time on all cores, even
if they are idle. As it turns out, the tick handling code path
unconditionally raises the TIMER_SOFTIRQ line. This is an upstream
kernel behavior. I believe people are not noticing this CPU usage
upstream because nohz-idle papers over the problem.

Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
---
 kernel/time/timer.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index fee8682..2bf49af 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1451,8 +1451,14 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	/*
 	 * On PREEMPT_RT we cannot sleep here. As a result we can't take
 	 * the base lock to check when the next timer is pending and so
-	 * we assume the next jiffy.
+	 * we assume the next jiffy if there are active timers.
 	 */
+	local_irq_disable();
+	if (!base->active_timers) {
+		local_irq_enable();
+		return cmp_next_hrtimer_event(basem, expires);
+	}
+	local_irq_enable();
 	return basem + TICK_NSEC;
 #endif
 	spin_lock(&base->lock);
-- 
2.1.0