[PATCH -rt] kernel/time: unbreak nohz in -rt

Luiz Capitulino <lcapitulino@xxxxxxxxxx> · Mon, 21 Mar 2016 15:12:38 -0400

nohz support (nohz-full and nohz-idle) is currently
broken in the RT kernel: the tick is never deactivated,
even when a core is idle or when nohz_full= is passed.

The reason for this is that get_next_timer_interrupt()
in the RT kernel *always* returns "basem + TICK_NSEC"
which translates to "there's a timer firing in the
next tick". This causes tick_nohz_stop_sched_tick()
to never deactivate the tick.

This patch is like Tylenol: it doesn't fix the problem, it
just relieves the symptoms by making tick_nohz_stop_sched_tick()
succeed if: 1. a core doesn't have any legacy timers
pending and 2. there's no hrtimer firing in the next tick.

Also, note that this issue has another side effect: it
causes the ktimersoftd thread to always take 1%-2% of CPU
time on all cores, even if they are idle. As it turns out,
the tick handling code path unconditionally raises the
TIMER_SOFTIRQ line. This is upstream kernel behavior.
I believe people are not noticing the CPU usage because
nohz-idle papers over this problem.

Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
---
 kernel/time/timer.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
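
For readers who want to poke at the decision described in the changelog
outside the kernel, here is a rough user-space model of it. It is only a
sketch under stated assumptions: the names model_next_timer_interrupt(),
model_can_stop_tick() and MODEL_TICK_NSEC are made up for illustration, the
1 ms tick length is an assumption, and the hrtimer comparison is simplified
(the real cmp_next_hrtimer_event() may round or clamp differently).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length for this model: 1 ms. Not taken from the patch. */
#define MODEL_TICK_NSEC 1000000ULL

/*
 * Hypothetical user-space model of the patched -rt path; not kernel code.
 *
 * active_timers: number of legacy timers pending on this CPU
 * basem:         current time in ns
 * next_hrtimer:  expiry of the earliest hrtimer in ns, UINT64_MAX if none
 */
static uint64_t model_next_timer_interrupt(unsigned long active_timers,
                                           uint64_t basem,
                                           uint64_t next_hrtimer)
{
    /* Guard the addition below against wrap-around. */
    assert(basem <= UINT64_MAX - MODEL_TICK_NSEC);

    /* Legacy timers pending: keep the old answer, "assume the next jiffy". */
    if (active_timers)
        return basem + MODEL_TICK_NSEC;

    /* An hrtimer that has already expired means "fire right away". */
    if (next_hrtimer <= basem)
        return basem;

    /* Otherwise the next event is whenever the earliest hrtimer expires. */
    return next_hrtimer;
}

/* The tick can be stopped only if nothing is due within the next tick. */
static int model_can_stop_tick(unsigned long active_timers,
                               uint64_t basem,
                               uint64_t next_hrtimer)
{
    uint64_t next = model_next_timer_interrupt(active_timers, basem,
                                               next_hrtimer);

    return next > basem + MODEL_TICK_NSEC;
}
```

In this model, a core with pending legacy timers never gets to stop its
tick, which is the pre-patch behavior for every core; a core with no legacy
timers can stop it as long as no hrtimer is due within the next tick.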

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index fee8682..2bf49af 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1451,8 +1451,14 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
 	/*
 	 * On PREEMPT_RT we cannot sleep here. As a result we can't take
 	 * the base lock to check when the next timer is pending and so
-	 * we assume the next jiffy.
+	 * we assume the next jiffy if there are active timers.
 	 */
+	local_irq_disable();
+	if (!base->active_timers) {
+		local_irq_enable();
+		return cmp_next_hrtimer_event(basem, expires);
+	}
+	local_irq_enable();
 	return basem + TICK_NSEC;
 #endif
 	spin_lock(&base->lock);
-- 
2.1.0
