PREEMPT_RT systems defer most irq_work handling into the timer softirq.
This softirq is only triggered when a timer expires, which adds some
delay to the irq_work handling. It's a price PREEMPT_RT systems are
willing to pay in exchange for less IRQ noise.

This works fine for the majority of systems, but there's a catch: what
if no timer is ever armed after an irq_work is queued? This has been
observed on nohz_full CPUs while running oslat. The lack of armed
timers prevents a pending irq_work from running, which in turn prevents
the nohz code from fully stopping the tick.

To avoid this situation, introduce new logic in run_local_timers(): the
timer softirq will be triggered when an irq_work is pending but no
timers have been armed. This situation is only possible on PREEMPT_RT
systems, so make the code conditional on it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx>
---

NOTE: All in all, this is the best I could think of with my limited
timers knowledge. A bigger hammer would be to unconditionally trigger
the softirq if irq_work_needs_cpu(). But I get the feeling this is
something we want to avoid.

 kernel/time/timer.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d7d68296a3b..7611673cb172 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1771,6 +1771,28 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static inline bool irq_work_needs_softirq(struct timer_base *base)
+{
+	/*
+	 * Neither base has armed timers and an irq_work is pending. Since we
+	 * can't predict whether a timer will be armed in the future, request
+	 * the timer softirq to be triggered.
+	 */
+	if (!base->pending &&
+	    (IS_ENABLED(CONFIG_NO_HZ_COMMON) && !(base + 1)->pending) &&
+	    irq_work_needs_cpu())
+		return true;
+
+	return false;
+}
+#else
+static inline bool irq_work_needs_softirq(struct timer_base *base)
+{
+	return false;
+}
+#endif
+
 /*
  * Called by the local, per-CPU timer interrupt on SMP.
  */
@@ -1779,6 +1801,10 @@ static void run_local_timers(void)
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
 	hrtimer_run_queues();
+
+	if (irq_work_needs_softirq(base))
+		goto raise;
+
 	/* Raise the softirq only if required. */
 	if (time_before(jiffies, base->next_expiry)) {
 		if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
@@ -1788,6 +1814,8 @@
 		if (time_before(jiffies, base->next_expiry))
 			return;
 	}
+
+raise:
 	raise_softirq(TIMER_SOFTIRQ);
 }
 
--
2.31.1
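
P.S. For anyone less familiar with the timers code: the (base + 1)
indexing in irq_work_needs_softirq() above leans on the per-CPU
timer_bases[] layout. Roughly, from kernel/time/timer.c (just a
reference sketch for reviewers, not part of this patch):

#ifdef CONFIG_NO_HZ_COMMON
/* Two bases per CPU: BASE_STD for standard timers, BASE_DEF for deferrable ones. */
# define NR_BASES	2
# define BASE_STD	0
# define BASE_DEF	1
#else
/* A single base per CPU; (base + 1) would be out of bounds here. */
# define NR_BASES	1
# define BASE_STD	0
# define BASE_DEF	0
#endif

static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]);

That's why the deferrable base is only dereferenced behind the
IS_ENABLED(CONFIG_NO_HZ_COMMON) check.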