On Fri, 31 May 2019, Anna-Maria Gleixner wrote:

[...]

> I will think about the problem and your solution a little bit more and
> give you feedback hopefully on monday.

I'm sorry for the delay. But now I'm able to give you detailed feedback:

The general problem is that your solution is customized to a single use
case: preventing the softirq raise, but only if there is _no_ timer
pending. To reach this goal without using locks, overhead is added to the
formerly optimized add/mod path of a timer. With your code the timer
softirq is still raised whenever a timer is pending, even if it does not
have to expire right now - and there have already been requests for
exactly this use case in the past.

During the last week I discussed several approaches with Thomas for
solving the unconditional timer softirq raise in a more general way
without losing the fast add/mod path of a timer. The most promising
approach depends on a timer code change from a push to a pull model,
which is still under development (see the v2 patchset:
http://lkml.kernel.org/r/20170418111102.490432548@xxxxxxxxxxxxx). The v2
patchset has several problems, but we are working on solutions for them
right now.

Once the timer pull model is in place, the approach to solving the
unconditional timer softirq raise could look like the following:

---8<---

The next_expiry member of struct timer_base is used to store the next
expiry value even if the timer_base is not idle. Therefore it is updated
after adding or modifying a timer and also at the end of the timer
softirq. In case the timer softirq does not have to be raised,
timer_base->clk is incremented to prevent stale clocks. Checking whether
the timer softirq has to be raised cannot be done locklessly.

This code is neither compile tested nor boot tested.

---
 kernel/time/timer.c |   60 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -552,37 +552,32 @@ static void
 static void
 trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
 {
-	if (!is_timers_nohz_active())
-		return;
-
-	/*
-	 * TODO: This wants some optimizing similar to the code below, but we
-	 * will do that when we switch from push to pull for deferrable timers.
-	 */
-	if (timer->flags & TIMER_DEFERRABLE) {
-		if (tick_nohz_full_cpu(base->cpu))
-			wake_up_nohz_cpu(base->cpu);
-		return;
+	if (is_timers_nohz_active()) {
+		/*
+		 * TODO: This wants some optimizing similar to the code
+		 * below, but we will do that when we switch from push to
+		 * pull for deferrable timers.
+		 */
+		if (timer->flags & TIMER_DEFERRABLE) {
+			if (tick_nohz_full_cpu(base->cpu))
+				wake_up_nohz_cpu(base->cpu);
+			return;
+		}
 	}
 
-	/*
-	 * We might have to IPI the remote CPU if the base is idle and the
-	 * timer is not deferrable. If the other CPU is on the way to idle
-	 * then it can't set base->is_idle as we hold the base lock:
-	 */
-	if (!base->is_idle)
-		return;
-
 	/* Check whether this is the new first expiring timer: */
 	if (time_after_eq(timer->expires, base->next_expiry))
 		return;
 
+	/* Update next expiry time */
+	base->next_expiry = timer->expires;
 	/*
-	 * Set the next expiry time and kick the CPU so it can reevaluate the
-	 * wheel:
+	 * We might have to IPI the remote CPU if the base is idle and the
+	 * timer is not deferrable. If the other CPU is on the way to idle
+	 * then it can't set base->is_idle as we hold the base lock:
 	 */
-	base->next_expiry = timer->expires;
-	wake_up_nohz_cpu(base->cpu);
+	if (is_timers_nohz_active() && base->is_idle)
+		wake_up_nohz_cpu(base->cpu);
 }
 
 static void
@@ -1684,6 +1679,7 @@ static inline void __run_timers(struct t
 		while (levels--)
 			expire_timers(base, heads + levels);
 	}
+	base->next_expiry = __next_timer_interrupt(base);
 	base->running_timer = NULL;
 	raw_spin_unlock_irq(&base->lock);
 }
@@ -1716,8 +1712,24 @@ void run_local_timers(void)
 		base++;
 		if (time_before(jiffies, base->clk))
 			return;
+		base--;
 	}
+
+	/*
+	 * Check for the next expiry.
+	 *
+	 * The deferrable base is ignored here - it is only usable when
+	 * switching from the push to the pull model for deferrable timers.
+	 */
+	raw_spin_lock_irq(&base->lock);
+	if (base->clk == base->next_expiry) {
+		raw_spin_unlock_irq(&base->lock);
+		raise_softirq(TIMER_SOFTIRQ);
+	} else {
+		base->clk++;
+		raw_spin_unlock_irq(&base->lock);
+		return;
+	}
-	raise_softirq(TIMER_SOFTIRQ);
 }
 
 /*

---8<---

Thanks,

	Anna-Maria
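
P.S.: To illustrate the core of the run_local_timers() change above: the
softirq is only raised when base->clk has caught up with
base->next_expiry; on all other ticks clk is advanced so it cannot go
stale. Below is a minimal standalone sketch of that decision - plain
userspace C, not kernel code; struct timer_base_sim, tick_needs_softirq()
and the main() loop are made-up simplifications of struct timer_base,
run_local_timers() and the periodic tick.

/*
 * Standalone illustration - NOT kernel code. struct timer_base_sim and
 * the main() loop are simplified stand-ins for struct timer_base and
 * the per-tick call to run_local_timers().
 */
#include <stdbool.h>
#include <stdio.h>

struct timer_base_sim {
	unsigned long clk;		/* clock of the last handled tick */
	unsigned long next_expiry;	/* next pending timer expiry */
};

/* The per-tick decision; in the kernel this runs under base->lock. */
static bool tick_needs_softirq(struct timer_base_sim *base)
{
	if (base->clk == base->next_expiry)
		return true;	/* a timer expires now: raise TIMER_SOFTIRQ */

	base->clk++;		/* no expiry: advance clk so it stays current */
	return false;
}

int main(void)
{
	struct timer_base_sim base = { .clk = 100, .next_expiry = 103 };

	for (int tick = 0; tick < 6; tick++) {
		unsigned long clk = base.clk;

		if (tick_needs_softirq(&base)) {
			printf("clk=%lu: raise TIMER_SOFTIRQ\n", clk);
			/*
			 * Stand-in for the softirq handler: expire timers,
			 * advance clk and recompute next_expiry (here:
			 * pretend the next timer is two jiffies out).
			 */
			base.clk++;
			base.next_expiry = base.clk + 2;
		} else {
			printf("clk=%lu: no expiry, clk advanced\n", clk);
		}
	}
	return 0;
}

When tick_needs_softirq() returns true, the simulated handler advances
clk and recomputes next_expiry, which is what __run_timers() does in the
patch via its expiry loop and the added __next_timer_interrupt() call.
The clk++ in the no-expiry path is what keeps the
base->clk == base->next_expiry comparison meaningful; without it, clk
would fall behind jiffies on every tick where no timer expires.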