Re: [patch 2/4] nohz: Prevent erroneous tick stop invocations

On Wed, Dec 27, 2017 at 09:58:08PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Dec 2017, Thomas Gleixner wrote:
> > Bah, no. We need to move that into the nohz logic somehow to prevent that
> > repetitive expiry yesterday reprogramming. Lemme think about it some more.
> 
> The patch below should be the proper cure.
> 
> Thanks,
> 
> 	tglx
> 
> 8<-------------------
> Subject: nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Fri, 22 Dec 2017 15:51:13 +0100
> 
> The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
> subsequently invokes tick_nohz_stop_sched_tick() are:
> 
>   if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
> 
> If need_resched() is not set, but a timer softirq is pending then this is
> an indication that the softirq code punted and delegated the execution to
> ksoftirqd. need_resched() is not true because the current interrupted task
> takes precedence over ksoftirqd.
> 
> Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
> timer interrupts because the timer wheel contains an expired timer, but
> softirqs are not yet executed. So it returns an immediate expiry request,
> which causes the timer to fire immediately again. Lather, rinse and
> repeat....
> 
> Prevent that by adding a check for a pending timer soft interrupt to the
> conditions in tick_nohz_stop_sched_tick() which avoid calling
> get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
> prevents a repetitive programming of an already expired timer.
> 
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Sebastian Siewior <bigeasy@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Paul McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
> 
> ---
>  kernel/time/tick-sched.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -650,6 +650,11 @@ static void tick_nohz_restart(struct tic
>  	ts->next_tick = 0;
>  }
>  
> +static inline bool local_timer_softirq_pending(void)
> +{
> +	return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
> +}
> +
>  static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  					 ktime_t now, int cpu)
>  {
> @@ -666,8 +671,8 @@ static ktime_t tick_nohz_stop_sched_tick
>  	} while (read_seqretry(&jiffies_lock, seq));
>  	ts->last_jiffies = basejiff;
>  
> -	if (rcu_needs_cpu(basemono, &next_rcu) ||
> -	    arch_needs_cpu() || irq_work_needs_cpu()) {
> +	if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
> +	    irq_work_needs_cpu() || local_timer_softirq_pending()) {

Much better. This may need a comment though because it's not immediately
obvious why we have this check while softirqs are processed just before
tick_irq_exit().
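
For illustration only, here is a minimal user-space sketch of how such a
commented check could read. Every model_* name is a hypothetical stand-in and
not a kernel symbol; the sketch assumes that the pending softirqs form a
bitmask and that the timer softirq number is a bit index into it, so the test
shifts a bit rather than ANDing with the raw index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for softirq numbers: bit indices, not masks. */
enum { MODEL_HI_SOFTIRQ = 0, MODEL_TIMER_SOFTIRQ = 1 };

/* Hypothetical stand-in for the per-CPU pending softirq bitmask. */
static uint32_t model_pending_mask;

/* Mark softirq number nr as pending in the model mask. */
static void model_raise_softirq(unsigned int nr)
{
	model_pending_mask |= UINT32_C(1) << nr;
}

/*
 * A timer softirq that is still pending at this point means softirq
 * processing was deferred to ksoftirqd. The timer wheel then still
 * holds an expired timer, so asking it for the next expiry would
 * return "now" and the tick would be reprogrammed to fire at once,
 * again and again. Report the pending state so the caller keeps the
 * tick running.
 */
static bool model_timer_softirq_pending(void)
{
	return model_pending_mask & (UINT32_C(1) << MODEL_TIMER_SOFTIRQ);
}

/* Reduced model of the combined condition: true means keep the tick. */
static bool model_must_keep_tick(bool rcu_needs, bool arch_needs,
				 bool irq_work_needs)
{
	return rcu_needs || arch_needs || irq_work_needs ||
	       model_timer_softirq_pending();
}
```

In this model, raising only MODEL_HI_SOFTIRQ leaves the timer check false,
while raising MODEL_TIMER_SOFTIRQ makes model_must_keep_tick() return true
even when none of the other conditions hold.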

Thanks.

Acked-by: Frederic Weisbecker <frederic@xxxxxxxxxx>


