Re: [PATCH] hrtimer: Reset hrtimer cpu base proper on CPU hotplug

On Mon, Jan 29, 2018 at 03:20:32PM +0100, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> 
> commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
> 
> The hrtimer interrupt code contains a hang detection and mitigation
> mechanism, which prevents a long-delayed hrtimer interrupt from causing a
> continuous retriggering of interrupts that keeps the system from making
> progress. If a hang is detected then the timer hardware is programmed with
> a certain delay into the future and a flag is set in the hrtimer cpu base
> which prevents newly enqueued timers from reprogramming the timer hardware
> prior to the chosen delay. The subsequent hrtimer interrupt after the delay
> clears the flag and resumes normal operation.
> 
> If such a hang happens in the last hrtimer interrupt before a CPU is
> unplugged then the hang_detected flag is set and stays that way when the
> CPU is plugged in again. At that point the timer hardware is not armed and
> it cannot be armed because the hang_detected flag is still active, so
> nothing clears that flag. As a consequence the CPU does not receive hrtimer
> interrupts and no timers expire on that CPU which results in RCU stalls and
> other malfunctions.
> 
> Clear the flag along with some other less critical members of the hrtimer
> cpu base to ensure starting from a clean state when a CPU is plugged in.
> 
> Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
> root cause of that hard to reproduce heisenbug. Once understood it's
> trivial and certainly justifies a brown paperbag.
> 
> Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
> Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Sebastian Sewior <bigeasy@xxxxxxxxxxxxx>
> Cc: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
> [bigeasy: backport to v3.18, drop ->next_timer it was introduced later]
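
For readers following the failure sequence described in the quoted commit
message, here is a minimal user-space sketch of it. This is a toy model, not
the kernel code: only the hang_detected flag is taken from the text above;
the struct, the other field names and all toy_* functions are hypothetical
and exist purely for illustration.

```c
#include <stdbool.h>

/* Toy stand-in for the per-CPU hrtimer base (hypothetical layout). */
struct toy_cpu_base {
	bool hang_detected;              /* set by the hang mitigation */
	bool hw_armed;                   /* is the timer hardware programmed? */
	unsigned long long expires_next; /* next programmed expiry */
};

/* Enqueue path: a new timer may not reprogram the hardware while the
 * hang mitigation is active. Returns true if the hardware was armed. */
bool toy_reprogram(struct toy_cpu_base *b, unsigned long long expires)
{
	if (b->hang_detected)
		return false;
	b->expires_next = expires;
	b->hw_armed = true;
	return true;
}

/* Interrupt path: the one-shot hardware has fired. A normal interrupt
 * clears the flag; a detected hang sets it and arms the hardware with
 * a delay into the future. */
void toy_interrupt(struct toy_cpu_base *b, bool hang)
{
	b->hw_armed = false;
	b->hang_detected = false;
	if (hang) {
		b->hang_detected = true;
		b->hw_armed = true; /* delayed retry */
	}
}

/* CPU unplug: the hardware is shut down, the software state survives. */
void toy_cpu_down(struct toy_cpu_base *b)
{
	b->hw_armed = false;
}

/* CPU plug-in: with_fix models resetting the base to a clean state. */
void toy_cpu_up(struct toy_cpu_base *b, bool with_fix)
{
	if (with_fix) {
		b->hang_detected = false;
		b->expires_next = ~0ULL; /* "nothing programmed" */
	}
}
```

In this model, calling toy_interrupt(&b, true), toy_cpu_down(&b) and then
toy_cpu_up(&b, false) leaves the base in a state where toy_reprogram() always
refuses and the hardware is never armed, which corresponds to the stall
described above. With toy_cpu_up(&b, true) the next toy_reprogram() succeeds.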

Thanks for the backport, now queued up.

greg k-h


