This is a note to let you know that I've just added the patch titled

    tick: Detect and fix jiffies update stall

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tick-detect-and-fix-jiffies-update-stall.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From stable-owner@xxxxxxxxxxxxxxx Sun Aug 13 05:16:30 2023
From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
Date: Sun, 13 Aug 2023 03:16:18 +0000
Subject: tick: Detect and fix jiffies update stall
To: stable@xxxxxxxxxxxxxxx
Cc: Guenter Roeck <linux@xxxxxxxxxxxx>, Steven Rostedt <rostedt@xxxxxxxxxxx>, Frederic Weisbecker <frederic@xxxxxxxxxx>, "Paul E . McKenney" <paulmck@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
Message-ID: <20230813031620.2218302-1-joel@xxxxxxxxxxxxxxxxx>

From: Frederic Weisbecker <frederic@xxxxxxxxxx>

[ Upstream commit a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd ]

tick: Detect and fix jiffies update stall

On some rare cases, the timekeeper CPU may be delaying its jiffies
update duty for a while. Known causes include:

* The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
  or MULTI_STOP_RUN state. Disabled interrupts prevent from timekeeping
  updates while waiting for the target CPU to complete its
  stop_machine() callback.

* The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
  on the host.

Detect and fix these situations with emergency timekeeping catchups.

Original-patch-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 kernel/time/tick-sched.c | 17 +++++++++++++++++
 kernel/time/tick-sched.h | 4 ++++
 2 files changed, 21 insertions(+)

--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -180,6 +180,8 @@ static ktime_t tick_init_jiffy_update(vo
 	return period;
 }
 
+#define MAX_STALLED_JIFFIES 5
+
 static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
 {
 	int cpu = smp_processor_id();
@@ -207,6 +209,21 @@ static void tick_sched_do_timer(struct t
 	if (tick_do_timer_cpu == cpu)
 		tick_do_update_jiffies64(now);
 
+	/*
+	 * If jiffies update stalled for too long (timekeeper in stop_machine()
+	 * or VMEXIT'ed for several msecs), force an update.
+	 */
+	if (ts->last_tick_jiffies != jiffies) {
+		ts->stalled_jiffies = 0;
+		ts->last_tick_jiffies = READ_ONCE(jiffies);
+	} else {
+		if (++ts->stalled_jiffies == MAX_STALLED_JIFFIES) {
+			tick_do_update_jiffies64(now);
+			ts->stalled_jiffies = 0;
+			ts->last_tick_jiffies = READ_ONCE(jiffies);
+		}
+	}
+
 	if (ts->inidle)
 		ts->got_idle_tick = 1;
 }
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -49,6 +49,8 @@ enum tick_nohz_mode {
  * @timer_expires_base: Base time clock monotonic for @timer_expires
  * @next_timer: Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask: Tick dependency mask - is set, if someone needs the tick
+ * @last_tick_jiffies: Value of jiffies seen on last tick
+ * @stalled_jiffies: Number of stalled jiffies detected across ticks
  */
 struct tick_sched {
 	struct hrtimer sched_timer;
@@ -77,6 +79,8 @@ struct tick_sched {
 	u64 next_timer;
 	ktime_t idle_expires;
 	atomic_t tick_dep_mask;
+	unsigned long last_tick_jiffies;
+	unsigned int stalled_jiffies;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);


Patches currently in stable-queue which might be from stable-owner@xxxxxxxxxxxxxxx are

queue-5.15/tick-detect-and-fix-jiffies-update-stall.patch
queue-5.15/timers-nohz-last-resort-update-jiffies-on-nohz_full-irq-entry.patch
queue-5.15/timers-nohz-switch-to-oneshot_stopped-in-the-low-res-handler-when-the-tick-is-stopped.patch
queue-5.15/netfilter-nf_tables-report-use-refcount-overflow.patch