Re: [PATCH 6.1, 5.15, 5.10] clocksource: Skip watchdog check for large watchdog intervals

On Tue, Feb 13, 2024 at 04:23:41PM +0100, Thomas Gleixner wrote:
> From: Jiri Wiesner <jwiesner@xxxxxxx>
> 
> commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 upstream.
> 
> There have been reports of the watchdog marking clocksources unstable on
> machines with 8 NUMA nodes:
> 
>   clocksource: timekeeping watchdog on CPU373:
>   Marking clocksource 'tsc' as unstable because the skew is too large:
>   clocksource:   'hpet' wd_nsec: 14523447520
>   clocksource:   'tsc'  cs_nsec: 14524115132
> 
> The measured clocksource skew - the absolute difference between cs_nsec
> and wd_nsec - was 668 microseconds:
> 
>   cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
> 
> The kernel used 200 microseconds for the uncertainty_margin of both the
> clocksource and watchdog, resulting in a threshold of 400 microseconds (the
> md variable). Both the cs_nsec and the wd_nsec value indicate that the
> readout interval was circa 14.5 seconds.  The observed behaviour is that
> watchdog checks failed for large readout intervals on 8 NUMA node
> machines. This indicates that the size of the skew was directly proportional
> to the length of the readout interval on those machines. The measured
> clocksource skew, 668 microseconds, was evaluated against a threshold (the
> md variable) that is suited for readout intervals of roughly
> WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.
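
For reference, the arithmetic in the report above can be reproduced with a
small userspace C sketch. The helper names (sketch_abs_skew_ns,
sketch_skew_ppm) and the SKETCH_MD_NSEC constant are made up for illustration
and are not kernel symbols; the 400 microsecond threshold is the md value
quoted above.

```c
#include <stdint.h>

/*
 * Threshold from the report: 200 us of margin for the clocksource plus
 * 200 us for the watchdog. Hypothetical name, not a kernel symbol.
 */
#define SKETCH_MD_NSEC 400000LL

/* Absolute difference between the two measured intervals, in ns. */
static int64_t sketch_abs_skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	return cs_nsec > wd_nsec ? cs_nsec - wd_nsec : wd_nsec - cs_nsec;
}

/*
 * Skew relative to the watchdog interval, in whole ppm (rounded down).
 * The caller must pass a non-zero wd_nsec.
 */
static int64_t sketch_skew_ppm(int64_t cs_nsec, int64_t wd_nsec)
{
	return sketch_abs_skew_ns(cs_nsec, wd_nsec) * 1000000LL / wd_nsec;
}
```

With the values from the log, sketch_abs_skew_ns() gives 667612 ns, which is
above the fixed 400000 ns threshold, while sketch_skew_ppm() gives 45 ppm
(about 46 ppm before integer truncation), well below the 500 ppm that NTP
tolerates according to the commit message.
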
> 
> The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
> threshold") was to tighten the threshold for evaluating skew and set the
> lower bound for the uncertainty_margin of clocksources to twice
> WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource
> watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
> 125 microseconds to fit the limit of NTP, which is able to use a
> clocksource that suffers from up to 500 microseconds of skew per second.
> Both the TSC and the HPET use the default uncertainty_margin. When the
> readout interval gets stretched, the default uncertainty_margin is no
> longer a suitable lower bound for evaluating skew - it imposes a limit
> that is far stricter than the skew with which NTP can deal.
> 
> The root causes of the skew being directly proportional to the length of
> the readout interval are:
> 
>   * the inaccuracy of the shift/mult pairs of clocksources and the watchdog
>   * the conversion to nanoseconds is imprecise for large readout intervals
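
The two causes listed above can be illustrated with a userspace sketch of a
mult/shift conversion. This is a simplified assumption of how a mult value is
derived, not the kernel's actual helper; the sketch_* names and
SKETCH_NSEC_PER_SEC are hypothetical. Only the (cycles * mult) >> shift shape
mirrors the kernel's cycles-to-nanoseconds conversion.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Pick mult so that (cycles * mult) >> shift approximates nanoseconds.
 * The result is rounded to the nearest integer; that rounding error is
 * applied once per cycle and therefore accumulates with the interval.
 * Only valid for small shift values (the left shift must not overflow).
 */
static uint64_t sketch_calc_mult(uint64_t freq_hz, unsigned int shift)
{
	return ((SKETCH_NSEC_PER_SEC << shift) + freq_hz / 2) / freq_hz;
}

/* Convert a cycle count to nanoseconds using a mult/shift pair. */
static uint64_t sketch_cyc2ns(uint64_t cycles, uint64_t mult,
			      unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

For a made-up 3 Hz counter with shift 0, mult rounds to 333333333, so 3 cycles
convert to 999999999 ns (1 ns short of one second) and 30 cycles convert to
9999999990 ns (10 ns short of ten seconds). The conversion error grows
linearly with the length of the interval, which is the effect described above.
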
> 
> Prevent this by skipping the current watchdog check if the readout
> interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
> interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
> (of the TSC and HPET) corresponds to a limit on clocksource skew of 250
> ppm (microseconds of skew per second).  To keep the limit imposed by NTP
> (500 microseconds of skew per second) for all possible readout intervals,
> the margins would have to be scaled so that the threshold value is
> proportional to the length of the actual readout interval.
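
The skip described above could look roughly like the following userspace
sketch. It is not the code from the patch: the SKETCH_* names, the verdict
enum and the HZ value of 250 are assumptions chosen for illustration.

```c
#include <stdint.h>

#define SKETCH_HZ			250	/* assumed tick rate */
#define SKETCH_NSEC_PER_SEC		1000000000LL
/* 0.5 second expressed in ticks, as in "HZ >> 1" above. */
#define SKETCH_WATCHDOG_INTERVAL	(SKETCH_HZ >> 1)
/* 2 * WATCHDOG_INTERVAL converted to nanoseconds (1 second here). */
#define SKETCH_WATCHDOG_INTERVAL_MAX_NS \
	((2 * SKETCH_WATCHDOG_INTERVAL) * (SKETCH_NSEC_PER_SEC / SKETCH_HZ))

enum sketch_verdict {
	SKETCH_SKIPPED,		/* interval too long, no judgement made */
	SKETCH_STABLE,		/* skew within the threshold */
	SKETCH_UNSTABLE,	/* skew above the threshold */
};

/*
 * Evaluate one watchdog readout. md_nsec is the fixed skew threshold.
 * If either measured interval exceeds the maximum, the check is skipped
 * instead of comparing the skew against a threshold that was sized for
 * a much shorter interval.
 */
static enum sketch_verdict sketch_watchdog_check(int64_t cs_nsec,
						 int64_t wd_nsec,
						 int64_t md_nsec)
{
	int64_t interval = cs_nsec > wd_nsec ? cs_nsec : wd_nsec;
	int64_t skew = cs_nsec > wd_nsec ? cs_nsec - wd_nsec
					 : wd_nsec - cs_nsec;

	if (interval > SKETCH_WATCHDOG_INTERVAL_MAX_NS)
		return SKETCH_SKIPPED;

	return skew > md_nsec ? SKETCH_UNSTABLE : SKETCH_STABLE;
}
```

With the reported values (cs_nsec 14524115132, wd_nsec 14523447520) and a
400000 ns threshold, this sketch returns SKETCH_SKIPPED rather than marking
the clocksource unstable, because the 14.5 second interval is far above the
1 second maximum.
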
> 
> As for why the readout interval may get stretched: Since the watchdog is
> executed in softirq context the expiration of the watchdog timer can get
> severely delayed on account of a ksoftirqd thread not getting to run in a
> timely manner. Surely, a system with such belated softirq execution is not
> working well, and the scheduling issue should be looked into, but the
> clocksource watchdog should be able to deal with it accordingly.
> 
> Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
> Suggested-by: Feng Tang <feng.tang@xxxxxxxxx>
> Signed-off-by: Jiri Wiesner <jwiesner@xxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Reviewed-by: Feng Tang <feng.tang@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Link: https://lore.kernel.org/r/20240122172350.GA740@incl
> ---
> 
> Backport to 6.1, 5.15, 5.10 because tglx has too much spare time

Hey, I'll take it, thanks!  Now queued up.

greg k-h



