Re: [patch 5/8] ntp: Make the RTC synchronization more reliable

Jason Gunthorpe <jgg@xxxxxxxx> · Mon, 7 Dec 2020 16:59:03 -0400



On Sun, Dec 06, 2020 at 10:46:18PM +0100, Thomas Gleixner wrote:
> Miroslav reported that the periodic RTC synchronization in the NTP code
> fails more often than not to hit the specified update window.
> 
> The reason is that the code uses delayed_work to schedule the update which
> needs to be in thread context as the underlying RTC might be connected via
> a slow bus, e.g. I2C. In the update function it verifies whether the
> current time is correct vs. the requirements of the underlying RTC.
> 
> But delayed_work is using the timer wheel for scheduling which is
> inaccurate by design. Depending on the distance to the expiry the wheel
> gets less granular to allow batching and to avoid the cascading of the
> original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading
> wheel") and the code for further details.
> 
> The code already deals with this by splitting the 660 seconds period into a
> long 659 seconds timer and then retrying with a smaller delta.
> 
> But looking at the actual granularities of the timer wheel (which depend on
> the HZ configuration) the 659 seconds timer ends up in an outer wheel level
> and is affected by a worst case granularity of:
> 
> HZ          Granularity
> 1000        32s
>  250        16s
>  100        40s
> 
> So the initial timer can be already off by max 12.5% which is not a big
> issue as the period of the sync is defined as ~11 minutes.
> 
> The fine grained second attempt schedules to the desired update point with
> a timer expiring less than a second from now. Depending on the actual delta
> and the HZ setting even the second attempt can end up in outer wheel levels
> which have a large enough granularity to make the correctness check fail.
> 
> As this is a fundamental property of the timer wheel there is no way to
> make this more accurate short of iterating in one jiffies steps towards the
> update point.
> 
> Switch it to an hrtimer instead which schedules the actual update work. The
> hrtimer will expire precisely (max 1 jiffie delay when high resolution
> timers are not available). The actual scheduling delay of the work is the
> same as before.
> 
> The update is triggered from do_adjtimex() which is a bit racy but not much
> more racy than it was before:
> 
>      if (ntp_synced())
>      	queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> 
> which is racy when the work is currently executed and has not managed to
> reschedule itself.
> 
> This becomes now:
> 
>      if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer))
>      	queue_work(system_power_efficient_wq, &sync_work, 0);
> 
> which is racy when the hrtimer has expired and the work is currently
> executed and has not yet managed to rearm the hrtimer.
> 
> Not a big problem as it just schedules work for nothing.
> 
> The new implementation has a safe guard in place to catch the case where
> the hrtimer is queued on entry to the work function and avoids an extra
> update attempt of the RTC that way.
> 
> Reported-by: Miroslav Lichvar <mlichvar@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: John Stultz <john.stultz@xxxxxxxxxx>
> Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> ---
>  include/linux/timex.h      |    1 
>  kernel/time/ntp.c          |   90 ++++++++++++++++++++++++---------------------
>  kernel/time/ntp_internal.h |    7 +++
>  3 files changed, 55 insertions(+), 43 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Jason