TL;DR: the "delta_delta" hack[1 and 2] in kernel/time/timekeeping.c and drivers/rtc/class.c undermines the NTP system. It's not appropriate to use if sub-second precision is available. I've attached a patch to resolve this... please let me know the ways you hate it. :-) Hello Kernel Timekeeping Maintainers, We have been developing a device that has very a very aggressive power policy, doing suspend/resume cycles a few times a minute ("echo mem > /sys/power/state"). In doing so, we found that the system time experiences a lot of jitter (compared to, say, an NTP server). It was not uncommon for us to see time corrections of 1s to 4s on a regular basis. This didn't happen when the device stayed awake, only when it was allowed to do suspend/resume. We found that the problem is an interaction between the NTP code and what I call the "delta_delta hack." (see [1] and [2]) This code allocates a static variable in a function that contains an offset from the system time to the persistent/rtc clock. It uses that time to fudge the suspend timestamp so that on resume the sleep time will be compensated. It's kind of a statistical hack that assumes things will average out. It seems to have two main assumptions: 1. The persistent/rtc clock has only single-second precision 2. The system does not frequently suspend/resume. 3. If delta_delta is less than 2 seconds, these assumptions are "true" Because the delta_delta hack is trying to maintain an offset from the system time to the persistent/rtc clock, any minor NTP corrections that have occurred since the last suspend will be discarded. However, the NTP subsystem isn't notified that this is happening -- and so it causes some level of instability in its PLL logic. This problem affects any device that does "frequent" suspend/resume cycles. I.e. any battery-powered "linux" device (like Android phones). Many ARM systems provide a "persistent clock." Most of them are backed by a 32kHz clock that gives good precision and makes the delta_delta hack unnecessary. However, devices that only have single-second precision for the persistent clock and/or are forced to use the RTC (whose API only allows for single-second precision) -- they still need this hack. I've attached a patch that we developed in-house. I have a feeling you won't like it... since it pushes the responsibility on whoever configures the kernel. It also ignores the RTC problem (which will still affect a lot of battery-powered devices). Please let me know what you think -- and what the right approach for solving this would be. Thanks, Gabe [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/time/timekeeping.c?h=v4.13.4#n1717 [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/drivers/rtc/class.c?h=v4.13.4#n76
From c03ceced9a210b48f2552e7dafa9099ef2449370 Mon Sep 17 00:00:00 2001 From: "Gabriel M. Beddingfield" <beddingfield@xxxxxxxxxxxx> Date: Wed, 4 Oct 2017 08:38:57 -0700 Subject: [PATCH] time: add CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION to disable suspend/resume hack Signed-off-by: Gabriel M. Beddingfield <beddingfield@xxxxxxxxxxxx> Signed-off-by: Guy <guy@xxxxxxxxxxxx> --- kernel/time/Kconfig | 16 ++++++++++++++++ kernel/time/timekeeping.c | 4 ++++ 2 files changed, 20 insertions(+) diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index ac09bc29eb08..32d54086c96c 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -143,5 +143,21 @@ config HIGH_RES_TIMERS hardware is not capable then this option only increases the size of the kernel image. +config PERSISTENT_CLOCK_IS_LOW_PRECISION + bool "The persistent clock has only single-second precision" + default y + help + When enabled, then on suspend/resume the timekeeping code will + try to maintain a constant offset between the system time and + the persistent clock as a means of compensating for the coarse + (+/- 1 second) sleep time calculation. However, this will discard + any "small" NTP corrections that have happened since the last + resume. However, if the system's persistent clock has better + precision (e.g. because it's backed by a 32kHz clock), this is + not necessary and introduces unneeded time jitter. + + If your persistent clock has only single-second precision, say Y. + If your persistent clock has sub-second precision, say N. + endmenu endif diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 2cafb49aa65e..b2c7b443ef37 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1694,8 +1694,10 @@ int timekeeping_suspend(void) { struct timekeeper *tk = &tk_core.timekeeper; unsigned long flags; +#ifdef CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION struct timespec64 delta, delta_delta; static struct timespec64 old_delta; +#endif read_persistent_clock64(&timekeeping_suspend_time); @@ -1712,6 +1714,7 @@ int timekeeping_suspend(void) timekeeping_forward_now(tk); timekeeping_suspended = 1; +#ifdef CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION if (persistent_clock_exists) { /* * To avoid drift caused by repeated suspend/resumes, @@ -1733,6 +1736,7 @@ int timekeeping_suspend(void) timespec64_add(timekeeping_suspend_time, delta_delta); } } +#endif timekeeping_update(tk, TK_MIRROR); halt_fast_timekeeper(tk); -- 2.14.2.920.gcf0c67979c-goog