Extreme time jitter with suspend/resume cycles

Gabriel Beddingfield <gabe@xxxxxxxxxxxx> · Wed, 4 Oct 2017 09:11:13 -0700

TL;DR: the "delta_delta" hack[1 and 2] in kernel/time/timekeeping.c
and drivers/rtc/class.c undermines the NTP system. It's not
appropriate to use if sub-second precision is available. I've attached
a patch to resolve this... please let me know the ways you hate it.
:-)

Hello Kernel Timekeeping Maintainers,

We have been developing a device that has very a very aggressive power
policy, doing suspend/resume cycles a few times a minute ("echo mem >
/sys/power/state"). In doing so, we found that the system time
experiences a lot of jitter (compared to, say, an NTP server). It was
not uncommon for us to see time corrections of 1s to 4s on a regular
basis. This didn't happen when the device stayed awake, only when it
was allowed to do suspend/resume.

We found that the problem is an interaction between the NTP code and
what I call the "delta_delta hack." (see [1] and [2]) This code
allocates a static variable in a function that contains an offset from
the system time to the persistent/rtc clock. It uses that time to
fudge the suspend timestamp so that on resume the sleep time will be
compensated. It's kind of a statistical hack that assumes things will
average out. It seems to have two main assumptions:

  1. The persistent/rtc clock has only single-second precision
  2. The system does not frequently suspend/resume.
  3. If delta_delta is less than 2 seconds, these assumptions are "true"

Because the delta_delta hack is trying to maintain an offset from the
system time to the persistent/rtc clock, any minor NTP corrections
that have occurred since the last suspend will be discarded. However,
the NTP subsystem isn't notified that this is happening -- and so it
causes some level of instability in its PLL logic.

This problem affects any device that does "frequent" suspend/resume
cycles. I.e. any battery-powered "linux" device (like Android phones).

Many ARM systems provide a "persistent clock." Most of them are backed
by a 32kHz clock that gives good precision and makes the delta_delta
hack unnecessary. However, devices that only have single-second
precision for the persistent clock and/or are forced to use the RTC
(whose API only allows for single-second precision) -- they still need
this hack.

I've attached a patch that we developed in-house. I have a feeling you
won't like it... since it pushes the responsibility on whoever
configures the kernel. It also ignores the RTC problem (which will
still affect a lot of battery-powered devices).

Please let me know what you think -- and what the right approach for
solving this would be.

Thanks,
Gabe

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/time/timekeeping.c?h=v4.13.4#n1717
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/drivers/rtc/class.c?h=v4.13.4#n76
From c03ceced9a210b48f2552e7dafa9099ef2449370 Mon Sep 17 00:00:00 2001
From: "Gabriel M. Beddingfield" <beddingfield@xxxxxxxxxxxx>
Date: Wed, 4 Oct 2017 08:38:57 -0700
Subject: [PATCH] time: add CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION to disable
 suspend/resume hack

Signed-off-by: Gabriel M. Beddingfield <beddingfield@xxxxxxxxxxxx>
Signed-off-by: Guy <guy@xxxxxxxxxxxx>
---
 kernel/time/Kconfig       | 16 ++++++++++++++++
 kernel/time/timekeeping.c |  4 ++++
 2 files changed, 20 insertions(+)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index ac09bc29eb08..32d54086c96c 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -143,5 +143,21 @@ config HIGH_RES_TIMERS
 	  hardware is not capable then this option only increases
 	  the size of the kernel image.
 
+config PERSISTENT_CLOCK_IS_LOW_PRECISION
+	bool "The persistent clock has only single-second precision"
+	default y
+	help
+	  When enabled, then on suspend/resume the timekeeping code will
+	  try to maintain a constant offset between the system time and
+	  the persistent clock as a means of compensating for the coarse
+	  (+/- 1 second) sleep time calculation. However, this will discard
+	  any "small" NTP corrections that have happened since the last
+	  resume. However, if the system's persistent clock has better
+	  precision (e.g. because it's backed by a 32kHz clock), this is
+	  not necessary and introduces unneeded time jitter.
+
+	  If your persistent clock has only single-second precision, say Y.
+	  If your persistent clock has sub-second precision, say N.
+
 endmenu
 endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2cafb49aa65e..b2c7b443ef37 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1694,8 +1694,10 @@ int timekeeping_suspend(void)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned long flags;
+#ifdef CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION
 	struct timespec64		delta, delta_delta;
 	static struct timespec64	old_delta;
+#endif
 
 	read_persistent_clock64(&timekeeping_suspend_time);
 
@@ -1712,6 +1714,7 @@ int timekeeping_suspend(void)
 	timekeeping_forward_now(tk);
 	timekeeping_suspended = 1;
 
+#ifdef CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION
 	if (persistent_clock_exists) {
 		/*
 		 * To avoid drift caused by repeated suspend/resumes,
@@ -1733,6 +1736,7 @@ int timekeeping_suspend(void)
 				timespec64_add(timekeeping_suspend_time, delta_delta);
 		}
 	}
+#endif
 
 	timekeeping_update(tk, TK_MIRROR);
 	halt_fast_timekeeper(tk);
-- 
2.14.2.920.gcf0c67979c-goog