On 01/10/21 01:02, Thomas Gleixner wrote:
> Now the proposed change is creating exactly the same problem:
>
>> +	if (data.flags & KVM_CLOCK_REALTIME) {
>> +		u64 now_real_ns = ktime_get_real_ns();
>> +
>> +		/*
>> +		 * Avoid stepping the kvmclock backwards.
>> +		 */
>> +		if (now_real_ns > data.realtime)
>> +			data.clock += now_real_ns - data.realtime;
>> +	}

Indeed, though it's opt-in (you can always not pass KVM_CLOCK_REALTIME
and then the kernel will not muck with the value you gave it).
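
To spell out what the opt-in path does, here is a minimal userspace
sketch of the same arithmetic. struct clock_snapshot and
advance_clock() are hypothetical names used only for illustration, not
kernel API; only the comparison and the addition mirror the quoted hunk.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two kvm_clock_data fields used by the hunk. */
struct clock_snapshot {
	uint64_t clock;    /* kvmclock value in ns when the snapshot was taken */
	uint64_t realtime; /* CLOCK_REALTIME in ns when the snapshot was taken */
};

/*
 * Return the saved kvmclock advanced by the wall-clock time that has
 * elapsed since the snapshot. If the host's realtime clock is not ahead
 * of the snapshot, the value is returned unchanged, so the kvmclock is
 * never stepped backwards.
 */
static uint64_t advance_clock(const struct clock_snapshot *snap,
			      uint64_t now_real_ns)
{
	uint64_t clock = snap->clock;

	if (now_real_ns > snap->realtime)
		clock += now_real_ns - snap->realtime;

	return clock;
}
```

Without the flag, the equivalent would simply be to return snap->clock
as given.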

> virt came along and created a hard to solve circular dependency
> problem:
>
>  - If CLOCK_MONOTONIC stops for too long then NTP/PTP gets out of
>    sync, but everything else is happy.
>  - If CLOCK_MONOTONIC jumps too far forward, then all hell breaks
>    loose, but NTP/PTP is happy.

Yes, I agree that this sums it up.
For example, QEMU (meaning: Marcelo :)) has gone for the former and is
"hoping" that NTP/PTP sorts it out sooner or later. The clock value, in
nanoseconds, is sent to the destination and restored there.
Google's userspace instead went for the latter. They have always
started running on the destination before finishing the memory copy[1],
which makes it much easier to bound the CLOCK_MONOTONIC jump.
I do very much like the cooperative S2IDLE, or even S3, way of handling
the brownout during live migration. However, if your stopping time is
bounded, these patches are nice because, on current processors that have
TSC scaling, they make it possible to keep the illusion of the TSC
running. Of course that's a big "if"; however, you can always bound the
stopping time by aborting the restart on the destination machine once
you get close enough to the limit.
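
As a rough sketch of that abort policy (should_abort_restore(), the
limit and the safety margin are all assumptions made for illustration,
not something these patches or any existing userspace provide):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy check for the destination side of a migration.
 * stopped_ns is how long the guest has been paused so far, limit_ns is
 * the largest pause we are willing to expose to the guest, and
 * margin_ns is the slack reserved for actually resuming it.
 * Returns true when the restart should be aborted.
 */
static bool should_abort_restore(uint64_t stopped_ns, uint64_t limit_ns,
				 uint64_t margin_ns)
{
	/* A margin that consumes the whole budget can never be met. */
	if (margin_ns >= limit_ns)
		return true;

	return stopped_ns >= limit_ns - margin_ns;
}
```

The margin is there because the check itself happens before the guest
is running again, so some of the budget has to be kept in reserve.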
Paolo
[1] see https://dl.acm.org/doi/pdf/10.1145/3296975.3186415, figure 3