On Tue, 2022-03-29 at 09:02 -0700, Oliver Upton wrote:
>
> There's a need to sound the alarm for NTP regardless of whether
> TOLERABLE_THRESHOLD is exceeded. David pointed out that the host
> advancing the guest clocks (delta injection or TSC advancement) could
> inject some error. Also, hardware has likely changed and the new parts
> will have their own errors as well.

I don't admit to pointing that out in that form because I don't accept
the use of the term "advance".

Clocks advance *themselves*. That's what clocks do.

When we perform a live update or live migration we might *adjust* those
clocks, calibrate, synchronise or restore them. But I'll eat my
keyboard before using the term "advance" for that. Even if our
adjustment is in a forward direction.

Let's consider the case of a live update, where we stop scheduling the
guest for a moment, kexec into the new kernel, then resume scheduling
the guest.

I assert strongly that from the guest point of view this is *no*
different to any other brief period of not being scheduled.

Yes, in practice we have a whole new kernel, a whole new KVM and set of
kvm_vcpus, and we've *restored* the state. And we have restored the
TSCs/clocks in those new kvm objects to precisely match what they were
before. Note: *match* not advance.

Before the kexec, there were a bunch of relationships between clocks,
mostly based on the host TSC Tₕ (assuming the case where that's stable
and reliable):

 • The kernel's idea of wallclock time was based on Tₕ, plus some
   offset and divided by some frequency. NTP tweaks those values over
   time but at any given instant there is a current value for them
   which is used to derive the wallclock time.

 • The kernel's idea of the guest kvmclock epoch (nanoseconds since the
   KVM started) was based on Tₕ and some other offset and hopefully the
   same frequency.

 • The TSC of each vCPU was based on Tₕ, some offset and a TSC scaling
   factor.

After a live update, the host TSC Tₕ is just the same as it always was.
Not the same *value* of course; that was never the case from one tick
to the next anyway. It's the same, in that it continues to advance
*itself* at a consistent frequency as real time progresses, which is
what clocks do.

In the new kernel we just want all those other derivative clocks to
*also* be the same as before. That is, the offset and multipliers are
the *same* value. We're not "advancing" those clocks. We're
*preserving* them.

For live migration it's slightly harder because we don't have a
consistent host TSC to use as the basis. The best we can do is
NTP-synchronised wallclock time between the two hosts. And thus I think
we want *these* constants to be preserved across the migration:

   The KVM's kvmclock was <K> at a given wallclock time <W>
   The TSC of each vCPU#n was <Tₙ> at a given value of kvmclock <Kₙ>

In *this* case we are running on different hardware and the reliance on
the NTP wallclock time as the basis for preserving the guest clocks may
have introduced an error, as well as the fact that the hardware has
changed. So in this case we should indeed inform the guest that it
should consider itself out of NTP sync and start over, in *addition* to
making a best effort to preserve those clocks.

But there is no scope for the word "advance" to be used anywhere there
either.
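
The relationships above can be written down as a minimal sketch. It is
an illustration only, not KVM's actual interface: every name here
(VcpuTsc, live_update_restore, kvmclock_at, guest_tsc_at,
migrate_restore) is hypothetical, the guest TSC is assumed to be
"host TSC scaled by guest_hz/host_hz, plus an offset", and integer
arithmetic stands in for the kernel's fixed-point scaling. It leaves
out the separate step of telling the guest to consider itself out of
NTP sync after a migration.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class VcpuTsc:
    """Guest TSC derived from the host TSC (assumed model, see lead-in)."""
    offset: int
    guest_hz: int
    host_hz: int

    def read(self, host_tsc: int) -> int:
        # guest = host * (guest_hz / host_hz) + offset
        return host_tsc * self.guest_hz // self.host_hz + self.offset


def live_update_restore(saved: VcpuTsc) -> VcpuTsc:
    # Live update: the host TSC keeps counting across the kexec, so the
    # new kernel reuses the very same offset and scaling. The derived
    # clock is preserved; nothing is added to it.
    return VcpuTsc(saved.offset, saved.guest_hz, saved.host_hz)


def kvmclock_at(wallclock_ns: int, k: int, w: int) -> int:
    # Preserved constant: the kvmclock was k at wallclock time w.
    return k + (wallclock_ns - w)


def guest_tsc_at(kvmclock_ns: int, t_n: int, k_n: int, guest_hz: int) -> int:
    # Preserved constant: vCPU n's TSC was t_n when the kvmclock was k_n.
    return t_n + (kvmclock_ns - k_n) * guest_hz // NSEC_PER_SEC


def migrate_restore(wallclock_ns: int, host_tsc: int, host_hz: int,
                    k: int, w: int, t_n: int, k_n: int,
                    guest_hz: int) -> VcpuTsc:
    # Live migration: there is no shared host TSC, so go via wallclock.
    # Choose the destination offset so that the guest TSC read at this
    # instant equals what the preserved constants say it should be.
    kvmclock_now = kvmclock_at(wallclock_ns, k, w)
    want = guest_tsc_at(kvmclock_now, t_n, k_n, guest_hz)
    offset = want - host_tsc * guest_hz // host_hz
    return VcpuTsc(offset, guest_hz, host_hz)
```

In the live update case the restored object compares equal to the saved
one, which is the whole point: same offset, same multiplier. In the
migration case the offset is new, but only because the destination's
host TSC is a different counter; the <K>/<W> and <Tₙ>/<Kₙ> pairs are
what carry over, and any error in the two hosts' wallclock agreement
shows up directly in the result.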