Clean up the KVM clock mess somewhat so that it is either based on the guest TSC ("master clock" mode), or on the host CLOCK_MONOTONIC_RAW in cases where the TSC isn't usable.

Eliminate the third variant where it was based directly on the *host* TSC, due to bugs in e.g. __get_kvmclock().

Kill off the last vestiges of the KVM clock being based on CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_RAW and thus being subject to NTP skew.

Fix up migration support to allow the KVM clock to be saved/restored as an arithmetic function of the guest TSC, since that's what it actually is in the *common* case so it can be migrated precisely. Or at least to within ±1 ns which is good enough, as discussed in
https://lore.kernel.org/kvm/c8dca08bf848e663f192de6705bf04aa3966e856.camel@xxxxxxxxxxxxx

In v2 of this series, TSC synchronization is improved and simplified a bit too, and we allow masterclock mode to be used even when the guest TSCs are out of sync, as long as they're running at the same *rate*. The different *offset* shouldn't matter.

And the kvm_get_time_scale() function annoyed me by being entirely opaque, so I studied it until my brain hurt and then added some comments.

In v2 I also dropped the commits which were removing the periodic clock syncs. In v3 I put them back again but *only* for the non-masterclock mode, along with cleaning up some other gratuitous clock jumps while in masterclock mode. And Jack's patch to move the pvclock structure to uapi.

I also fixed the bug pointed out by Chenyi Qiang, that I was failing to set vcpu->arch.this_tsc_{nsec,write} after removing the cur_tsc_* fields.

I also included patches to fix advertised steal time going backwards, and to make the guest more resilient to it. Those may end up being split out and submitted under separate cover (with selftests).

Still needs more comprehensive selftests.
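For readers less familiar with the pvclock ABI, here is a minimal userspace sketch of what "an arithmetic function of the guest TSC" means in master clock mode. It is not code from this series. The struct is a cut-down, illustrative copy of the relevant pvclock_vcpu_time_info fields, and the example_* names are made up for this sketch.

```c
#include <stdint.h>

/*
 * Illustrative subset of the pvclock fields that define the KVM clock.
 * Hypothetical local type, not the real UAPI structure layout.
 */
struct example_pvclock {
	uint64_t tsc_timestamp;     /* guest TSC value at the reference point */
	uint64_t system_time;       /* KVM clock in ns at the reference point */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns multiplier */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Shift the delta, then compute (delta * mul) >> 32 without 128-bit math. */
static uint64_t example_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split into 32-bit halves so no intermediate product overflows. */
	return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* KVM clock in nanoseconds for a given guest TSC reading. */
static uint64_t example_kvmclock_ns(const struct example_pvclock *pv,
				    uint64_t guest_tsc)
{
	return pv->system_time +
	       example_scale_delta(guest_tsc - pv->tsc_timestamp,
				   pv->tsc_to_system_mul, pv->tsc_shift);
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 0 (a 2 GHz TSC), a delta of 2000 ticks adds 1000 ns to system_time. Since the result depends only on these four fields and the guest TSC, saving and restoring the relationship itself is enough to reproduce the clock on the destination.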

David Woodhouse (18):
  KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
  KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
  KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
  KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
  KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
  KVM: x86: Fix KVM clock precision in __get_kvmclock()
  KVM: x86: Fix software TSC upscaling in kvm_update_guest_time()
  KVM: x86: Simplify and comment kvm_get_time_scale()
  KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
  KVM: x86: Improve synchronization in kvm_synchronize_tsc()
  KVM: x86: Kill cur_tsc_{nsec,offset,write} fields
  KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
  KVM: x86: Factor out kvm_use_master_clock()
  KVM: x86: Avoid global clock update on setting KVM clock MSR
  KVM: x86: Avoid gratuitous global clock reload in kvm_arch_vcpu_load()
  KVM: x86: Avoid periodic KVM clock updates in master clock mode
  KVM: x86/xen: Prevent runstate times from becoming negative
  sched/cputime: Cope with steal time going backwards or negative

Jack Allister (3):
  KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
  UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
  KVM: selftests: Add KVM/PV clock selftest to prove timer correction

 Documentation/virt/kvm/api.rst                    |  37 ++
 Documentation/virt/kvm/devices/vcpu.rst           | 115 +++-
 arch/x86/include/asm/kvm_host.h                   |  15 +-
 arch/x86/include/uapi/asm/kvm.h                   |   6 +
 arch/x86/include/{ => uapi}/asm/pvclock-abi.h     |  24 +-
 arch/x86/kvm/svm/svm.c                            |   3 +-
 arch/x86/kvm/vmx/vmx.c                            |   2 +-
 arch/x86/kvm/x86.c                                | 716 +++++++++++++++-------
 arch/x86/kvm/xen.c                                |  22 +-
 include/uapi/linux/kvm.h                          |   3 +
 kernel/sched/cputime.c                            |  20 +-
 tools/testing/selftests/kvm/Makefile              |   1 +
 tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++
 13 files changed, 884 insertions(+), 272 deletions(-)