On 22/05/2024 01:17, David Woodhouse wrote:
From: David Woodhouse <dwmw@xxxxxxxxxxxx>

When in 'master clock mode' (i.e. when host and guest TSCs are behaving sanely and in sync), the KVM clock is defined in terms of the guest TSC. When TSC scaling is used, calculating the KVM clock directly from *host* TSC cycles leads to a systematic drift from the values calculated by the guest from its TSC.

Commit 451a707813ae ("KVM: x86/xen: improve accuracy of Xen timers") had a simple workaround for the specific case of Xen timers, as it had an actual vCPU to hand and could use its scaling information. That commit noted that it was broken for the general case of get_kvmclock_ns(), and said "I'll come back to that".

Since __get_kvmclock() is invoked without a specific CPU, it needs to be able to find or generate the scaling values required to perform the correct calculation.

Thankfully, TSC scaling can only happen with X86_FEATURE_CONSTANT_TSC, so it isn't as complex as it might have been.

In __kvm_synchronize_tsc(), note the current vCPU's scaling ratio in kvm->arch.last_tsc_scaling_ratio. That is only protected by the tsc_write_lock, so in pvclock_update_vm_gtod_copy(), copy it into a separate kvm->arch.master_tsc_scaling_ratio so that it can be accessed using the kvm->arch.pvclock_sc seqcount lock. Also generate the mul and shift factors to convert to nanoseconds for the corresponding KVM clock, just as kvm_guest_time_update() would.

In __get_kvmclock(), which runs within a seqcount retry loop, use those values to convert host to guest TSC and then to nanoseconds. Only fall back to using get_kvmclock_base_ns() when not in master clock mode.

There was previously a code path in __get_kvmclock() which looked like it could set KVM_CLOCK_TSC_STABLE without KVM_CLOCK_REALTIME, perhaps even on 32-bit hosts. In practice that could never happen, as the ka->use_master_clock flag couldn't be set on 32-bit, and even on 64-bit hosts it would never be set when the system clock isn't TSC-based. So that code path is now removed.
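The conversion chain described above (host TSC, then guest TSC via the scaling ratio, then nanoseconds via a mul/shift pair) can be illustrated with a minimal userspace sketch. This is not KVM's code. It assumes a scaling ratio with 48 fractional bits (as used by Intel VMX; AMD SVM uses 32), and it relies on GCC/Clang's `unsigned __int128`. The names `struct master_clock_snapshot`, `scale_tsc()`, `time_scale()`, `scale_delta()` and `kvmclock_ns()` are hypothetical. `time_scale()` follows the same idea as KVM's `kvm_get_time_scale()`, but it is simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of master-clock parameters; field names are illustrative, not KVM's. */
struct master_clock_snapshot {
	uint64_t host_tsc_ref;     /* host TSC at the reference point               */
	uint64_t ref_kvmclock_ns;  /* KVM clock value (ns) at the reference point   */
	uint64_t scaling_ratio;    /* guest/host TSC ratio in fixed point           */
	unsigned int frac_bits;    /* fractional bits in scaling_ratio (assumed 48) */
	uint32_t mul;              /* guest-cycles-to-ns multiplier, scaled by 2^32 */
	int8_t shift;              /* pre-shift applied to the guest cycle delta    */
};

/* Host TSC -> guest TSC: (tsc * ratio) >> frac_bits, computed in 128 bits to avoid overflow. */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio, unsigned int frac_bits)
{
	return (uint64_t)(((unsigned __int128)tsc * ratio) >> frac_bits);
}

/*
 * Derive mul/shift so that ns ~= ((delta << shift) * mul) >> 32
 * (a right shift by -shift when shift is negative).
 */
static void time_scale(uint64_t scaled_hz, uint64_t base_hz, int8_t *shift, uint32_t *mul)
{
	int s = 0;

	assert(base_hz != 0); /* a zero frequency would make the second loop spin forever */
	while (base_hz > 2 * scaled_hz) {
		base_hz >>= 1;
		s--;
	}
	while (base_hz <= scaled_hz) {
		base_hz <<= 1;
		s++;
	}
	/* Now scaled_hz < base_hz <= 2 * scaled_hz, so scaled/base fits in 0.32 fixed point. */
	*shift = (int8_t)s;
	*mul = (uint32_t)(((unsigned __int128)scaled_hz << 32) / base_hz);
}

/* Apply a mul/shift pair to a cycle delta, in the style of pvclock. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/*
 * KVM clock in ns for a given host TSC reading. Both the reading and the reference
 * are converted to guest TSC first, so any constant guest TSC offset cancels out.
 */
static uint64_t kvmclock_ns(const struct master_clock_snapshot *s, uint64_t host_tsc)
{
	uint64_t guest_now = scale_tsc(host_tsc, s->scaling_ratio, s->frac_bits);
	uint64_t guest_ref = scale_tsc(s->host_tsc_ref, s->scaling_ratio, s->frac_bits);

	return s->ref_kvmclock_ns + scale_delta(guest_now - guest_ref, s->mul, s->shift);
}
```

Consider a 3 GHz host whose guest TSC is scaled to 1.5 GHz. That corresponds to `scaling_ratio = 1ULL << 47`, or 0.5 with 48 fractional bits, and mul/shift come from `time_scale(1000000000, 1500000000, ...)`. In that case, 3e9 host cycles past the reference advance the clock by 999999999 ns. That is one second minus one nanosecond of fixed-point truncation.

The mul/shift pair is derived from the *guest* TSC frequency, and it is applied to *guest* TSC deltas. As a result, the host computes the same value that a guest would compute from its own TSC. This is the property the patch is after.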
The kvm_get_wall_clock_epoch() function had the same problem; make it just call get_kvmclock() and subtract kvmclock from wallclock, with the same fallback as before.

Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
---
 arch/x86/include/asm/kvm_host.h |   4 +
 arch/x86/kvm/x86.c              | 151 ++++++++++++++++----------------
 2 files changed, 79 insertions(+), 76 deletions(-)
Reviewed-by: Paul Durrant <paul@xxxxxxx>