From: Like Xu <likexu@xxxxxxxxxxx>

Add kvm->arch.user_changed_tsc to avoid synchronizing user changes to
the TSC with the KVM-initiated change in kvm_arch_vcpu_postcreate() by
conditioning this mess on userspace having written the TSC at least
once already.

Here lies UAPI baggage: a user-initiated TSC write with a small delta
(1 second) of virtual cycle time against real time is interpreted as
an attempt to synchronize the CPU. In such a scenario, the vCPU's
tsc_offset is not configured as expected, resulting in significant
guest service response latency, which has been observed in our
production environment.

Reported-by: Yong He <alexyonghe@xxxxxxxxxxx>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423
Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Suggested-by: Oliver Upton <oliver.upton@xxxxxxxxx>
Original-by: Oliver Upton <oliver.upton@xxxxxxxxx>
Tested-by: Like Xu <likexu@xxxxxxxxxxx>
Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
V2 -> V3 Changelog:
- Use the kvm->arch.user_changed_tsc proposal; (Oliver & Paolo)

V2: https://lore.kernel.org/kvm/20230724073516.45394-1-likexu@xxxxxxxxxxx/

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 23 ++++++++++++++++-------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3bc146dfd38d..e8d423ef1474 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1303,6 +1303,7 @@ struct kvm_arch {
 	u64 cur_tsc_offset;
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
+	bool user_changed_tsc;
 
 	u32 default_tsc_khz;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 278dbd37dab2..eeaf4ad9174d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2713,7 +2713,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm_track_tsc_matching(vcpu);
 }
 
-static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
+static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data, bool user_initiated)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
@@ -2734,20 +2734,29 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			 * kvm_clock stable after CPU hotplug
 			 */
 			synchronizing = true;
-		} else {
+		} else if (kvm->arch.user_changed_tsc) {
 			u64 tsc_exp = kvm->arch.last_tsc_write +
 						nsec_to_cycles(vcpu, elapsed);
 			u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
 			/*
-			 * Special case: TSC write with a small delta (1 second)
-			 * of virtual cycle time against real time is
-			 * interpreted as an attempt to synchronize the CPU.
+			 * Here lies UAPI baggage: user-initiated TSC write with
+			 * a small delta (1 second) of virtual cycle time
+			 * against real time is interpreted as an attempt to
+			 * synchronize the CPU.
+			 *
+			 * Don't synchronize user changes to the TSC with the
+			 * KVM-initiated change in kvm_arch_vcpu_postcreate()
+			 * by conditioning this mess on userspace having
+			 * written the TSC at least once already.
 			 */
 			synchronizing = data < tsc_exp + tsc_hz &&
 					data + tsc_hz > tsc_exp;
 		}
 	}
 
+	if (user_initiated)
+		kvm->arch.user_changed_tsc = true;
+
 	/*
 	 * For a reliable TSC, we can match TSC offsets, and for an unstable
 	 * TSC, we add elapsed time in this computation. We could let the
@@ -3776,7 +3785,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_TSC:
 		if (msr_info->host_initiated) {
-			kvm_synchronize_tsc(vcpu, data);
+			kvm_synchronize_tsc(vcpu, data, true);
 		} else {
 			u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
 			adjust_tsc_offset_guest(vcpu, adj);
@@ -11950,7 +11959,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 	if (mutex_lock_killable(&vcpu->mutex))
 		return;
 	vcpu_load(vcpu);
-	kvm_synchronize_tsc(vcpu, 0);
+	kvm_synchronize_tsc(vcpu, 0, false);
 	vcpu_put(vcpu);
 
 	/* poll control enabled by default */

base-commit: 5a7591176c47cce363c1eed704241e5d1c42c5a6
--
2.41.0
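
Note for reviewers (not part of the patch): the user_initiated == true
path is reached when userspace writes MSR_IA32_TSC via KVM_SET_MSRS on
a vCPU fd, i.e. msr_info->host_initiated. A minimal userspace sketch of
such a write follows; vcpu_fd and write_guest_tsc() are hypothetical
names used only for illustration.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_TSC	0x00000010	/* from asm/msr-index.h */

/* Hypothetical helper: one user-initiated TSC write from the VMM. */
static int write_guest_tsc(int vcpu_fd, uint64_t tsc_value)
{
	/* kvm_msrs ends in a flexible entries[] array, so append one entry. */
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs;

	memset(&msrs, 0, sizeof(msrs));
	msrs.hdr.nmsrs = 1;
	msrs.entry.index = MSR_IA32_TSC;
	msrs.entry.data = tsc_value;

	/*
	 * This lands in kvm_set_msr_common() with host_initiated == true,
	 * so kvm_synchronize_tsc(vcpu, data, true) runs and, with this
	 * patch, sets kvm->arch.user_changed_tsc; only writes after the
	 * first one are eligible for the 1-second sync heuristic.
	 */
	return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
}

A guest-initiated WRMSR does not take this path (it is handled via
adjust_tsc_offset_guest()), and the KVM-internal call in
kvm_arch_vcpu_postcreate() now passes user_initiated == false, so
merely creating vCPUs no longer arms the heuristic.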