On Thu, 2022-04-21 at 00:56 +0000, Anton Romanov wrote:
> Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is
> constant, in which case the actual TSC frequency will never change and thus
> capturing TSC during initialization is unnecessary, KVM can simply use
> tsc_khz. This value is snapshotted from
> kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL)
>
> On CPUs with constant TSC, but not a hardware-specified TSC frequency,
> snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency
> can lead a VM to think its TSC frequency is not what it actually is if
> refining the TSC completes after KVM snapshots tsc_khz. The actual
> frequency never changes, only the kernel's calculation of what that
> frequency is changes.
>
> Ideally, KVM would not be able to race with TSC refinement, or would have
> a hook into tsc_refine_calibration_work() to get an alert when refinement
> is complete. Avoiding the race altogether isn't practical as refinement
> takes a relative eternity; it's deliberately put on a work queue outside of
> the normal boot sequence to avoid unnecessarily delaying boot.
>
> Adding a hook is doable, but somewhat gross due to KVM's ability to be
> built as a module. And if the TSC is constant, which is likely the case
> for every VMX/SVM-capable CPU produced in the last decade, the race can be
> hit if and only if userspace is able to create a VM before TSC refinement
> completes; refinement is slow, but not that slow.
>
> For now, punt on a proper fix, as not taking a snapshot can help some use
> cases and not taking a snapshot is arguably correct irrespective of the
> race with refinement.
>
> Signed-off-by: Anton Romanov <romanton@xxxxxxxxxx>
> ---
> v2:
>     fixed commit msg indentation
>     added WARN_ON_ONCE in kvm_hyperv_tsc_notifier
>     opened up condition in __get_kvmclock
>  arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 547ba00ef64f..1043cfd26576 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2907,6 +2907,19 @@ static void kvm_update_masterclock(struct kvm *kvm)
>  	kvm_end_pvclock_update(kvm);
>  }
>
> +/*
> + * If kvm is built into kernel it is possible that tsc_khz saved into
> + * per-cpu cpu_tsc_khz was yet unrefined value. If CPU provides CONSTANT_TSC it
> + * doesn't make sense to snapshot it anyway so just return tsc_khz
> + */
> +static unsigned long get_cpu_tsc_khz(void)
> +{
> +	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return tsc_khz;
> +	else
> +		return __this_cpu_read(cpu_tsc_khz);
> +}
> +
>  /* Called within read_seqcount_begin/retry for kvm->pvclock_sc. */
>  static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  {
> @@ -2917,7 +2930,8 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  	get_cpu();
>
>  	data->flags = 0;
> -	if (ka->use_master_clock && __this_cpu_read(cpu_tsc_khz)) {
> +	if (ka->use_master_clock &&
> +	    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
>  #ifdef CONFIG_X86_64
>  		struct timespec64 ts;
>
> @@ -2931,7 +2945,7 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  		data->flags |= KVM_CLOCK_TSC_STABLE;
>  		hv_clock.tsc_timestamp = ka->master_cycle_now;
>  		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
> -		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> +		kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
>  				   &hv_clock.tsc_shift,
>  				   &hv_clock.tsc_to_system_mul);
>  		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
> @@ -3049,7 +3063,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>
>  	/* Keep irq disabled to prevent changes to the clock */
>  	local_irq_save(flags);
> -	tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz);
> +	tgt_tsc_khz = get_cpu_tsc_khz();
>  	if (unlikely(tgt_tsc_khz == 0)) {
>  		local_irq_restore(flags);
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> @@ -8646,9 +8660,12 @@ static void tsc_khz_changed(void *data)
>  	struct cpufreq_freqs *freq = data;
>  	unsigned long khz = 0;
>
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return;
> +
>  	if (data)
>  		khz = freq->new;
> -	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +	else
>  		khz = cpufreq_quick_get(raw_smp_processor_id());
>  	if (!khz)
>  		khz = tsc_khz;
> @@ -8661,6 +8678,8 @@ static void kvm_hyperv_tsc_notifier(void)
>  	struct kvm *kvm;
>  	int cpu;
>
> +	WARN_ON_ONCE(boot_cpu_has(X86_FEATURE_TSC_RELIABLE));
> +
>  	mutex_lock(&kvm_lock);
>  	list_for_each_entry(kvm, &vm_list, vm_list)
>  		kvm_make_mclock_inprogress_request(kvm);

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

A question to AMD engineers: is it really true that AMD CPUs don't report
the TSC frequency anywhere (CPUID/MSR)? It really sucks that we still have
to measure it.

Best regards,
	Maxim Levitsky
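
P.S. For anyone following the thread, here is a rough userspace sketch of the stale-snapshot race the commit message describes. It is only a model under stated assumptions, not kernel code: struct model_host and the model_* helpers are hypothetical names invented for illustration, standing in for tsc_khz, the per-cpu cpu_tsc_khz and X86_FEATURE_CONSTANT_TSC.

```c
#include <stdbool.h>

/* Hypothetical model of the host state involved; not real kernel symbols. */
struct model_host {
	unsigned long tsc_khz;      /* models the global tsc_khz */
	unsigned long cpu_tsc_khz;  /* models one CPU's cpu_tsc_khz snapshot */
	bool constant_tsc;          /* models X86_FEATURE_CONSTANT_TSC */
};

/* Models KVM copying tsc_khz into the per-cpu snapshot at CPU online. */
static void model_snapshot(struct model_host *h)
{
	h->cpu_tsc_khz = h->tsc_khz;
}

/* Models TSC refinement completing later and updating tsc_khz. */
static void model_refine(struct model_host *h, unsigned long refined_khz)
{
	h->tsc_khz = refined_khz;
}

/* Behaviour before the patch: always return the snapshot. */
static unsigned long model_get_khz_old(const struct model_host *h)
{
	return h->cpu_tsc_khz;
}

/* Behaviour after the patch, mirroring get_cpu_tsc_khz() in the diff. */
static unsigned long model_get_khz_new(const struct model_host *h)
{
	if (h->constant_tsc)
		return h->tsc_khz;
	return h->cpu_tsc_khz;
}
```

If the snapshot is taken before refinement on a constant-TSC host, model_get_khz_old() keeps returning the unrefined value while model_get_khz_new() follows the refined one. With constant_tsc cleared, both return the snapshot, which matches the patch leaving the non-constant path alone.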