Re: [PATCH v2] KVM: x86: Use current rather than snapshotted TSC frequency if it is constant

On Thu, 2022-04-21 at 00:56 +0000, Anton Romanov wrote:
> Don't snapshot tsc_khz into per-cpu cpu_tsc_khz if the host TSC is
> constant, in which case the actual TSC frequency will never change and thus
> capturing the TSC frequency during initialization is unnecessary; KVM can
> simply use tsc_khz.  This value is snapshotted from
> kvm_timer_init->kvmclock_cpu_online->tsc_khz_changed(NULL).
> 
> On CPUs with constant TSC, but not a hardware-specified TSC frequency,
> snapshotting cpu_tsc_khz and using that to set a VM's target TSC frequency
> can lead the VM to think its TSC frequency is not what it actually is if
> refining the TSC completes after KVM snapshots tsc_khz.  The actual
> frequency never changes, only the kernel's calculation of what that
> frequency is changes.
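To make the race above concrete, here is a hypothetical user-space model in plain C (not kernel code; every `model_*` name is invented for illustration and the kHz values are made-up examples). A snapshot taken before refinement keeps the early estimate, while a reader that consults the live value, which is what the patch's get_cpu_tsc_khz() does on CONSTANT_TSC hosts, sees the refined one.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel symbols: model_tsc_khz plays the role of
 * the global tsc_khz, model_cpu_tsc_khz the per-CPU snapshot that KVM
 * takes at initialization.
 */
static unsigned long model_tsc_khz;
static unsigned long model_cpu_tsc_khz;

/* KVM init: copy whatever calibration value exists right now. */
static void model_take_snapshot(void)
{
	model_cpu_tsc_khz = model_tsc_khz;
}

/* Deferred refinement work later replaces the early estimate. */
static void model_refine(unsigned long refined_khz)
{
	model_tsc_khz = refined_khz;
}

/*
 * Same shape as the patch's get_cpu_tsc_khz(): with a constant TSC the
 * live value is returned, so a pre-refinement snapshot is never used.
 */
static unsigned long model_get_cpu_tsc_khz(bool constant_tsc)
{
	return constant_tsc ? model_tsc_khz : model_cpu_tsc_khz;
}
```

For example, setting `model_tsc_khz = 2500000`, calling `model_take_snapshot()` and then `model_refine(2499900)` leaves `model_get_cpu_tsc_khz(false)` at 2500000, while `model_get_cpu_tsc_khz(true)` returns 2499900.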
> 
> Ideally, KVM would not be able to race with TSC refinement, or would have
> a hook into tsc_refine_calibration_work() to get an alert when refinement
> is complete.  Avoiding the race altogether isn't practical as refinement
> takes a relative eternity; it's deliberately put on a work queue outside of
> the normal boot sequence to avoid unnecessarily delaying boot.
> 
> Adding a hook is doable, but somewhat gross due to KVM's ability to be
> built as a module.  And if the TSC is constant, which is likely the case
> for every VMX/SVM-capable CPU produced in the last decade, the race can be
> hit if and only if userspace is able to create a VM before TSC refinement
> completes; refinement is slow, but not that slow.
> 
> For now, punt on a proper fix, as not taking a snapshot can help some use
> cases, and not taking a snapshot is arguably correct irrespective of the
> race with refinement.
> 
> Signed-off-by: Anton Romanov <romanton@xxxxxxxxxx>
> ---
> v2:
>     fixed commit msg indentation
>     added WARN_ON_ONCE in kvm_hyperv_tsc_notifier
>     opened up condition in __get_kvmclock
>  arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 547ba00ef64f..1043cfd26576 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2907,6 +2907,19 @@ static void kvm_update_masterclock(struct kvm *kvm)
>  	kvm_end_pvclock_update(kvm);
>  }
>  
> +/*
> + * If KVM is built into the kernel, the value saved into the per-cpu
> + * cpu_tsc_khz may not have been refined yet. With CONSTANT_TSC, taking a
> + * snapshot makes no sense anyway, so just return tsc_khz.
> + */
> +static unsigned long get_cpu_tsc_khz(void)
> +{
> +	if (static_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return tsc_khz;
> +	else
> +		return __this_cpu_read(cpu_tsc_khz);
> +}
> +
>  /* Called within read_seqcount_begin/retry for kvm->pvclock_sc.  */
>  static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  {
> @@ -2917,7 +2930,8 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  	get_cpu();
>  
>  	data->flags = 0;
> -	if (ka->use_master_clock && __this_cpu_read(cpu_tsc_khz)) {
> +	if (ka->use_master_clock &&
> +	    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
>  #ifdef CONFIG_X86_64
>  		struct timespec64 ts;
>  
> @@ -2931,7 +2945,7 @@ static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
>  		data->flags |= KVM_CLOCK_TSC_STABLE;
>  		hv_clock.tsc_timestamp = ka->master_cycle_now;
>  		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
> -		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> +		kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
>  				   &hv_clock.tsc_shift,
>  				   &hv_clock.tsc_to_system_mul);
>  		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
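The frequency passed to kvm_get_time_scale() here becomes the shift/multiplier pair stored in hv_clock, so a stale kHz value turns directly into clock drift. The sketch below is a simplified user-space stand-in for that conversion, not the kernel's implementation: the `sketch_*` names are hypothetical, and it assumes a compiler with `unsigned __int128` (GCC or Clang on a 64-bit target). It applies ns = ((cycles << shift, or >> -shift when negative) * mul) >> 32.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: pick shift and a 32.32 fixed-point mul so that
 * scaled_hz / base_hz == (mul / 2^32) * 2^shift (up to rounding).
 */
static void sketch_time_scale(uint64_t scaled_hz, uint64_t base_hz,
			      int8_t *pshift, uint32_t *pmul)
{
	int shift = 0;

	if (!scaled_hz || !base_hz) {	/* nothing sensible to compute */
		*pshift = 0;
		*pmul = 0;
		return;
	}

	/* Normalize so that 1/2 <= scaled_hz / base_hz < 1. */
	while (scaled_hz >= base_hz) {
		base_hz <<= 1;
		shift++;
	}
	while (scaled_hz * 2 < base_hz) {
		scaled_hz <<= 1;
		shift--;
	}

	*pshift = (int8_t)shift;
	/* The ratio is below 1, so the fraction fits in 32 bits. */
	*pmul = (uint32_t)(((unsigned __int128)scaled_hz << 32) / base_hz);
}

/* Convert a cycle delta to nanoseconds using the pair computed above. */
static uint64_t sketch_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}
```

With a hypothetical 2,499,900 kHz TSC but a stale 2,500,000 kHz input (`sketch_time_scale(1000000000, 2500000000, ...)`), one real second of cycles (2,499,900,000) scales to roughly 999,960,000 ns, i.e. about 40 ppm of drift; feeding the refined 2,499,900 kHz instead gives back roughly 1,000,000,000 ns.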
> @@ -3049,7 +3063,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>  
>  	/* Keep irq disabled to prevent changes to the clock */
>  	local_irq_save(flags);
> -	tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz);
> +	tgt_tsc_khz = get_cpu_tsc_khz();
>  	if (unlikely(tgt_tsc_khz == 0)) {
>  		local_irq_restore(flags);
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> @@ -8646,9 +8660,12 @@ static void tsc_khz_changed(void *data)
>  	struct cpufreq_freqs *freq = data;
>  	unsigned long khz = 0;
>  
> +	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +		return;
> +
>  	if (data)
>  		khz = freq->new;
> -	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> +	else
>  		khz = cpufreq_quick_get(raw_smp_processor_id());
>  	if (!khz)
>  		khz = tsc_khz;
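The reordered checks in tsc_khz_changed() can be summarized as a pure function. This is a hypothetical model for illustration only (`model_tsc_khz_changed` and its parameters are invented): `notified_khz` stands in for `freq->new`, `cpufreq_khz` for the result of cpufreq_quick_get(), and a return value of 0 means "store nothing", matching the early return the patch adds for CONSTANT_TSC hosts.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the frequency choice in tsc_khz_changed() after
 * this patch. Returns 0 when no per-CPU value should be stored.
 */
static unsigned long model_tsc_khz_changed(bool constant_tsc, bool notified,
					   unsigned long notified_khz,
					   unsigned long cpufreq_khz,
					   unsigned long tsc_khz)
{
	unsigned long khz;

	if (constant_tsc)
		return 0;	/* early return added by the patch */

	/* cpufreq notification value if present, else ask cpufreq */
	khz = notified ? notified_khz : cpufreq_khz;
	if (!khz)
		khz = tsc_khz;	/* fall back to the calibrated value */
	return khz;
}
```

Before the patch, a CONSTANT_TSC host called with NULL data skipped the cpufreq query, fell through to `khz = tsc_khz`, and so ended up snapshotting tsc_khz; with the early return nothing is captured at all.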
> @@ -8661,6 +8678,8 @@ static void kvm_hyperv_tsc_notifier(void)
>  	struct kvm *kvm;
>  	int cpu;
>  
> +	WARN_ON_ONCE(boot_cpu_has(X86_FEATURE_TSC_RELIABLE));
> +
>  	mutex_lock(&kvm_lock);
>  	list_for_each_entry(kvm, &vm_list, vm_list)
>  		kvm_make_mclock_inprogress_request(kvm);

Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>


A question to AMD engineers: is it really true that AMD CPUs don't report
the TSC frequency anywhere (CPUID/MSR)?

It really sucks that we still have to measure it.
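For comparison on the Intel side: CPUID leaf 0x15 enumerates the TSC / core crystal clock ratio (EBX numerator, EAX denominator) and, optionally, the crystal frequency in Hz (ECX), giving TSC Hz = ECX * EBX / EAX. Below is a minimal user-space sketch, assuming GCC or Clang's <cpuid.h>; `tsc_hz_from_leaf15` and `query_tsc_hz` are hypothetical helpers. It returns 0 whenever the leaf is missing or a field is zero, in which case the frequency has to come from elsewhere, such as calibration. It assumes nothing about AMD parts, since that is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Leaf 0x15 fields: EAX = ratio denominator, EBX = ratio numerator,
 * ECX = crystal frequency in Hz. Returns TSC Hz, or 0 if not derivable.
 */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;
	return (uint64_t)ecx * ebx / eax;
}

/* Query the running CPU; returns 0 if leaf 0x15 is absent or incomplete. */
static uint64_t query_tsc_hz(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_max(0, NULL) < 0x15)
		return 0;
	__cpuid(0x15, eax, ebx, ecx, edx);
	(void)edx;	/* EDX is reserved for this leaf */
	return tsc_hz_from_leaf15(eax, ebx, ecx);
#else
	return 0;
#endif
}
```

For example, a leaf reporting hypothetical values EAX=2, EBX=176, ECX=24000000 yields 24 MHz * 176 / 2 = 2,112,000,000 Hz.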

Best regards,
	Maxim Levitsky