On Fri, Feb 25, 2022 at 4:10 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > On 2/25/22 02:39, Sean Christopherson wrote: > > Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the > > host TSC is constant, in which case the actual TSC frequency will never > > change and thus capturing the "max" TSC during initialization is > > unnecessary, KVM can simply use tsc_khz during VM creation. > > > > On CPUs with constant TSC, but not a hardware-specified TSC frequency, > > snapshotting max_tsc_khz and using that to set a VM's default TSC > > frequency can lead to KVM thinking it needs to manually scale the guest's > > TSC if refining the TSC completes after KVM snapshots tsc_khz. The > > actual frequency never changes, only the kernel's calculation of what > > that frequency is changes. On systems without hardware TSC scaling, this > > either puts KVM into "always catchup" mode (extremely inefficient), or > > prevents creating VMs altogether. > > > > Ideally, KVM would not be able to race with TSC refinement, or would have > > a hook into tsc_refine_calibration_work() to get an alert when refinement > > is complete. Avoiding the race altogether isn't practical as refinement > > takes a relative eternity; it's deliberately put on a work queue outside > > of the normal boot sequence to avoid unnecessarily delaying boot. > > > > Adding a hook is doable, but somewhat gross due to KVM's ability to be > > built as a module. And if the TSC is constant, which is likely the case > > for every VMX/SVM-capable CPU produced in the last decade, the race can > > be hit if and only if userspace is able to create a VM before TSC > > refinement completes; refinement is slow, but not that slow. > > > > For now, punt on a proper fix, as not taking a snapshot can help some > > uses cases and not taking a snapshot is arguably correct irrespective of > > the race with refinement. > > > > Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx> > > Cc: Anton Romanov <romanton@xxxxxxxxxx> > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > > Queued, but I'd rather have a subject that calls out that max_tsc_khz > needs a replacement at vCPU creation time. In fact, the real change > (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject > mentions only the change in kvm_timer_init(). > > What do you think of "KVM: x86: Use current rather than max TSC > frequency if it is constant"? > > Pao Ping. This said "queued" but I don't think this ever landed. What's the status of this? Paolo, does this need more work?