On 14/12/2015 09:16, tip-bot for Andy Lutomirski wrote:
> Commit-ID:  677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Gitweb:     http://git.kernel.org/tip/677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Author:     Andy Lutomirski <luto@xxxxxxxxxx>
> AuthorDate: Thu, 10 Dec 2015 19:20:18 -0800
> Committer:  Ingo Molnar <mingo@xxxxxxxxxx>
> CommitDate: Fri, 11 Dec 2015 08:56:02 +0100
> 
> x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
> 
> This gets rid of the "did TSC go backwards" logic and just
> updates all clocks.  It should work better (no more disabling of
> fast timing) and more reliably (all of the clocks are actually
> updated).
> 
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Brian Gerst <brgerst@xxxxxxxxx>
> Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@xxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 75 +++---------------------------------------------------
>  1 file changed, 3 insertions(+), 72 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00462bd..6e32e87 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -123,8 +123,6 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
>  unsigned int __read_mostly lapic_timer_advance_ns = 0;
>  module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
>  
> -static bool __read_mostly backwards_tsc_observed = false;
> -
>  #define KVM_NR_SHARED_MSRS 16
>  
>  struct kvm_shared_msrs_global {
> @@ -1671,7 +1669,6 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
>  					&ka->master_cycle_now);
>  
>  	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
> -				&& !backwards_tsc_observed
>  				&& !ka->boot_vcpu_runs_old_kvmclock;
>  
>  	if (ka->use_master_clock)
> @@ -7366,88 +7363,22 @@ int kvm_arch_hardware_enable(void)
>  	struct kvm_vcpu *vcpu;
>  	int i;
>  	int ret;
> -	u64 local_tsc;
> -	u64 max_tsc = 0;
> -	bool stable, backwards_tsc = false;
>  
>  	kvm_shared_msr_cpu_online();
>  	ret = kvm_x86_ops->hardware_enable();
>  	if (ret != 0)
>  		return ret;
>  
> -	local_tsc = rdtsc();
> -	stable = !check_tsc_unstable();
>  	list_for_each_entry(kvm, &vm_list, vm_list) {
>  		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			if (!stable && vcpu->cpu == smp_processor_id())
> +			if (vcpu->cpu == smp_processor_id()) {
>  				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> -			if (stable && vcpu->arch.last_host_tsc > local_tsc) {
> -				backwards_tsc = true;
> -				if (vcpu->arch.last_host_tsc > max_tsc)
> -					max_tsc = vcpu->arch.last_host_tsc;
> +				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
> +						 vcpu);
>  			}
>  		}
>  	}
>  
> -	/*
> -	 * Sometimes, even reliable TSCs go backwards.  This happens on
> -	 * platforms that reset TSC during suspend or hibernate actions, but
> -	 * maintain synchronization.  We must compensate.  Fortunately, we can
> -	 * detect that condition here, which happens early in CPU bringup,
> -	 * before any KVM threads can be running.  Unfortunately, we can't
> -	 * bring the TSCs fully up to date with real time, as we aren't yet far
> -	 * enough into CPU bringup that we know how much real time has actually
> -	 * elapsed; our helper function, get_kernel_ns() will be using boot
> -	 * variables that haven't been updated yet.
> -	 *
> -	 * So we simply find the maximum observed TSC above, then record the
> -	 * adjustment to TSC in each VCPU.  When the VCPU later gets loaded,
> -	 * the adjustment will be applied.  Note that we accumulate
> -	 * adjustments, in case multiple suspend cycles happen before some VCPU
> -	 * gets a chance to run again.  In the event that no KVM threads get a
> -	 * chance to run, we will miss the entire elapsed period, as we'll have
> -	 * reset last_host_tsc, so VCPUs will not have the TSC adjusted and may
> -	 * loose cycle time.  This isn't too big a deal, since the loss will be
> -	 * uniform across all VCPUs (not to mention the scenario is extremely
> -	 * unlikely). It is possible that a second hibernate recovery happens
> -	 * much faster than a first, causing the observed TSC here to be
> -	 * smaller; this would require additional padding adjustment, which is
> -	 * why we set last_host_tsc to the local tsc observed here.
> -	 *
> -	 * N.B. - this code below runs only on platforms with reliable TSC,
> -	 * as that is the only way backwards_tsc is set above.  Also note
> -	 * that this runs for ALL vcpus, which is not a bug; all VCPUs should
> -	 * have the same delta_cyc adjustment applied if backwards_tsc
> -	 * is detected.  Note further, this adjustment is only done once,
> -	 * as we reset last_host_tsc on all VCPUs to stop this from being
> -	 * called multiple times (one for each physical CPU bringup).
> -	 *
> -	 * Platforms with unreliable TSCs don't have to deal with this, they
> -	 * will be compensated by the logic in vcpu_load, which sets the TSC to
> -	 * catchup mode.  This will catchup all VCPUs to real time, but cannot
> -	 * guarantee that they stay in perfect synchronization.
> -	 */
> -	if (backwards_tsc) {
> -		u64 delta_cyc = max_tsc - local_tsc;
> -		backwards_tsc_observed = true;
> -		list_for_each_entry(kvm, &vm_list, vm_list) {
> -			kvm_for_each_vcpu(i, vcpu, kvm) {
> -				vcpu->arch.tsc_offset_adjustment += delta_cyc;
> -				vcpu->arch.last_host_tsc = local_tsc;
> -				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> -			}
> -
> -			/*
> -			 * We have to disable TSC offset matching.. if you were
> -			 * booting a VM while issuing an S4 host suspend....
> -			 * you may have some problem. Solving this issue is
> -			 * left as an exercise to the reader.
> -			 */
> -			kvm->arch.last_tsc_nsec = 0;
> -			kvm->arch.last_tsc_write = 0;
> -		}
> -
> -	}
>  	return 0;
>  }
> 

NACK.  Please remove this patch from the tip tree.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html