Avi Kivity wrote:
> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
>
> Unlike all of the other cpuid bits, the TSC deadline timer bit is set
> unconditionally, regardless of what userspace wants.
>
> This is broken in several ways:
>  - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate
>    the TSC deadline timer feature, a guest that uses the feature will
>    break
>  - live migration to older host kernels that don't support the
>    TSC deadline timer will cause the feature to be pulled from under
>    the guest's feet; breaking it
>  - guests that are broken wrt the feature will fail.
>
> Fix by not enabling the feature automatically; instead report it to
> userspace.
>
> Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot
> guarantee will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER
> and not KVM_GET_SUPPORTED_CPUID.
>
> Fixes the Illumos guest kernel, which uses the TSC deadline timer
> feature.
>
> [avi: add the KVM_CAP + documentation]
>
> Reported-by: Alexey Zaytsev <alexey.zaytsev@xxxxxxxxx>
> Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
> ---
>
> As we're running out of time and everyone's checking their socks
> instead of inboxes I've added the missing parts myself.  Jan, if you
> accidentally see this, please review and add your signoff.
>
>  Documentation/virtual/kvm/api.txt |    9 +++++++++
>  arch/x86/kvm/cpuid.c              |   16 ++++++----------
>  arch/x86/kvm/x86.c                |    3 +++
>  include/linux/kvm.h               |    1 +
>  4 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 5b03eee..da1f8fd 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1100,6 +1100,15 @@ emulate them efficiently. The fields in each entry are defined as follows:
>     eax, ebx, ecx, edx: the values returned by the cpuid instruction for
>           this function/index combination
>  
> +The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
> +as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
> +support.  Instead it is reported via
> +
> +  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
> +
> +if that returns true you use KVM_CREATE_IRQCHIP, or if emulate the
> +feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
> +
>  4.47 KVM_PPC_GET_PVINFO
>  
>  Capability: KVM_CAP_PPC_GET_PVINFO
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 230f713..89b02bf 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -27,7 +27,6 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> -	u32 timer_mode_mask;
>  
>  	best = kvm_find_cpuid_entry(vcpu, 1, 0);
>  	if (!best)
> @@ -40,15 +39,12 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
>  			best->ecx |= bit(X86_FEATURE_OSXSAVE);
>  	}
>  
> -	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> -		best->function == 0x1) {
> -		best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER);
> -		timer_mode_mask = 3 << 17;
> -	} else
> -		timer_mode_mask = 1 << 17;
> -
> -	if (apic)
> -		apic->lapic_timer.timer_mode_mask = timer_mode_mask;
> +	if (apic) {
> +		if (best->ecx & bit(X86_FEATURE_TSC_DEADLINE_TIMER))
> +			apic->lapic_timer.timer_mode_mask = 3 << 17;
> +		else
> +			apic->lapic_timer.timer_mode_mask = 1 << 17;
> +	}
>  
>  	kvm_pmu_cpuid_update(vcpu);
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index df23dff..1171def 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2089,6 +2089,9 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_TSC_CONTROL:
>  		r = kvm_has_tsc_control;
>  		break;
> +	case KVM_CAP_TSC_DEADLINE_TIMER:
> +		r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
> +		break;

The KVM TSC deadline timer is emulated purely in software; it does not
depend on the host hardware supporting the feature.

Thanks,
Jinsong

>  	default:
>  		r = 0;
>  		break;
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index c3892fc..68e67e5 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -557,6 +557,7 @@ struct kvm_ppc_pvinfo {
>  #define KVM_CAP_MAX_VCPUS 66       /* returns max vcpus per vm */
>  #define KVM_CAP_PPC_PAPR 68
>  #define KVM_CAP_S390_GMAP 71
> +#define KVM_CAP_TSC_DEADLINE_TIMER 72
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
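
For illustration only, here is a rough standalone sketch of how I read the
proposed flow: userspace decides whether to set CPUID leaf 1 ecx[24] before
KVM_SET_CPUID2, and the LAPIC timer mode mask then follows the guest-visible
bit as in the new kvm_update_cpuid(). The struct vmm_config and both function
names are made up for this example, and the enable condition is my
interpretation of the api.txt wording above, not something the patch states
in code. It makes no real ioctl calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 24: TSC deadline timer (per the quoted api.txt text). */
#define CPUID_1_ECX_TSC_DEADLINE (UINT32_C(1) << 24)

/* Hypothetical description of the VMM setup; not a KVM structure. */
struct vmm_config {
	bool cap_tsc_deadline;    /* KVM_CHECK_EXTENSION(KVM_CAP_TSC_DEADLINE_TIMER) != 0 */
	bool in_kernel_irqchip;   /* the VMM called KVM_CREATE_IRQCHIP */
	bool userspace_emulation; /* the VMM emulates the timer itself */
};

/*
 * Compute the leaf 1 ECX value userspace would hand to KVM_SET_CPUID2.
 * Assumed reading of the documentation: expose the bit if the capability
 * is reported and the in-kernel irqchip is used, or if userspace emulates
 * the feature; otherwise keep it clear.
 */
static uint32_t guest_cpuid_1_ecx(uint32_t ecx, const struct vmm_config *cfg)
{
	assert(cfg != NULL);

	if ((cfg->cap_tsc_deadline && cfg->in_kernel_irqchip) ||
	    cfg->userspace_emulation)
		return ecx | CPUID_1_ECX_TSC_DEADLINE;

	return ecx & ~CPUID_1_ECX_TSC_DEADLINE;
}

/*
 * Mirror of the patched kvm_update_cpuid() logic: the LVT timer mode mask
 * is 3 << 17 when the guest sees the TSC deadline bit, 1 << 17 otherwise.
 */
static uint32_t lapic_timer_mode_mask(uint32_t guest_ecx)
{
	if (guest_ecx & CPUID_1_ECX_TSC_DEADLINE)
		return UINT32_C(3) << 17;

	return UINT32_C(1) << 17;
}
```

With this shape the kernel never turns the bit on by itself; the mask simply
tracks whatever userspace put into the guest's CPUID.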