Re: [PATCH] KVM: VMX: Nop emulation of MSR_IA32_POWER_CTL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/16/19 4:40 PM, Liran Alon wrote:
>> On 16 Apr 2019, at 18:21, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
>>> On 15 Apr 2019, at 21:17, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>>> On Mon, Apr 15, 2019 at 06:45:26PM +0300, Liran Alon wrote:
>>>
>>> Technically, I think this is a Qemu bug.  KVM reports all zeros for
>>> CPUID_MWAIT_LEAF when userspace queries KVM_GET_SUPPORTED_CPUID and
>>> KVM_GET_EMULATED_CPUID.  And I think that's correct/desired, supporting
>>> MONITOR/MWAIT sub-features should be a separate enabling patch set.
>>
>> At some point in time Jim added commit df9cb9cc5bcd ("kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation”)
>> which added the following paragraph to documentation:
>> "Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
>> expose cpuid features (e.g. MONITOR) which are not supported by kvm in
>> its default configuration. If userspace enables such capabilities, it
>> is responsible for modifying the results of this ioctl appropriately.”
>>
>> It’s indeed not clear what it means to “modify the results of this ioctl *appropriately*” right?
>> It can either mean you just expose in CPUID[EAX=1].ECX support for MONITOR/MWAIT
>> or that you also expose CPUID_MWAIT_LEAF (CPUID[EAX=5]).
>> Both regardless of the value returned from KVM_GET_SUPPORTED_CPUID ioctl.
>>
>> Having said that, I tend to agree with you.
>> Instead of emulating this MSR in KVM, I think it it preferred to change QEMU to expose MONITOR/MWAIT support in CPUID[EAX=1].ECX
>> but in CPUID[EAX=5] init everything as in host besides ECX[0] which will be set to 0 to report we don’t support extensions.
>> (We still want to support range of monitor line size, whether we can treat interrupts as break-events for MWAIT and the supported C substates).
>> I will create this patch for QEMU.
> 
> Actually on second thought, I will just remain with the KVM patch (that Paolo was nice enough to already queue).
> and not do this QEMU patch. This is because why will we want to prevent guest from specifying target C-State if he is exposed with MWAIT?
> I don’t see a reason we should prevent that. Do you?
> 
One reason it is a good idea to prevent the guest from entering deeper
C-states (e.g. deeper than C2) is to allow preemption timer to be used again
when guests are exposed with MWAIT (currently we can't do that).

Not sure if this would be towards not exposing MWAIT leaf (CPUID[EAX=5]) at all.
But at least it's one other reason I see about limiting MWAIT exposed
featureset. Userspace would zero out CPUID[EAX=5].EDX or mask the exposed host
EDX value to the first 2 C-states of host values. Even if we zero out
CPUID[EAX=5].EDX (and consequently not intel_idle does get used) guest can still
use mwait (but is limited to C1 IIUC).

See the KVM patch which handles that just below the scissors mark. But I am
waiting on a specific workload test (other than my own tests) before I formally
submit this (and by this I mean its counterpart on tests and qemu).

	Joao

------- >8 --------

>From 29d69cd1e6d0f4e301dda53a1cef1b2c68d4a2d7 Mon Sep 17 00:00:00 2001
From: Joao Martins <joao.m.martins@xxxxxxxxxx>
Date: Thu, Feb 21 12:48:07 2019 -0500
Subject: [PATCH] KVM: VMX: use preemption timer for C-states <= C2

According to section "25.5.1 VMX-Preemption Timer" in Volume 3C of Intel
SDM, the VMX preemption timer stops counting on C-states deeper than C2.

When mwait is exposed to the guest and MWAIT/MONITOR intercepts
are disabled, and only if the advertised MWAIT substates are
limited up to C2 (or zeroed out), we use preemption timer.

A guest which happens to ignore CPUID is going against spec,
so they can only shot themselves in the foot.

Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
---
 arch/x86/kvm/vmx/vmx.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h |  3 +++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3fe2020e3bc4..153957bdfa73 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -45,6 +45,7 @@
 #include <asm/mmu_context.h>
 #include <asm/mshyperv.h>
 #include <asm/spec-ctrl.h>
+#include <asm/mwait.h>
 #include <asm/virtext.h>
 #include <asm/vmx.h>

@@ -6962,6 +6963,16 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }

+static u32 vmx_get_mwait_substates(struct kvm_vcpu *vcpu)
+{
+	u32 eax, ebx, ecx, mwait_substates;
+
+	eax = CPUID_MWAIT_LEAF;
+	if (kvm_cpuid(vcpu, &eax, &ebx, &ecx, &mwait_substates, false))
+		return mwait_substates;
+	return 0;
+}
+
 static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6987,6 +6998,36 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
 			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
 		update_intel_pt_cfg(vcpu);
+
+	if (kvm_mwait_in_guest(vcpu->kvm))
+		to_vmx(vcpu)->mwait_substates = vmx_get_mwait_substates(vcpu);
+}
+
+static bool vmx_mwait_hv_timer_reliable(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Per Intel SDM:
+	 *
+	 * Bits 03-00: Number of C0* sub C-states supported using MWAIT
+	 * Bits 07-04: Number of C1* sub C-states supported using MWAIT
+	 * Bits 11-08: Number of C2* sub C-states supported using MWAIT
+	 * Bits 15-12: Number of C3* sub C-states supported using MWAIT
+	 * Bits 19-16: Number of C4* sub C-states supported using MWAIT
+	 * Bits 23-20: Number of C5* sub C-states supported using MWAIT
+	 * Bits 27-24: Number of C6* sub C-states supported using MWAIT
+	 * Bits 31-28: Number of C7* sub C-states supported using MWAIT
+	 *
+	 * NOTE:
+	 * The definition of C0 through C7 states for MWAIT extension are
+	 * processor-specific C-states, not ACPI C-states.
+	 *
+	 * The VMX preemption timer stops counting in C-states higher than C2.
+	 * Most processors define MWAIT 0x0/0x1 as C1/C1E. MWAIT 0x10
+	 * corresponds to C3 except that on Atom processors which is C2.
+	 * So deem the first two as safe and allow preemption timer to be used.
+	 */
+#define VMX_MWAIT_CSTATES_UNSAFE 0xffffff00
+	return !(to_vmx(vcpu)->mwait_substates & VMX_MWAIT_CSTATES_UNSAFE);
 }

 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
@@ -7046,7 +7087,7 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64
guest_deadline_tsc)
 	struct vcpu_vmx *vmx;
 	u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;

-	if (kvm_mwait_in_guest(vcpu->kvm))
+	if (kvm_mwait_in_guest(vcpu->kvm) && !vmx_mwait_hv_timer_reliable(vcpu))
 		return -EOPNOTSUPP;

 	vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1e42f983e0f1..6024ae52f90c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -257,6 +257,9 @@ struct vcpu_vmx {

 	unsigned long host_debugctlmsr;

+	/* maximum mwait sub-states */
+	u32 mwait_substates;
+
 	u64 msr_ia32_power_ctl;

 	/*
-- 
2.11.0



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux