Since commits 668fffa3f838 ("kvm: better MWAIT emulation for guests”) and 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts”), KVM was modified to allow an admin to configure certain guests to execute MONITOR/MWAIT inside guest without being intercepted by host. This is useful in case admin wishes to allocate a dedicated logical processor for each vCPU thread. Thus, making it safe for guest to completely control the power-state of the logical processor. The ability to use this new KVM capability was introduced to QEMU by commits 6f131f13e68d ("kvm: support -overcommit cpu-pm=on|off”) and 2266d4431132 ("i386/cpu: make -cpu host support monitor/mwait”). However, exposing MONITOR/MWAIT to a Linux guest may cause it's intel_idle kernel module to execute c1e_promotion_disable() which will attempt to RDMSR/WRMSR from/to MSR_IA32_POWER_CTL to manipulate the "C1E Enable" bit. This behaviour was introduced by commit 32e9518005c8 ("intel_idle: export both C1 and C1E”). Becuase KVM doesn't emulate this MSR, running KVM with ignore_msrs=0 will cause the above guest behaviour to raise a #GP which will cause guest to kernel panic. Therefore, add support for nop emulation of MSR_IA32_POWER_CTL to avoid #GP in guest in this scenario. Future commits can optimise emulation further by reflecting guest MSR changes to host MSR to provide guest with the ability to fine-tune the dedicated logical processor power-state. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx> --- arch/x86/kvm/vmx/vmx.c | 6 ++++++ arch/x86/kvm/vmx/vmx.h | 2 ++ arch/x86/kvm/x86.c | 1 + 3 files changed, 9 insertions(+) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2634ee8c9dc8..6246d782b746 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1696,6 +1696,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP); break; + case MSR_IA32_POWER_CTL: + msr_info->data = vmx->msr_ia32_power_ctl; + break; case MSR_IA32_BNDCFGS: if (!kvm_mpx_supported() || (!msr_info->host_initiated && @@ -1826,6 +1829,9 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_IA32_SYSENTER_ESP: vmcs_writel(GUEST_SYSENTER_ESP, data); break; + case MSR_IA32_POWER_CTL: + vmx->msr_ia32_power_ctl = data; + break; case MSR_IA32_BNDCFGS: if (!kvm_mpx_supported() || (!msr_info->host_initiated && diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 99328954c2fc..e9772850a2a1 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -259,6 +259,8 @@ struct vcpu_vmx { unsigned long host_debugctlmsr; + u64 msr_ia32_power_ctl; + /* * Only bits masked by msr_ia32_feature_control_valid_bits can be set in * msr_ia32_feature_control. FEATURE_CONTROL_LOCKED is always included diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 02c8e095a239..39ee4087f954 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1167,6 +1167,7 @@ static u32 emulated_msrs[] = { MSR_PLATFORM_INFO, MSR_MISC_FEATURES_ENABLES, MSR_AMD64_VIRT_SPEC_CTRL, + MSR_IA32_POWER_CTL, }; static unsigned num_emulated_msrs; -- 2.20.1