There's a bug in kvm/vmx.c: if the host enabled machine check (CR4.MCE==1), that bit gets zeroed while the CPU is running in guest context. If a machine check event arrives while the CPU is in guest context and the effective CR4.MCE is zero, the machine raises CATERR and crashes.

We should make sure the host value of CR4.MCE is always active. The CR4 read shadow and guest/host mask let the guest believe it wrote its own value.

For discussion: there's new complexity with CR4 shadowing (1e02ce4cccdcb9688386e5b8d2c9fa4660b45389). I measured CR4 reads at 24 cycles on Haswell and 36 cycles on Sandy Bridge, which is comparable to the cost of an L2 miss. Is the shadowing worth the complexity?

CR4 is also cached (with no real consistency mechanism) in the VMCS at the time of guest VCPU creation. If the CR4 value ever changes over time, or differs between CPUs in the system, all of this logic breaks.

Thanks,
Ben

---

The host's decision to enable machine check exceptions should remain in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset and passed a slightly modified 0 to the vmcs.guest_cr4 value.

Tested: Inject a machine check while a guest is spinning.

Before the change, if guest CR4.MCE==0, then the machine check is escalated to Catastrophic Error (CATERR) and the machine dies. If guest CR4.MCE==1, then the machine check causes a VMEXIT and is handled normally by host Linux.

After the change, injecting a machine check causes normal Linux machine check handling.

---
 arch/x86/kvm/vmx.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a214104..44c8d24 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3456,8 +3456,16 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
-		    KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
+	/*
+	 * Pass through host's Machine Check Enable value to hw_cr4, which
+	 * is in force while we are in guest mode. Do not let guests control
+	 * this bit, even if host CR4.MCE == 0.
+	 */
+	unsigned long hw_cr4 =
+		(read_cr4() & X86_CR4_MCE) |
+		(cr4 & ~X86_CR4_MCE) |
+		(to_vmx(vcpu)->rmode.vm86_active ?
+		 KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
 
 	if (cr4 & X86_CR4_VMXE) {
 		/*