On Tue, Nov 24, 2015 at 02:36:20PM -0200, Eduardo Habkost wrote:
> KVM_X86_SET_MCE does not call kvm_vcpu_ioctl_x86_setup_mce(). It
> calls kvm_vcpu_ioctl_x86_set_mce(), which stores the
> IA32_MCi_{STATUS,ADDR,MISC} register contents at
> vcpu->arch.mce_banks.

Ah, correct. I mistakenly followed KVM_X86_SETUP_MCE instead of KVM_X86_SET_MCE, sorry.

Ok, so this makes more sense now: there's kvm_inject_mce_oldstyle() in qemu, and kvm_arch_on_sigbus_vcpu(), which is on the SIGBUS handler path, actually does:

        if ((env->mcg_cap & MCG_SER_P) && addr &&
            (code == BUS_MCEERR_AR || code == BUS_MCEERR_AO)) {
                ...

I betcha that MCG_SER_P is set on every guest, even !Intel ones. I need to go stare at that code some more.

> I didn't check the QEMU MCE code to confirm that, but I assume it
> is implemented there. In that case, MCG_SER_P in
> KVM_MCE_CAP_SUPPORTED just indicates it can be implemented by
> userspace, as long as it makes the appropriate KVM_X86_SET_MCE
> (or maybe KVM_SET_MSRS?) calls.

I think it is kvm_arch_on_sigbus_vcpu()/kvm_arch_on_sigbus() that handles a SIGBUS with the BUS_MCEERR_AR/BUS_MCEERR_AO si_code. See mm/memory-failure.c:kill_proc() in the kernel, which is where we send those signals to processes.

However, I still think setting the MCG_SER_P bit on !Intel is wrong, even though the recovery action done by kvm_arch_on_sigbus_vcpu()/kvm_arch_on_sigbus() is correct.

Why, you're asking. :-)

Well, what happens above is that the qemu process gets a signal saying an uncorrectable error was detected in its memory, and either it is required to do something:

        BUS_MCEERR_AR == Action Required

or the action is optional:

        BUS_MCEERR_AO == Action Optional

The SER_P text in the SDM describes those two:

  "SRAO errors indicate that some data in the system is corrupt, but the
  data has not been consumed and the processor state is valid. SRAO
  errors provide the additional error information for system software to
  perform a recovery action. An SRAO error is indicated with UC=1, PCC=0,
  S=1, EN=1 and AR=0 in the IA32_MCi_STATUS register."

and

  "Software recoverable action required (SRAR) - a UCR error that
  requires system software to take a recovery action on this processor
  before scheduling another stream of execution on this processor. SRAR
  errors indicate that the error was detected and raised at the point of
  the consumption in the execution flow. An SRAR error is indicated with
  UC=1, PCC=0, S=1, EN=1 and AR=1 in the IA32_MCi_STATUS register."

And for that we don't need to look at SER_P in qemu: we only need to know the severity of the error and then handle it accordingly, because those two si_codes are purely software-defined. The application which gets that type of SIGBUS doesn't need to care about SER_P.

For example, AMD has similar error severities and they can be injected into qemu too. And qemu can do the exact same recovery actions based on the severity without even looking at the SER_P bit.

So here's the problem:

 * SER_P is set on all guests and it puzzles kernels running as !Intel
   guests.

 * Hardware error recovery actions can be done regardless of that bit.

The only case where that bit makes sense is when the emulated hardware itself generates accurate MCEs and therefore wants to produce accurate error signatures:

        SRAO: UC=1, PCC=0, S=1, EN=1 and AR=0
        SRAR: UC=1, PCC=0, S=1, EN=1 and AR=1

Those bits should have these settings only when the emulated hw actually implements SER_P. Otherwise, you'd get the old crude MCEs, which are either uncorrectable errors that raise an #MC or correctable errors.

But ok, let me go stare at the examples you sent me previously first. I might get a better idea after I sleep on it. :-)

Thanks!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
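To make the SRAO/SRAR distinction above concrete, here is a minimal C sketch, assuming the SDM bit layout of IA32_MCi_STATUS (VAL=63, UC=61, EN=60, PCC=57, S=56, AR=55) and MCG_SER_P as bit 24 of IA32_MCG_CAP; S and AR are reserved when MCG_SER_P is clear. The enum and the helpers mce_classify() and mce_make_status() are hypothetical, for illustration only, and are not qemu's or the kernel's code. It shows the point made above: the caller decides "action required" from the si_code alone (BUS_MCEERR_AR maps to true, BUS_MCEERR_AO to false), and only the encoding of the injected status word depends on SER_P being advertised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits per the Intel SDM. S and AR are only defined
 * when IA32_MCG_CAP.MCG_SER_P (bit 24) is set; otherwise reserved. */
#define MCI_STATUS_VAL  (1ULL << 63)
#define MCI_STATUS_UC   (1ULL << 61)
#define MCI_STATUS_EN   (1ULL << 60)
#define MCI_STATUS_PCC  (1ULL << 57)
#define MCI_STATUS_S    (1ULL << 56)
#define MCI_STATUS_AR   (1ULL << 55)
#define MCG_SER_P       (1ULL << 24)

/* Hypothetical severity buckets for this sketch. */
enum mce_severity {
    MCE_SEV_NONE,       /* VAL=0: bank holds no valid error */
    MCE_SEV_CORRECTED,  /* UC=0 */
    MCE_SEV_PCC,        /* UC=1, PCC=1: processor context corrupt */
    MCE_SEV_UC_OTHER,   /* UC=1 without an SRAO/SRAR signature */
    MCE_SEV_SRAO,       /* UC=1, PCC=0, S=1, EN=1, AR=0 */
    MCE_SEV_SRAR,       /* UC=1, PCC=0, S=1, EN=1, AR=1 */
};

/* Decode a bank status word. S/AR are only looked at when the
 * (emulated) CPU advertises MCG_SER_P in mcg_cap. */
static enum mce_severity mce_classify(uint64_t mcg_cap, uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_SEV_NONE;
    if (!(status & MCI_STATUS_UC))
        return MCE_SEV_CORRECTED;
    if (status & MCI_STATUS_PCC)
        return MCE_SEV_PCC;
    if (!(mcg_cap & MCG_SER_P))
        return MCE_SEV_UC_OTHER;  /* S/AR are reserved here */
    if ((status & MCI_STATUS_S) && (status & MCI_STATUS_EN))
        return (status & MCI_STATUS_AR) ? MCE_SEV_SRAR : MCE_SEV_SRAO;
    return MCE_SEV_UC_OTHER;
}

/* Build the status word for a memory error reported to userspace via
 * SIGBUS. action_required comes from the si_code alone; SER_P only
 * decides whether the S/AR signature may be encoded. Without SER_P this
 * falls back to a plain UC error (an assumption of this sketch). */
static uint64_t mce_make_status(uint64_t mcg_cap, bool action_required)
{
    uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN;

    if (mcg_cap & MCG_SER_P) {
        status |= MCI_STATUS_S;
        if (action_required)
            status |= MCI_STATUS_AR;
        /* Sanity check: the signature must decode to what we meant. */
        assert(mce_classify(mcg_cap, status) ==
               (action_required ? MCE_SEV_SRAR : MCE_SEV_SRAO));
    }
    return status;
}
```

With SER_P advertised, mce_make_status(MCG_SER_P, true) yields the SRAR signature; with mcg_cap == 0 the same call yields only VAL|UC|EN, so a guest without SER_P never sees the reserved S/AR bits set.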