On Wed, Nov 13, 2019 at 09:23:30AM +0100, Paolo Bonzini wrote:
> On 13/11/19 07:38, Jan Kiszka wrote:
> > When reading MCE, error code 0150h, i.e. SRAR, I was wondering if that
> > couldn't simply be handled by the host. But I suppose the symptom of
> > that erratum is not "just" regular recoverable MCE, rather
> > sometimes/always an unrecoverable CPU state, despite the error code, right?
>
> The erratum documentation talks explicitly about hanging the system, but
> it's not clear if it's just a result of the OS mishandling the MCE, or
> something worse. So I don't know. :( Pawan, do you?

As Dave mentioned in the other email, it's "something worse".

Although this erratum results in a machine check with the same MCACOD
signature as an SRAR error (0x150), the MCi_STATUS.PCC bit will be set to
one. The Intel Software Developer's Manual says that PCC=1 errors are fatal
and cannot be recovered.

  15.10.4.1 Machine-Check Exception Handler for Error Recovery [1]

  [...]
  The PCC flag in each IA32_MCi_STATUS register indicates whether recovery
  from the error is possible for uncorrected errors (UC=1). If the PCC flag
  is set for enabled uncorrected errors (UC=1 and EN=1), recovery is not
  possible.

Thanks,
Pawan

[1] https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html