On Thu, May 30, 2019, Tony W Wang-oc wrote:
> Hi Ashok,
> I have two questions about this patch, could you help to check:
>
> 1, for broadcast #MC exceptions, this patch seems to require that #MC
> exception errors set MCG_STATUS_RIPV = 1.
> But for Intel CPU, some #MC exception errors set MCG_STATUS_RIPV = 0
> (like "Recoverable-not-continuable SRAR Type" Errors), for these errors
> the patch doesn't seem to work, is that okay?
>
> 2, for LMCE exceptions, this patch seems to require that #MC exception
> errors set MCG_STATUS_RIPV = 0 to make sure LMCE is handled normally
> even on an offline CPU.
> For LMCE errors that set MCG_STATUS_RIPV = 1, the patch prevents an
> offline CPU from handling these LMCE errors, is that okay?
>

More specifically, this patch seems to require that #MC exceptions meet
the condition "MCG_STATUS_RIPV ^ MCG_STATUS_LMCES == 1".
But on a Xeon X5650 machine (SMP), a "Data CACHE Level-2 Generic Error"
does not meet this condition: MCGSTATUS is 4 in the report below, so only
MCIP is set and both RIPV and LMCES are 0.

I got the message below from:
https://www.centos.org/forums/viewtopic.php?p=292742

Hardware event. This is not a software error.
MCE 0
CPU 4 BANK 6 TSC b7065eeaa18b0
TIME 1545643603 Mon Dec 24 10:26:43 2018
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-2 Generic Error
STATUS b200000080000106 MCGSTATUS 4
MCGCAP 1c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

> Thanks
> Tony W Wang-oc
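
To spell out how MCGSTATUS 4 from the log above maps onto the condition "MCG_STATUS_RIPV ^ MCG_STATUS_LMCES == 1", here is a small standalone sketch. The bit positions are the architectural IA32_MCG_STATUS bits. The helper names decode_mcg_status() and ripv_xor_lmces() are made up for illustration and are not taken from the patch, and the XOR check is only my reading of what the patch requires.

```python
# Architectural IA32_MCG_STATUS bits (same names as the kernel's MCG_STATUS_* macros).
MCG_STATUS_RIPV = 1 << 0   # restart IP valid
MCG_STATUS_EIPV = 1 << 1   # error IP valid
MCG_STATUS_MCIP = 1 << 2   # machine check in progress
MCG_STATUS_LMCES = 1 << 3  # local machine check exception signaled

_FLAG_NAMES = [
    (MCG_STATUS_RIPV, "RIPV"),
    (MCG_STATUS_EIPV, "EIPV"),
    (MCG_STATUS_MCIP, "MCIP"),
    (MCG_STATUS_LMCES, "LMCES"),
]


def decode_mcg_status(mcgstatus):
    """Hypothetical helper: list the names of the flags set in an MCG_STATUS value."""
    return [name for bit, name in _FLAG_NAMES if mcgstatus & bit]


def ripv_xor_lmces(mcgstatus):
    """Hypothetical helper: True when exactly one of RIPV and LMCES is set.

    This encodes my reading of the patch's requirement, not code from the patch.
    """
    ripv = bool(mcgstatus & MCG_STATUS_RIPV)
    lmces = bool(mcgstatus & MCG_STATUS_LMCES)
    return ripv != lmces


if __name__ == "__main__":
    value = 0x4  # "MCGSTATUS 4" from the mcelog report
    print("flags set:", decode_mcg_status(value))
    print("RIPV ^ LMCES == 1:", ripv_xor_lmces(value))
```

For the value in the report, only MCIP is decoded and the XOR check fails, which is the case I am asking about.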