On Fri, Apr 21, 2017 at 1:16 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>>> > + if (!(mce->status & 0xef80) == BIT(7))
>>>
>>> Can we get a define for this, or a comment explaining all the magic
>>> that's happening on that one line?
>>
>> Yes - also like lkp pointed out, the check isn't correct at all. Let me
>> figure out what really needs to be done, and I will resend with a better
>> comment.
>
> Needs extra parentheses to make it right. Vishal, sorry I led you astray.
>
>         if (!((mce->status & 0xef80) == BIT(7)))
>
> The magic is shown in table 15-9 of the Intel Software Developers Manual
> (but perhaps not well explained there).
>
> mce->status in the above code is a value plucked from a machine check
> bank status register. See figure 15-6 in the SDM. The important bits for this
> are {15:0} which are the "MCA Error code". Table 15-9 shows how these
> are grouped into types, where the type is defined by the most significant '1'
> bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> see section 15.9.2.1).
>
> So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.

Ah, ok.

> Maybe we should have defines in mce.h for them? It gets a bit more complicated
> as all the above only applies to Intel branded X86 CPUs ... on AMD different
> decoding rules apply.

Yeah, this code is x86_64 generic so should call into helpers that do
the right thing per cpu type.
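
For reference, here is a rough standalone sketch of what a named helper for
the check discussed above might look like. It only covers the Intel decoding
Tony describes (mask 0xef80 selects bits 15:7 of the MCA error code, minus
bit 12) and says nothing about AMD. The names MCACOD_TYPE_MASK_SKETCH,
MCACOD_MEM_SKETCH, mca_is_memory_error_sketch() and mca_buggy_check_sketch()
are made up for illustration; they are not existing definitions in mce.h,
and BIT() is redefined locally so the snippet builds outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() so this builds in userspace. */
#define BIT(n) (1ULL << (n))

/*
 * Hypothetical names, not taken from mce.h.
 * 0xef80 = bits 15:7 of the MCA error code, excluding bit 12
 * (the Correction Report Filtering bit).
 */
#define MCACOD_TYPE_MASK_SKETCH 0xef80ULL
#define MCACOD_MEM_SKETCH       BIT(7)

/*
 * True when bit 7 is the most significant set bit of the MCA error code
 * (ignoring bit 12), i.e. the Intel "Memory error" type from the thread.
 */
static bool mca_is_memory_error_sketch(uint64_t status)
{
	return (status & MCACOD_TYPE_MASK_SKETCH) == MCACOD_MEM_SKETCH;
}

/*
 * How the originally posted condition parses: '!' binds tighter than '==',
 * so the left side is only ever 0 or 1 and can never equal BIT(7).
 * The extra parentheses here just spell out that parse.
 */
static bool mca_buggy_check_sketch(uint64_t status)
{
	return (!(status & 0xef80ULL)) == BIT(7);
}
```

With this, the corrected condition from the thread would read
"if (!mca_is_memory_error_sketch(mce->status))", and a vendor-specific
wrapper could choose a different decoder on non-Intel parts.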