On Fri, Jan 29, 2016 at 04:00:19PM -0800, Tony Luck wrote:
> The Intel Software Developer Manual describes bit 24 in the MCG_CAP
> MSR:
>    MCG_SER_P (software error recovery support present) flag,
>    bit 24 — Indicates (when set) that the processor supports
>    software error recovery
> But only some models with this capability bit set will actually
> generate recoverable machine checks.
>
> Check the model name and set a synthetic capability bit. Provide
> a command line option to set this bit anyway in case the kernel
> doesn't recognise the model name.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
>  Documentation/x86/x86_64/boot-options.txt | 4 ++++
>  arch/x86/include/asm/cpufeature.h         | 1 +
>  arch/x86/include/asm/mce.h                | 1 +
>  arch/x86/kernel/cpu/mcheck/mce.c          | 11 +++++++++++
>  4 files changed, 17 insertions(+)
>
> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
> index 68ed3114c363..8423c04ae7b3 100644
> --- a/Documentation/x86/x86_64/boot-options.txt
> +++ b/Documentation/x86/x86_64/boot-options.txt
> @@ -60,6 +60,10 @@ Machine check
>  		threshold to 1. Enabling this may make memory predictive failure
>  		analysis less effective if the bios sets thresholds for memory
>  		errors since we will not see details for all errors.
> +   mce=recovery
> +		Tell the kernel that this system can generate recoverable
> +		machine checks (useful when the kernel doesn't recognize
> +		the cpuid x86_model_id[])

I'd say "Force-enable generation of recoverable MCEs." and not mention
implementation details in the description text.

>     nomce (for compatibility with i386): same as mce=off
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 7ad8c9464297..06c6c2d2fea0 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -106,6 +106,7 @@
>  #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
>  #define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
>  #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
> +#define X86_FEATURE_MCE_RECOVERY ( 3*32+31) /* cpu has recoverable machine checks */
>
>  /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
>  #define X86_FEATURE_XMM3	( 4*32+ 0) /* "pni" SSE-3 */
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 2ea4527e462f..18d2ba9c8e44 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -113,6 +113,7 @@ struct mca_config {
>  	bool ignore_ce;
>  	bool disabled;
>  	bool ser;
> +	bool recovery;
>  	bool bios_cmci_threshold;
>  	u8 banks;
>  	s8 bootlog;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 905f3070f412..16a3d0e29f84 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1696,6 +1696,15 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
>  		return;
>  	}
>
> +	/*
> +	 * MCG_CAP.MCG_SER_P is necessary but not sufficient to know
> +	 * whether this processor will actually generate recoverable
> +	 * machine checks. Check to see if this is an E7 model Xeon.
> +	 */
> +	if (mca_cfg.recovery || (mca_cfg.ser &&
> +		!strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24)))

Eeww, a model string check :-(

Lemme guess: those E7s can't be represented by a range of
model/steppings, can they? Similar to AMD_MODEL_RANGE() thing in
cpu/amd.c, for example.

In any case, that chunk belongs in the Intel part of
__mcheck_cpu_apply_quirks().

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
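
To make the two approaches discussed above concrete, here is a minimal
userspace sketch, not kernel code. It contrasts the model-string prefix
check from the patch with a family/model/stepping range lookup of the kind
the review asks about. Everything except the MCG_SER_P bit position and
the "Intel(R) Xeon(R) CPU E7-" prefix is an assumption made for
illustration: struct cpu_desc, struct model_range, both function names,
and any table contents are hypothetical, and the thread does not establish
whether the E7 parts can actually be told apart by model and stepping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MCG_SER_P is bit 24 of the MCG_CAP MSR, per the SDM text quoted in the patch. */
#define MCG_SER_P (1ULL << 24)

/* Hypothetical stand-in for the few struct cpuinfo_x86 fields used here. */
struct cpu_desc {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
	char model_id[64];
};

/* Hypothetical table entry: one family/model with an inclusive stepping range. */
struct model_range {
	unsigned int family;
	unsigned int model;
	unsigned int step_min;
	unsigned int step_max;
};

/*
 * Approach from the patch: MCG_SER_P must be set (necessary, not
 * sufficient) and the model name must start with the E7 prefix.
 */
static bool recovery_by_name(uint64_t mcg_cap, const struct cpu_desc *c)
{
	static const char prefix[] = "Intel(R) Xeon(R) CPU E7-";

	return (mcg_cap & MCG_SER_P) &&
	       !strncmp(c->model_id, prefix, sizeof(prefix) - 1);
}

/*
 * Alternative raised in review: MCG_SER_P must be set and the CPU must
 * match an entry in a table of model/stepping ranges.
 */
static bool recovery_by_range(uint64_t mcg_cap, const struct cpu_desc *c,
			      const struct model_range *tbl, size_t n)
{
	size_t i;

	if (!(mcg_cap & MCG_SER_P))
		return false;

	for (i = 0; i < n; i++) {
		if (c->family == tbl[i].family &&
		    c->model == tbl[i].model &&
		    c->stepping >= tbl[i].step_min &&
		    c->stepping <= tbl[i].step_max)
			return true;
	}
	return false;
}
```

Both helpers return false when bit 24 is clear, which mirrors the
"necessary but not sufficient" comment in the patch. The range variant
only helps if recoverable and non-recoverable parts never share a
family/model/stepping tuple; if they do, a table like this cannot
separate them and something like the name check remains necessary.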