Re: [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum

"Huang, Kai" <kai.huang@xxxxxxxxx> · Mon, 19 Jun 2023 11:37:21 +0000

On Wed, 2023-06-07 at 22:43 +0000, Huang, Kai wrote:
> On Wed, 2023-06-07 at 07:15 -0700, Hansen, Dave wrote:
> > On 6/4/23 07:27, Kai Huang wrote:
> > > TDX memory has integrity and confidentiality protections.  Violations of
> > > this integrity protection are supposed to only affect TDX operations and
> > > are never supposed to affect the host kernel itself.  In other words,
> > > the host kernel should never, itself, see machine checks induced by the
> > > TDX integrity hardware.
> > 
> > At the risk of patting myself on the back by acking a changelog that I
> > wrote 95% of:
> > 
> > Reviewed-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > 
> 
> Thanks!

Hi Dave,

Thanks for reviewing and providing the tag.  However, I found a bug in using
early_initcall() to detect the erratum here.  In the later kexec() patch,
early_initcall(tdx_init) sets up the x86_platform.memory_shutdown() callback to
reset TDX private memory depending on the presence of the erratum, but there is
no guarantee that the erratum detection runs before tdx_init(), because both are
early_initcall()s.
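
To illustrate the ordering problem, here is a minimal userspace sketch.  It is
not kernel code: bug_tdx_pw_mce, reset_on_shutdown, tdx_init_model() and boot()
are hypothetical stand-ins, and it assumes tdx_init() samples the bug flag once
at init time.  It only models that two calls at the same initcall level run in
whatever order they end up registered, while a BSP-init hook always runs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name below is a hypothetical stand-in. */
static bool bug_tdx_pw_mce;     /* models X86_BUG_TDX_PW_MCE being set */
static bool reset_on_shutdown;  /* models the memory_shutdown() decision */

typedef void (*initcall_fn)(void);

/* Models the erratum detection: sets the bug flag. */
static void detect_erratum(void)
{
	bug_tdx_pw_mce = true;
}

/* Models tdx_init(): reads the bug flag once, when it runs. */
static void tdx_init_model(void)
{
	reset_on_shutdown = bug_tdx_pw_mce;
}

/*
 * Reset state, run an optional BSP-init hook, then run the same-level
 * initcalls in the order given.  Returns the resulting shutdown decision.
 */
static bool boot(initcall_fn bsp_hook, const initcall_fn *calls, size_t n)
{
	assert(calls != NULL || n == 0);

	bug_tdx_pw_mce = false;
	reset_on_shutdown = false;

	if (bsp_hook)
		bsp_hook();
	for (size_t i = 0; i < n; i++)
		calls[i]();

	return reset_on_shutdown;
}
```

In this model, running tdx_init_model() before detect_erratum() leaves
reset_on_shutdown false even though the erratum is present; passing
detect_erratum as the BSP hook makes the result independent of initcall order.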

Kirill also said early_initcall() isn't the right place, so I moved the
detection to an earlier phase, into bsp_init_intel().  We only need to match the
CPU once, on the BSP, assuming the CPU model is consistent across all CPUs
(which is the assumption of x86_match_cpu() anyway).

Please let me know if you have any comments.

+/*
+ * These CPUs have an erratum.  A partial write from non-TD
+ * software (e.g. via MOVNTI variants or UC/WC mapping) to TDX
+ * private memory poisons that memory, and a subsequent read of
+ * that memory triggers #MC.
+ */
+static const struct x86_cpu_id tdx_pw_mce_cpu_ids[] __initconst = {
+       X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+       X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, NULL),
+       { }
+};
+
 static void bsp_init_intel(struct cpuinfo_x86 *c)
 {
        resctrl_cpu_detect(c);
+
+       if (x86_match_cpu(tdx_pw_mce_cpu_ids))
+               setup_force_cpu_bug(X86_BUG_TDX_PW_MCE);
 }