Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 10, 2023 at 12:55:59AM +1300, Kai Huang wrote:
> +static const char *mce_memory_info(struct mce *m)
> +{
> +	if (!m || !mce_is_memory_error(m) || !mce_usable_address(m))
> +		return NULL;
> +
> +	/*
> +	 * Certain initial generations of TDX-capable CPUs have an
> +	 * erratum.  A kernel non-temporal partial write to TDX private
> +	 * memory poisons that memory, and a subsequent read of that
> +	 * memory triggers #MC.
> +	 *
> +	 * However such #MC caused by software cannot be distinguished
> +	 * from the real hardware #MC.  Just print additional message
> +	 * to show such #MC may be result of the CPU erratum.
> +	 */
> +	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
> +		return NULL;
> +
> +	return !tdx_is_private_mem(m->addr) ? NULL :
> +		"TDX private memory error. Possible kernel bug.";
> +}
> +
>  static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  {
>  	struct llist_node *pending;
>  	struct mce_evt_llist *l;
>  	int apei_err = 0;
> +	const char *memmsg;
>  
>  	/*
>  	 * Allow instrumentation around external facilities usage. Not that it
> @@ -283,6 +307,15 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
>  	}
>  	if (exp)
>  		pr_emerg(HW_ERR "Machine check: %s\n", exp);
> +	/*
> +	 * Confidential computing platforms such as TDX platforms
> +	 * may occur MCE due to incorrect access to confidential
> +	 * memory.  Print additional information for such error.
> +	 */
> +	memmsg = mce_memory_info(final);
> +	if (memmsg)
> +		pr_emerg(HW_ERR "Machine check: %s\n", memmsg);
> +

No, this is not how this is done. First of all, this function should be
called something like

	mce_dump_aux_info()

or so to state that it is dumping some auxiliary info.

Then, it does:

	if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return tdx_get_mce_info();

or so and you put that tdx_get_mce_info() function in TDX code and there
you do all your picking apart of things, what needs to be dumped or what
not, checking whether it is a memory error and so on.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux