On Tue, Jun 11, 2024 at 11:26:17AM -0700, H. Peter Anvin wrote:
> On 6/4/24 08:21, Kirill A. Shutemov wrote:
> >
> > From b45fe48092abad2612c2bafbb199e4de80c99545 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Date: Fri, 10 Feb 2023 12:53:11 +0300
> > Subject: [PATCHv11.1 06/19] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
> >
> > TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
> > that bit is cleared during CR4 register reprogramming during boot or
> > kexec flows, a #VE exception will be raised which the guest kernel
> > cannot handle it.
> >
> > Therefore, make sure the CR4.MCE setting is preserved over kexec too and
> > avoid raising any #VEs.
> >
> > The change doesn't affect non-TDX-guest environments.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > ---
> >  arch/x86/kernel/relocate_kernel_64.S | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 085eef5c3904..9c2cf70c5f54 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -5,6 +5,8 @@
> >   */
> >  #include <linux/linkage.h>
> > +#include <linux/stringify.h>
> > +#include <asm/alternative.h>
> >  #include <asm/page_types.h>
> >  #include <asm/kexec.h>
> >  #include <asm/processor-flags.h>
> > @@ -145,14 +147,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >  	 * Set cr4 to a known state:
> >  	 *  - physical address extension enabled
> >  	 *  - 5-level paging, if it was enabled before
> > +	 *  - Machine check exception on TDX guest, if it was enabled before.
> > +	 *    Clearing MCE might not be allowed in TDX guests, depending on setup.
> > +	 *
> > +	 * Use R13 that contains the original CR4 value, read in relocate_kernel().
> > +	 * PAE is always set in the original CR4.
> >  	 */
> > -	movl	$X86_CR4_PAE, %eax
> > -	testq	$X86_CR4_LA57, %r13
> > -	jz	.Lno_la57
> > -	orl	$X86_CR4_LA57, %eax
> > -.Lno_la57:
> > -
> > -	movq	%rax, %cr4
> > +	andl	$(X86_CR4_PAE | X86_CR4_LA57), %r13d
> > +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %r13d), X86_FEATURE_TDX_GUEST
> > +	movq	%r13, %cr4
>
> If this is the case, I don't really see a reason to clear MCE per se as I'm
> guessing a machine check here will be fatal anyway? It just changes the
> method of death.

Andrew had a strong opinion on the method of death here:

https://lore.kernel.org/all/1144340e-dd95-ee3b-dabb-579f9a65b3c7@xxxxxxxxxx

> Also, is there a reason to save %cr4, run code, and *then* clear the
> relevant bits? Wouldn't it be better to sanitize %cr4 as soon as possible?

You mean set the new CR4 directly in relocate_kernel(), before switching CR3?
I guess it is possible.

But I can't say I see a huge benefit in changing it. Such a change would
have its own risks.

--
  Kiryl Shutsemau / Kirill A. Shutemov
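
For illustration only, the net effect of the patched sequence on the value written to CR4 can be modeled in C roughly as below. This is a sketch under stated assumptions, not kernel code: the helper name kexec_cr4() and the tdx_guest flag are hypothetical, with the flag standing in for the X86_FEATURE_TDX_GUEST alternative. The bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4 bit positions as defined by the x86 architecture. */
#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define X86_CR4_MCE  (1ULL << 6)   /* machine-check enable */
#define X86_CR4_LA57 (1ULL << 12)  /* 5-level paging */

/*
 * Hypothetical helper (does not exist in the kernel): models the value
 * that the patched identity_mapped code writes to CR4.
 *
 * orig_cr4  - the CR4 value saved in R13 by relocate_kernel()
 * tdx_guest - assumption: true when X86_FEATURE_TDX_GUEST is set
 */
static uint64_t kexec_cr4(uint64_t orig_cr4, bool tdx_guest)
{
	/* andl $(X86_CR4_PAE | X86_CR4_LA57), %r13d */
	uint64_t cr4 = orig_cr4 & (X86_CR4_PAE | X86_CR4_LA57);

	/* ALTERNATIVE: orl $X86_CR4_MCE, %r13d is patched in on TDX guests only */
	if (tdx_guest)
		cr4 |= X86_CR4_MCE;

	return cr4;
}
```

In this model, a non-TDX system that enters with PAE, MCE and LA57 set leaves with only PAE and LA57, while a TDX guest keeps MCE set as well. PAE is not forced on here; as the patch comment says, it is expected to be set in the original CR4 already.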