On Mon, 2023-11-27 at 19:33 +0000, Huang, Kai wrote:
> On Mon, 2023-11-27 at 10:13 -0800, Hansen, Dave wrote:
> > On 11/9/23 03:55, Kai Huang wrote:
> > ...
> > > --- a/arch/x86/kernel/reboot.c
> > > +++ b/arch/x86/kernel/reboot.c
> > > @@ -31,6 +31,7 @@
> > >  #include <asm/realmode.h>
> > >  #include <asm/x86_init.h>
> > >  #include <asm/efi.h>
> > > +#include <asm/tdx.h>
> > >
> > >  /*
> > >   * Power off function, if any
> > > @@ -741,6 +742,20 @@ void native_machine_shutdown(void)
> > >  	local_irq_disable();
> > >  	stop_other_cpus();
> > >  #endif
> > > +	/*
> > > +	 * stop_other_cpus() has flushed all dirty cachelines of TDX
> > > +	 * private memory on remote cpus.  Unlike SME, which does the
> > > +	 * cache flush on _this_ cpu in the relocate_kernel(), flush
> > > +	 * the cache for _this_ cpu here.  This is because on the
> > > +	 * platforms with "partial write machine check" erratum the
> > > +	 * kernel needs to convert all TDX private pages back to normal
> > > +	 * before booting to the new kernel in kexec(), and the cache
> > > +	 * flush must be done before that.  If the kernel took SME's way,
> > > +	 * it would have to muck with the relocate_kernel() assembly to
> > > +	 * do memory conversion.
> > > +	 */
> > > +	if (platform_tdx_enabled())
> > > +		native_wbinvd();
> >
> > Why can't the TDX host code just set host_mem_enc_active=1?
> >
> > Sure, you'll end up *using* the SME WBINVD support, but then you don't
> > have two different WBINVD call sites.  You also don't have to mess with
> > a single line of assembly.
>
> I wanted to avoid changing the assembly.
>
> Perhaps the comment isn't very clear.  Flushing cache (on the CPU running kexec)
> when host_mem_enc_active=1 is handled in the relocate_kernel() assembly,
> which happens at a very late stage right before jumping to the new kernel.
> However for TDX, when the platform has the erratum we need to convert TDX private
> pages back to normal, which must be done after flushing cache.  If we reuse
> host_mem_enc_active=1, then we will need to change the assembly code to do that.
>

Forgot to say: doing TDX page conversion in the relocate_kernel() assembly isn't
easy, because the cache flushing when host_mem_enc_active=1 happens after the
kernel has switched to the identity mapping table, so we would need to do hacks
like fixing up symbol addresses etc.
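
To make the ordering constraint concrete, here is a rough userspace model of
the sequence discussed above.  It is only a sketch: record(), step_index(),
model_shutdown() and model_kexec() are hypothetical names made up for
illustration, the step names are just labels, and nothing here is kernel code
or the code from the patch.  It only shows that the flush on this CPU has to
come before the private page conversion, and that both have to come before
relocate_kernel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: every function below is a hypothetical stand-in. */
#define MAX_STEPS 8

static const char *steps[MAX_STEPS];
static size_t nr_steps;

/* Append a step label to the recorded sequence. */
static void record(const char *name)
{
	assert(nr_steps < MAX_STEPS);
	steps[nr_steps++] = name;
}

/* Position of a step in the recorded sequence, or -1 if it never ran. */
static int step_index(const char *name)
{
	for (size_t i = 0; i < nr_steps; i++)
		if (strcmp(steps[i], name) == 0)
			return (int)i;
	return -1;
}

/* Models the shutdown path: remote CPUs first, then this CPU. */
static void model_shutdown(bool tdx_enabled)
{
	record("stop_other_cpus");		/* remote CPUs flush their caches */
	if (tdx_enabled)
		record("wbinvd_this_cpu");	/* the flush the patch adds */
}

/* Models the whole kexec sequence; resets the log on every call. */
static void model_kexec(bool tdx_enabled, bool has_erratum)
{
	nr_steps = 0;
	model_shutdown(tdx_enabled);
	if (tdx_enabled && has_erratum)
		record("convert_private_pages");	/* must follow the flush */
	record("relocate_kernel");	/* identity-mapped assembly, runs last */
}
```

Calling model_kexec(true, true) and comparing step_index() values for the
three labels shows the flush, then the conversion, then relocate_kernel().
If the flush only happened inside relocate_kernel(), as with
host_mem_enc_active=1, the conversion would have to move into the assembly
as well, which is what I wanted to avoid.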