On Mon, Jun 05, 2023 at 02:27:30AM +1200, Kai Huang wrote:
> There are two problems in terms of using kexec() to boot to a new kernel
> when the old kernel has enabled TDX: 1) Part of the memory pages are
> still TDX private pages; 2) There might be dirty cachelines associated
> with TDX private pages.
>
> The first problem doesn't matter on the platforms w/o the "partial write
> machine check" erratum. KeyID 0 doesn't have integrity check. If the
> new kernel wants to use any non-zero KeyID, it needs to convert the
> memory to that KeyID and such conversion would work from any KeyID.
>
> However the old kernel needs to guarantee there's no dirty cacheline
> left behind before booting to the new kernel to avoid silent corruption
> from later cacheline writeback (Intel hardware doesn't guarantee cache
> coherency across different KeyIDs).
>
> There are two things that the old kernel needs to do to achieve that:
>
> 1) Stop accessing TDX private memory mappings:
>    a. Stop making TDX module SEAMCALLs (TDX global KeyID);
>    b. Stop TDX guests from running (per-guest TDX KeyID).
> 2) Flush any cachelines from previous TDX private KeyID writes.
>
> For 2), use wbinvd() to flush cache in stop_this_cpu(), following SME
> support. And in this way 1) happens for free as there's no TDX activity
> between wbinvd() and the native_halt().
>
> Flushing cache in stop_this_cpu() only flushes cache on remote cpus. On
> the cpu which does kexec(), unlike SME which does the cache flush in
> relocate_kernel(), do the cache flush right after stopping remote cpus
> in machine_shutdown(). This is because on the platforms with above
> erratum, the kernel needs to convert all TDX private pages back to
> normal before a fast warm reset reboot or booting to the new kernel in
> kexec(). Flushing cache in relocate_kernel() only covers the kexec()
> but not the fast warm reset reboot.
>
> Theoretically, cache flush is only needed when the TDX module has been
> initialized. However initializing the TDX module is done on demand at
> runtime, and it takes a mutex to read the module status. Just check
> whether TDX is enabled by the BIOS instead to flush cache.
>
> Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov