On 3/25/23 12:25, Kirill A. Shutemov wrote: > On Sat, Mar 25, 2023 at 09:25:36AM -0700, Dave Hansen wrote: >> On 3/25/23 09:01, Kirill A. Shutemov wrote: >>> The last item is tricky. TDX guests use ACPI MADT MPWK to bring up >>> secondary CPUs. The mechanism doesn't allow to put a CPU back offline if >>> it has woken up. >> ... >>> +int arch_kexec_load(void) >>> +{ >>> + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { >>> + pr_warn_once("Disable kexec: not yet supported in TDX guest\n"); >>> + return -EOPNOTSUPP; >>> + } >>> + >>> + return 0; >>> +} >> >> So, let's put all this together: >> >> 1. TDX implementations use MADT for wakeup exclusively right now (but >> are not necessarily _required_ to do so forever) >> 2. MADT doesn't support CPU offlining >> 3. kexec() requires offlining >> >> Thus, current TDX implementations can't support TDX guests. This >> *doesn't* say that TDX will always use the MADT for wakeups. >> >> Yet, the check you have here is for TDX and *not* for the MADT. > > As I described in the commit message there are more than MADT that is > required to get kexec in TDX guest. I kinda think we should do both. Let's make sure that all systems that depend on MADT wakeups can't kexec() until the ACPI folks work out what to do there. Separately, let's either fix or *mark* the kexec()-incompatible pieces that *ARE* specific to TDX. >> That seems wrong. >> >> Let's say SEV or arm64 comes along and uses the MADT for their guests. >> They'll add another arch_kexec_load(), with a check for *their* feature. >> >> This all seems like you should be disabling kexec() the moment the MADT >> CPU wakeup is used instead of making it based on TDX. > > I guess we can go this path if you are fine with taking CR4.MCE and shared > memory reverting patches (they require some rework, but I can get them > into shape quickly). After that we can forbid kexec on machines with MADT > if nr_cpus > 1. This goes back to what I asked before: is anyone actually going to *use* a single-processor system that wants to kexec()? If not, let's not waste the time to introduce code that is just going to bitrot. Just mark it broken and move on with life. I'm also a _bit_ curious what the implications of the CR4.MCE preservation are. IIRC, systems are quite a bit less stable when CR4.MCE==0. So, maybe there are some benefits to leaving it set during kexec() for everyone. But, that probably also involves having a #MC handler in the mix and that's obviously trouble while we're switching kernels and messing around in the guts of things like paging configuration. The CR4.MCE change is simple enough, but it also _completely_ glossed over the implications to non-TDX users. I guess the overall message here is: please try to think of the big picture a _little_ outside of the TDX world. This patch's approach is a bit myopic. _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec