On Tue, 2024-04-23 at 08:15 -0700, Sean Christopherson wrote:
> On Tue, Apr 23, 2024, Kai Huang wrote:
> > On Tue, 2024-04-23 at 13:34 +1200, Kai Huang wrote:
> > > > > > And the intent isn't to catch every possible problem.  As with many sanity checks,
> > > > > > the intent is to detect the most likely failure mode to make triaging and debugging
> > > > > > issues a bit easier.
> > > > > 
> > > > > The SEAMCALL will literally return a unique error code to indicate CPU
> > > > > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done.  I think the
> > > > > error code is already clear to pinpoint the problem (due to these pre-
> > > > > SEAMCALL-condition not being met).
> > > > 
> > > > No, SEAMCALL #UDs if the CPU isn't post-VMXON.  I.e. the CPU doesn't make it to
> > > > the TDX Module to provide a unique error code, all KVM will see is a #UD.
> > > 
> > > #UD is handled by the SEAMCALL assembly code.  Please see TDX_MODULE_CALL
> > > assembly macro:
> 
> Right, but that doesn't say why the #UD occurred.  The macro dresses it up in
> TDX_SW_ERROR so that KVM only needs a single parser, but at the end of the day
> KVM is still only going to see that SEAMCALL hit a #UD.

Right.  But is there any problem here?  I thought the point was we can just use
the error code to tell what went wrong.

> > > > There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
> > > > no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
> > > > tdx_enable().  I suspect they got that way because of KVM's unnecessarily complex
> > > > code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
> > > > to also do TDH_SYS_LP_INIT, so why do two IPIs?
> > > 
> > > The main reason is we relaxed the TDH.SYS.LP.INIT to be called _after_ TDX
> > > module initialization.
> > > Previously, the TDH.SYS.LP.INIT must be done on *ALL* CPUs that the
> > > platform has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before
> > > any other SEAMCALLs.  This didn't quite work with (kernel software) CPU
> > > hotplug, and it had problem dealing with things like SMT disable
> > > mitigation:
> > > 
> > > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f
> > > 
> > > So the x86 maintainers requested to change this.  The original proposal
> > > was to eliminate the entire TDH.SYS.INIT and TDH.SYS.LP.INIT:
> > > 
> > > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469
> > > 
> > > But somehow it wasn't feasible, and the result was we relaxed to allow
> > > TDH.SYS.LP.INIT to be called after module initialization.
> > > 
> > > So we need a separate tdx_cpu_enable() for that.
> 
> No, you don't, at least not given the TDX patches I'm looking at.  Allowing
> TDH.SYS.LP.INIT after module initialization makes sense because otherwise the
> kernel would need to online all possible CPUs before initializing TDX.  But that
> doesn't mean that the kernel needs to, or should, punt TDH.SYS.LP.INIT to KVM.
> 
> AFAICT, KVM is NOT doing TDH.SYS.LP.INIT when a CPU is onlined, only when KVM
> is loaded, which means that tdx_enable() can process all online CPUs just as
> easily as KVM.

Hmm.. I assumed kvm_online_cpu() would do VMXON + tdx_cpu_enable().

> Presumably that approach relies on something blocking onlining CPUs when TDX is
> active.  And if that's not the case, the proposed patches are buggy.

The current patch ([PATCH 023/130] KVM: TDX: Initialize the TDX module when
loading the KVM intel kernel module) is indeed buggy, but I don't quite follow
why we would need to block onlining CPUs when TDX is active?  Nothing hard
prevents us from doing so.
KVM just needs to do VMXON + tdx_cpu_enable() inside kvm_online_cpu().

> > Btw, the ideal (or probably the final) plan is to handle tdx_cpu_enable()
> > in TDX's own CPU hotplug callback in the core-kernel and hide it from all
> > other in-kernel TDX users.
> > 
> > Specifically:
> > 
> > 1) that callback, e.g., tdx_online_cpu() will be placed _before_ any in-
> > kernel TDX users like KVM's callback.
> > 2) In tdx_online_cpu(), we do VMXON + tdx_cpu_enable() + VMXOFF, and
> > return error in case of any error to prevent that cpu from going online.
> > 
> > That makes sure that, if TDX is supported by the platform, we basically
> > guarantees all online CPUs are ready to issue SEAMCALL (of course, the in-
> > kernel TDX user still needs to do VMXON for it, but that's TDX user's
> > responsibility).
> > 
> > But that obviously needs to move VMXON to the core-kernel.
> 
> It doesn't strictly have to be core kernel per se, just in code that sits below
> KVM, e.g. in a seperate module called VAC[*] ;-)
> 
> [*] https://lore.kernel.org/all/ZW6FRBnOwYV-UCkY@xxxxxxxxxx

Could you elaborate why vac.ko is necessary?

Being a module, naturally we would need to handle module init and exit, but
TDX cannot be disabled and re-enabled after initialization, so in general
vac.ko doesn't quite fit TDX.  And I am not sure what the fundamental
difference is, from KVM's perspective, between managing the TDX module in a
kernel module vs in the core-kernel.

> > Currently, export tdx_cpu_enable() as a separate API and require KVM to
> > call it explicitly is a temporary solution.
> > 
> > That being said, we could do tdx_cpu_enable() inside tdx_enable(), but I
> > don't see it's a better idea.
> 
> It simplifies the API surface for enabling TDX and eliminates an export.

I was surprised to see that we want to prevent onlining CPUs when TDX is
active.