> > > > And the intent isn't to catch every possible problem. As with many sanity checks, > > > the intent is to detect the most likely failure mode to make triaging and debugging > > > issues a bit easier. > > > > The SEAMCALL will literally return a unique error code to indicate CPU > > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done. I think the > > error code is already clear to pinpoint the problem (due to these pre- > > SEAMCALL-condition not being met). > > No, SEAMCALL #UDs if the CPU isn't post-VMXON. I.e. the CPU doesn't make it to > the TDX Module to provide a unique error code, all KVM will see is a #UD. #UD is handled by the SEAMCALL assembly code. Please see TDX_MODULE_CALL assembly macro: .Lseamcall_trap\@: /* * SEAMCALL caused #GP or #UD. By reaching here RAX contains * the trap number. Convert the trap number to the TDX error * code by setting TDX_SW_ERROR to the high 32-bits of RAX. * * Note cannot OR TDX_SW_ERROR directly to RAX as OR instruction * only accepts 32-bit immediate at most. */ movq $TDX_SW_ERROR, %rdi orq %rdi, %rax ... _ASM_EXTABLE_FAULT(.Lseamcall\@, .Lseamcall_trap\@) .endif /* \host */ > > > > > Btw, I noticed there's another problem, that is currently tdx_cpu_enable() > > > > actually requires IRQ being disabled. Again it was implemented based on > > > > it would be invoked via both on_each_cpu() and kvm_online_cpu(). > > > > > > > > It also also implemented with consideration that it could be called by > > > > multiple in-kernel TDX users in parallel via both SMP call and in normal > > > > context, so it was implemented to simply request the caller to make sure > > > > it is called with IRQ disabled so it can be IRQ safe (it uses a percpu > > > > variable to track whether TDH.SYS.LP.INIT has been done for local cpu > > > > similar to the hardware_enabled percpu variable). > > > > > > Is this is an actual problem, or is it just something that would need to be > > > updated in the TDX code to handle the change in direction? > > > > For now this isn't, because KVM is the solo user, and in KVM > > hardware_enable_all() and kvm_online_cpu() uses kvm_lock mutex to make > > hardware_enable_nolock() IPI safe. > > > > I am not sure how TDX/SEAMCALL will be used in TDX Connect. > > > > However I needed to consider KVM as a user, so I decided to just make it > > must be called with IRQ disabled so I could know it is IRQ safe. > > > > Back to the current tdx_enable() and tdx_cpu_enable(), my personal > > preference is, of course, to keep the existing way, that is: > > > > During module load: > > > > cpus_read_lock(); > > tdx_enable(); > > cpus_read_unlock(); > > > > and in kvm_online_cpu(): > > > > local_irq_save(); > > tdx_cpu_enable(); > > local_irq_restore(); > > > > But given KVM is the solo user now, I am also fine to change if you > > believe this is not acceptable. > > Looking more closely at the code, tdx_enable() needs to be called under > cpu_hotplug_lock to prevent *unplug*, i.e. to prevent the last CPU on a package > from being offlined. I.e. that part's not option. Yeah. We can say that. I almost forgot this :-) > > And the root of the problem/confusion is that the APIs provided by the core kernel > are weird, which is really just a polite way of saying they are awful :-) Well, apologize for it :-) > > There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely > no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking > tdx_enable(). I suspect they got that way because of KVM's unnecessarily complex > code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough > to also do TDH_SYS_LP_INIT, so why do two IPIs? The main reason is we relaxed the TDH.SYS.LP.INIT to be called _after_ TDX module initialization. Previously, the TDH.SYS.LP.INIT must be done on *ALL* CPUs that the platform has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before any other SEAMCALLs. This didn't quite work with (kernel software) CPU hotplug, and it had problem dealing with things like SMT disable mitigation: https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f So the x86 maintainers requested to change this. The original proposal was to eliminate the entire TDH.SYS.INIT and TDH.SYS.LP.INIT: https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469 But somehow it wasn't feasible, and the result was we relaxed to allow TDH.SYS.LP.INIT to be called after module initialization. So we need a separate tdx_cpu_enable() for that. > > But just because KVM "owns" VMXON doesn't mean the core kernel code should punt > TDX to KVM too. If KVM relies on the cpuhp code to ensure all online CPUs are > post-VMXON, then the TDX shapes up nicely and provides a single API to enable > TDX. > We could ask tdx_enable() to invoke tdx_cpu_enable() internally, but as mentioned above we still to have the tdx_cpu_enable() as a separate API to allow CPU hotplug to call it. > And then my CR4.VMXE=1 sanity check makes a _lot_ more sense. As replied above the current SEAMCALL assembly returns a unique error code for that: #define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)