On Fri, 2024-04-26 at 11:21 +0800, Gao, Chao wrote:
> On Fri, Apr 26, 2024 at 12:21:46AM +0000, Huang, Kai wrote:
> > > > >
> > > > > The important thing is that they're handled by _one_ entity. What we have today
> > > > > is probably the worst setup; VMXON is handled by KVM, but TDH.SYS.LP.INIT is
> > > > > handled by core kernel (sort of).
> > > >
> > > > I cannot argue against this :-)
> > > >
> > > > But from this point of view, I cannot see a difference between tdx_enable()
> > > > and tdx_cpu_enable(), because they are both in the core kernel while depending
> > > > on KVM to handle VMXON.
> > >
> > > My comments were made under the assumption that the code was NOT buggy, i.e. if
> > > KVM did NOT need to call tdx_cpu_enable() independent of tdx_enable().
> > >
> > > That said, I do think it makes sense to have tdx_enable() call a private/inner version,
> > > e.g. __tdx_cpu_enable(), and then have KVM call a public version. Alternatively,
> > > the kernel could register yet another cpuhp hook that runs after KVM's, i.e. does
> > > TDH.SYS.LP.INIT after KVM has done VMXON (if TDX has been enabled).
> >
> > We will need to handle tdx_cpu_online() in "some cpuhp callback" anyway,
> > no matter whether tdx_enable() calls __tdx_cpu_enable() internally or not,
> > because now tdx_enable() can be done on a subset of the cpus that the platform
> > has.
>
> Can you confirm this is allowed again? It seems like this code indicates the
> opposite:
>
> https://github.com/intel/tdx-module/blob/tdx_1.5/src/vmm_dispatcher/api_calls/tdh_sys_config.c#L768C1-L775C6

This feature requires ucode/P-SEAMLDR and TDX module changes, and cannot be
supported on some *early* generations. I don't think they have added such code
to the open-source TDX module yet. I can ask the TDX module people about their
plan if it is a concern.
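To make the offline-CPU question concrete, here is a minimal user-space model, assuming, as this thread describes, that an early TDX module's TDH.SYS.CONFIG fails unless every logical CPU of the platform has done TDH.SYS.LP.INIT, while a module with the new feature accepts a subset. NR_CPUS, lp_init_done, model_sys_config(), model_enable_with_online() and the allows_subset flag are all hypothetical; no SEAMCALL is issued and any other checks the real module performs are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform size for this model only. */
#define NR_CPUS 4

/* Hypothetical: which logical CPUs have run TDH.SYS.LP.INIT. */
static bool lp_init_done[NR_CPUS];

/*
 * Model of TDH.SYS.CONFIG (not the real module logic):
 *  - an early module (allows_subset == false) fails unless every
 *    logical CPU of the platform has done LP.INIT;
 *  - a module with the new feature does not insist on that.
 * Returns 0 on success, -1 on failure.
 */
static int model_sys_config(bool allows_subset)
{
	if (allows_subset)
		return 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!lp_init_done[cpu])
			return -1;	/* real code would see a module error code */
	return 0;
}

/*
 * Model of the caller having done LP.INIT on exactly the online CPUs,
 * followed by tdx_enable() issuing TDH.SYS.CONFIG.
 */
static int model_enable_with_online(const bool online[NR_CPUS],
				    bool allows_subset)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		lp_init_done[cpu] = online[cpu];
	return model_sys_config(allows_subset);
}
```

With one CPU offline, the early-module model fails at the CONFIG step while the newer one succeeds; with all CPUs online, both succeed.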
In reality, this shouldn't be a problem because the current code kinda works
with both cases:

1) If this feature is not supported (i.e., old platform and/or old module),
and the user tries to enable TDX when there's an offline cpu, then tdx_enable()
will fail when it does TDH.SYS.CONFIG, and we can use the error code to
pinpoint the root cause.

2) Otherwise, it just works.

> >
> > For the latter (after the "Alternatively" above), by "the kernel" do you
> > mean the core-kernel but not KVM?
> >
> > E.g., you mean to register a cpuhp hook _inside_ tdx_enable() after TDX is
> > initialized successfully?
> >
> > That would have a problem: when KVM is not present (e.g., KVM is
> > unloaded after it enables TDX), the cpuhp hook won't work at all.
>
> Is "the cpuhp hook doesn't work if KVM is not loaded" a real problem?
>
> The CPU about to come online won't run any TDX code. So, it should be ok to
> skip tdx_cpu_enable().

It _can_ work if we only consider KVM, because for KVM we can always guarantee:

1) VMXON + tdx_cpu_enable() have been done for all online cpus before it calls
tdx_enable().
2) VMXON + tdx_cpu_enable() have been done in cpuhp for any new CPU before it
goes online.

Btw, this reminds me why I didn't want to do tdx_cpu_enable() inside
tdx_enable(): tdx_enable() would need to _always_ call tdx_cpu_enable() for all
online cpus, regardless of whether the module has already been initialized
successfully in a previous call. I believe this is kinda silly; why not just
let the caller do tdx_cpu_enable() for all online cpus before tdx_enable()?

However, back to the TDX-specific core-kernel cpuhp hook: in the long term, I
believe the TDX cpuhp hook should be placed _BEFORE_ all in-kernel TDX users'
cpuhp hooks, because logically TDX users should depend on the TDX core-kernel
code, not the opposite.

That is, my long-term vision is that we can have a simple rule:

The core-kernel TDX code always guarantees online CPUs are TDX-capable.
No TDX user ever needs to consider tdx_cpu_enable(). They just need to call
tdx_enable() to get TDX working.

So for now, given that we depend on KVM for VMXON anyway, I don't see any reason
for the core kernel to register a TDX cpuhp callback. Having to "skip
tdx_cpu_enable() when VMX isn't enabled" is kinda hacky anyway.
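A rough user-space model of the arrangement argued for above, where KVM owns VMXON, calls tdx_cpu_enable() from its own CPU-online hotplug callback, and calls tdx_enable() only once every online CPU is covered. The model_* functions, NR_CPUS and the per-CPU arrays are hypothetical stand-ins, not kernel APIs; the model captures only the ordering (VMXON, then LP.INIT, then module initialization) and ignores SEAMCALLs, locking and interrupt state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform size for this model only. */
#define NR_CPUS 4

/* Hypothetical per-CPU state tracked by the model. */
static bool cpu_online[NR_CPUS];
static bool vmx_on[NR_CPUS];
static bool lp_init_done[NR_CPUS];
static bool tdx_module_ready;

/* Stand-in for tdx_cpu_enable(): LP.INIT is only legal after VMXON. */
static int model_tdx_cpu_enable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!vmx_on[cpu])
		return -1;
	lp_init_done[cpu] = true;
	return 0;
}

/*
 * Stand-in for KVM's CPU-online hotplug callback: VMXON first, then
 * LP.INIT, and only then is the CPU considered online (guarantee 2).
 */
static int model_kvm_online_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	vmx_on[cpu] = true;			/* VMXON */
	if (model_tdx_cpu_enable(cpu))
		return -1;
	cpu_online[cpu] = true;
	return 0;
}

/*
 * Stand-in for tdx_enable(): relies on the caller having done VMXON and
 * LP.INIT on every online CPU (guarantee 1), and fails otherwise.
 */
static int model_tdx_enable(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu] && !lp_init_done[cpu])
			return -1;
	tdx_module_ready = true;
	return 0;
}
```

A CPU brought online after tdx_enable() still gets LP.INIT, because the same KVM callback runs for it; the core-kernel part of the model registers no hotplug hook of its own.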