On Wed, Nov 06, 2024, Kai Huang wrote: > On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote: > > On Thu, Oct 31, 2024, Kai Huang wrote: > > > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote: > > > > > +void __init tdx_bringup(void) > > > > > +{ > > > > > + enable_tdx = enable_tdx && !__tdx_bringup(); > > > > > > > > Ah. I don't love this approach because it mixes "failure" due to an unsupported > > > > configuration, with failure due to unexpected issues. E.g. if enabling virtualization > > > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX. > > > > > > Thanks for the comments. > > > > > > I see your point. However for "enabling virtualization failure" kvm_init() will > > > also try to do (default behaviour), so if it fails it will result in module > > > loading failure eventually. So while I guess it would be slightly better to > > > make module loading fail if "enabling virtualization fails" in TDX, it is a nit > > > issue to me. > > > > > > I think "enabling virtualization failure" is the only "unexpected issue" that > > > should result in module loading failure. For any other TDX-specific > > > initialization failure (e.g., any memory allocation in future patches) it's > > > better to only disable TDX. > > > > I disagree. The platform owner wants TDX to be enabled, KVM shouldn't silently > > disable TDX because of a transient, unrelated failure. > > > > If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly > > disable, then that's different. And that's the general pattern throughout KVM. > > If a requested feature isn't supported, then KVM continues on updates the module > > param accordingly. But if something outright fails during setup, KVM aborts the > > entire sequence. > > > > > So I can change to "make loading KVM-the-module fail if enabling virtualization > > > fails in TDX", but I want to confirm this is what you want? > > > > I would prefer the logic to be: reject loading kvm-intel.ko if an operation that > > would normally succeed, fails. > > I looked at the final tdx.c that in our development branch [*], and below is the > list of the things that need to be done to init TDX (the code in > __tdx_bringup()), and my thinking of whether to fail loading the module or just > disable TDX: > > 1) Early dependency check fails. Those include: tdp_mmu_enabled, > enable_mmio_caching, X86_FEATURE_MOVDIR64B check and check the presence of > TSX_CTL uret MSR. > > For those we can disable TDX only but continue to load module. > > 2) Enable virtualization fails. > > For this we fail to load module (as you suggested). > > 3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug. > > For this we only disable TDX but continue to load module. The reason is I think > this is similar to enable a specific KVM feature but the hardware doesn't > support it. We can go further to check the return value of tdx_cpu_enable() to > distinguish cases like "module not loaded" and "unexpected error", but I really > don't want to go that far. Hrm, tdx_cpu_enable() is a bit of a mess. Ideally, there would be a separate "probe" API so that KVM could detect if TDX is supported. Though maybe it's the TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to detect whether or not a module is loaded. So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if it returns -ENODEV, otherwise fail the module load? > 4) tdx_enable() fails. > > Ditto to 3). No, this should fail the module load. E.g. most of the error conditions are -ENOMEM, which has nothing to do with host support for TDX. > 5) tdx_get_sysinfo() fails. > > This is a kernel bug since tdx_get_sysinfo() should always return valid TDX > sysinfo structure pointer after tdx_enable() is done successfully. Currently we > just WARN() if the returned pointer is NULL and disable TDX only. I think it's > also fine. > > 6) TDX global metadata check fails, e.g., MAX_VCPUS etc. > > Ditto to 3). For this we disable TDX only. Where is this code?