On 8/11/2024 11:04 am, Sean Christopherson wrote:
On Wed, Nov 06, 2024, Kai Huang wrote:
On Wed, 2024-11-06 at 07:01 -0800, Sean Christopherson wrote:
On Wed, Nov 06, 2024, Kai Huang wrote:
For this we only disable TDX but continue to load module. The reason is I think
this is similar to enable a specific KVM feature but the hardware doesn't
support it. We can go further to check the return value of tdx_cpu_enable() to
distinguish cases like "module not loaded" and "unexpected error", but I really
don't want to go that far.
Hrm, tdx_cpu_enable() is a bit of a mess. Ideally, there would be a separate
"probe" API so that KVM could detect if TDX is supported. Though maybe it's the
TDX module itself is flawed, e.g. if TDH_SYS_INIT is literally the only way to
detect whether or not a module is loaded.
We can also use P-SEAMLDR SEAMCALL to query, but I see no difference between
using TDH_SYS_INIT. If you are asking whether there's CPUID or MSR to query
then no.
Doesn't have to be a CPUID or MSR, anything idempotent would work. Which begs
the question, is that P-SEAMLDR SEAMCALL query you have in mind idempotent? :-)
It is the SEAMLDR.INFO SEAMCALL, which writes bunch of information to a
SEAMLDR_INFO structure. There's one bit "SEAM_READY" which (when true)
indicates the TDX module is ready for SEAMCALL, i.e., the module is loaded.
And yes it is idempotent I believe, even we consider TDX module runtime
reload/update.
The problem is it is a SEAMCALL, and being a SEAMCALL requires all
things like enabling virtualization first and adding another wrapper API
and structure definition to do SEAMLDR.INFO (and we are still in
discussion how to export SEAMCALLs to let KVM make).
From this perspective, I don't see a big difference between using
SEAMLDR.INFO and tdx_cpu_enable() for probing TDX.
I agree we can change to use SEAMLDR.INFO to detect in the long term
after we move VMXON out of KVM, though, because we can get a lot more
information with that besides whether module is loaded. But before
that, I see no big difference.
So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
it returns -ENODEV, otherwise fail the module load?
We can, but we need to assume cpuhp_setup_state_cpuslocked() itself will not
return -ENODEV (it is true now), otherwise we won't be able to distinguish
whether the -ENODEV was from cpuhp_setup_state_cpuslocked() or tdx_cpu_enable().
Unless we choose to do tdx_cpu_enable() via on_each_cpu() separately.
Btw tdx_cpu_enable() itself will print "module not loaded" in case of -ENODEV,
so the user will be aware anyway if we only disable TDX but not fail module
loading.
That only helps if a human looks at dmesg before attempting to run a TDX VM on
the host, and parsing dmesg to treat that particular scenario as fatal isn't
something I want to recommend to end users. E.g. if our platform configuration
screwed up and failed to load a TDX module, then I want that to be surfaced as
an alarm of sorts, not a silent "this platform doesn't support TDX" flag.
My concern is still the whole "different handling of error cases" seems over-
engineering.
IMO, that's a symptom of the TDX enabling code not cleanly separating "probe"
from "enable", and at a glance, that seems very solvable.
I am not so sure about this at this stage, because you need to make a
SEAMCALL anyway for that. :-)
I think we can document this imperfection for now and enhance after
moving VMXON out of KVM.
Btw we are going to add P-SEAMLDR SEAMCALLs to support TDX runtime
reload/update anyway, so we can add SEAMLDR.INFO there (or before that...).
And I suspect that
cleaning things up will allow for additional hardening. E.g. I assume the lack
of MOVDIR64B should be a WARN, but because KVM checks for MOVDIR64B before
checking for basic TDX support, it's an non-commitalpr_warn().
Yeah if we check TDX_HOST_PLATFORM first then we can WARN() on MOVDIR64B.
4) tdx_enable() fails.
Ditto to 3).
No, this should fail the module load. E.g. most of the error conditions are
-ENOMEM, which has nothing to do with host support for TDX.
5) tdx_get_sysinfo() fails.
This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
sysinfo structure pointer after tdx_enable() is done successfully. Currently we
just WARN() if the returned pointer is NULL and disable TDX only. I think it's
also fine.
6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
Ditto to 3). For this we disable TDX only.
Where is this code?
Please check:
https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c
.. starting at line 3320.
Before I forget, that code has a bug. This
/* Check TDX module and KVM capabilities */
if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) ||
!tdx_get_supported_xfam(&tdx_sysinfo->td_conf))
goto get_sysinfo_err;
will return '0' on error, instead of -EINVAL (or whatever it intends).
Indeed. Thanks for catching this. I'll report to Xiaoyao.
Back to the main discussion, these checks are obvious "probe" failures and so
should disable TDX without failing module load:
if (!tdp_mmu_enabled || !enable_mmio_caching)
return -EOPNOTSUPP;
if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
pr_warn("MOVDIR64B is reqiured for TDX\n");
return -EOPNOTSUPP;
}
Yeah sure.
A kvm_find_user_return_msr() error is obviously a KVM bug, i.e. should definitely
WARN and fail module module. Ditto for kvm_enable_virtualization().
OK agreed.
The boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM) that's buried in tdx_enable()
really belongs in KVM. Having it in both is totally fine, but KVM shouldn't do
a bunch of work and _then_ check if all that work was pointless.
Fine to me.
I am ok treating everything at or after tdx_get_sysinfo() as fatal to module load,
especially since, IIUC, TD_SYS_INIT can't be undone, i.e. KVM has crossed a point
of no return.
In short, assuming KVM can query if a TDX module is a loaded, I don't think it's
all that much work to do:
static bool kvm_is_tdx_supported(void)
{
if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
return false;
if (!<is TDX module loaded>)
return false;
if (!tdp_mmu_enabled || !enable_mmio_caching)
return false;
if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)))
return false;
return true;
}
int __init tdx_bringup(void)
{
enable_tdx = enable_tdx && kvm_is_tdx_supported();
if (!enable_tdx)
return 0;
return __tdx_bringup();
}
Thanks for clarifying this.
As mentioned above, I would say we just use tdx_cpu_enable() to "probe"
whether module is present before moving VMXON out of KVM. We can
document this imperfection for now and revisit later.
How about:
static bool kvm_can_support_tdx(void)
{
if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
return false;
if (!tdp_mmu_enabled || !enable_mmio_caching)
return false;
if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)))
return false;
}
int __init tdx_bringup(void)
{
int r;
enable_tdx = enable_tdx && kvm_can_support_tdx();
if (!enable_tdx)
return 0;
/*
* Ideally KVM should probe whether TDX module has been loaded
* first and then try to bring it up, because KVM should treat
* them differently. I.e., KVM should just disable TDX while
* still allow module to be loaded when TDX module is not
* loaded, but fail to load module at all when fail to bring up
* TDX.
*
* But unfortunately TDX needs to use SEAMCALL to probe whether
* the module is loaded (there is not CPUID or MSR for that),
* and making SEAMCALL requires enabling virtualization first (
* like the rest steps of bringing up TDX module).
*
* For simplicity, do the probing and bringing up together for
* now.
*
* Note the first SEAMCALL to bringing up TDX will return
* -ENODEV when module is not loaded, and this serves the probe
* albeit it is not perfect.
*
* Another option is using P-SEAMLDR's SEAMLDR.INFO SEAMCALL to
* probe, but it is still a SEAMCALL. Currently kernel doesn't
* support P-SEAMLDR SEAMCALLs so don't bother to add it just
* for probing TDX module.
*
* Again, this is not perfect, and can be revisited once VMXON
* is moved to the core-kernel.
*/
r = __tdx_probe_and_bringup();
if (r) {
enable_tdx = 0;
/*
* Disable TDX only but don't fail to load module when
* TDX module is not loaded. No need to print error
* message since __tdx_probe_and_bringup() already did
* that in this case.
*/
if (r == -ENODEV)
r = 0;
}
return r;
}