On 4/5/22 21:49, Kai Huang wrote: > The TDX module is essentially a CPU-attested software module running > in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious > host and certain physical attacks. The TDX module implements the > functions to build, tear down and start execution of the protected VMs > called Trusted Domains (TD). Before the TDX module can be used to > create and run TD guests, it must be loaded into the SEAM Range Register > (SEAMRR) and properly initialized. The module isn't loaded into a register, right? It's loaded into a memory area pointed to *by* the register. > The TDX module is expected to be > loaded by BIOS before booting to the kernel, and the kernel is expected > to detect and initialize it, using the SEAMCALLs defined by TDX > architecture. Wait a sec... So, what was all this gobleygook about TDX module loading and SEAMRR's if the kernel just has the TDX module *handed* to it already loaded? It looks to me like you wrote all of this before the TDX module was being loaded by the BIOS and neglected to go and update these changelogs. > The TDX module can be initialized only once in its lifetime. Instead > of always initializing it at boot time, this implementation chooses an > on-demand approach to initialize TDX until there is a real need (e.g > when requested by KVM). This avoids consuming the memory that must be > allocated by kernel and given to the TDX module as metadata (~1/256th of > the TDX-usable memory), and also saves the time of initializing the TDX > module (and the metadata) when TDX is not used at all. Initializing the > TDX module at runtime on-demand also is more flexible to support TDX > module runtime updating in the future (after updating the TDX module, it > needs to be initialized again). > > Introduce two placeholders tdx_detect() and tdx_init() to detect and > initialize the TDX module on demand, with a state machine introduced to > orchestrate the entire process (in case of multiple callers). > > To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs. The > TDX module is reported as not loaded if either SEAMRR is not enabled, or > there are no enough TDX private KeyIDs to create any TD guest. The TDX > module itself requires one global TDX private KeyID to crypto protect > its metadata. This is stepping over the line into telling me what the code does instead of why. > And tdx_init() is currently empty. The TDX module will be initialized > in multi-steps defined by the TDX architecture: > > 1) Global initialization; > 2) Logical-CPU scope initialization; > 3) Enumerate the TDX module capabilities and platform configuration; > 4) Configure the TDX module about usable memory ranges and global > KeyID information; > 5) Package-scope configuration for the global KeyID; > 6) Initialize usable memory ranges based on 4). > > The TDX module can also be shut down at any time during its lifetime. > In case of any error during the initialization process, shut down the > module. It's pointless to leave the module in any intermediate state > during the initialization. > > SEAMCALL requires SEAMRR being enabled and CPU being already in VMX > operation (VMXON has been done), otherwise it generates #UD. So far > only KVM handles VMXON/VMXOFF. Choose to not handle VMXON/VMXOFF in > tdx_detect() and tdx_init() but depend on the caller to guarantee that, > since so far KVM is the only user of TDX. In the long term, more kernel > components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX > module runtime update), so a reference-based approach to do VMXON/VMXOFF > is likely needed. > > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx> > --- > arch/x86/include/asm/tdx.h | 4 + > arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 226 insertions(+) > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index 1f29813b1646..c8af2ba6bb8a 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, > > #ifdef CONFIG_INTEL_TDX_HOST > void tdx_detect_cpu(struct cpuinfo_x86 *c); > +int tdx_detect(void); > +int tdx_init(void); > #else > static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { } > +static inline int tdx_detect(void) { return -ENODEV; } > +static inline int tdx_init(void) { return -ENODEV; } > #endif /* CONFIG_INTEL_TDX_HOST */ > > #endif /* !__ASSEMBLY__ */ > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index ba2210001ea8..53093d4ad458 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -9,6 +9,8 @@ > > #include <linux/types.h> > #include <linux/cpumask.h> > +#include <linux/mutex.h> > +#include <linux/cpu.h> > #include <asm/msr-index.h> > #include <asm/msr.h> > #include <asm/cpufeature.h> > @@ -45,12 +47,33 @@ > ((u32)(((_keyid_part) & 0xffffffffull) + 1)) > #define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32)) > > +/* > + * TDX module status during initialization > + */ > +enum tdx_module_status_t { > + /* TDX module status is unknown */ > + TDX_MODULE_UNKNOWN, > + /* TDX module is not loaded */ > + TDX_MODULE_NONE, > + /* TDX module is loaded, but not initialized */ > + TDX_MODULE_LOADED, > + /* TDX module is fully initialized */ > + TDX_MODULE_INITIALIZED, > + /* TDX module is shutdown due to error during initialization */ > + TDX_MODULE_SHUTDOWN, > +}; > + > /* BIOS must configure SEAMRR registers for all cores consistently */ > static u64 seamrr_base, seamrr_mask; > > static u32 tdx_keyid_start; > static u32 tdx_keyid_num; > > +static enum tdx_module_status_t tdx_module_status; > + > +/* Prevent concurrent attempts on TDX detection and initialization */ > +static DEFINE_MUTEX(tdx_module_lock); > + > static bool __seamrr_enabled(void) > { > return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS; > @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c) > detect_seam(c); > detect_tdx_keyids(c); > } > + > +static bool seamrr_enabled(void) > +{ > + /* > + * To detect any BIOS misconfiguration among cores, all logical > + * cpus must have been brought up at least once. This is true > + * unless 'maxcpus' kernel command line is used to limit the > + * number of cpus to be brought up during boot time. However > + * 'maxcpus' is basically an invalid operation mode due to the > + * MCE broadcast problem, and it should not be used on a TDX > + * capable machine. Just do paranoid check here and do not > + * report SEAMRR as enabled in this case. > + */ > + if (!cpumask_equal(&cpus_booted_once_mask, > + cpu_present_mask)) > + return false; > + > + return __seamrr_enabled(); > +} > + > +static bool tdx_keyid_sufficient(void) > +{ > + if (!cpumask_equal(&cpus_booted_once_mask, > + cpu_present_mask)) > + return false; I'd move this cpumask_equal() to a helper. > + /* > + * TDX requires at least two KeyIDs: one global KeyID to > + * protect the metadata of the TDX module and one or more > + * KeyIDs to run TD guests. > + */ > + return tdx_keyid_num >= 2; > +} > + > +static int __tdx_detect(void) > +{ > + /* The TDX module is not loaded if SEAMRR is disabled */ > + if (!seamrr_enabled()) { > + pr_info("SEAMRR not enabled.\n"); > + goto no_tdx_module; > + } Why even bother with the SEAMRR stuff? It sounded like you can "ping" the module with SEAMCALL. Why not just use that directly? > + /* > + * Also do not report the TDX module as loaded if there's > + * no enough TDX private KeyIDs to run any TD guests. > + */ > + if (!tdx_keyid_sufficient()) { > + pr_info("Number of TDX private KeyIDs too small: %u.\n", > + tdx_keyid_num); > + goto no_tdx_module; > + } > + > + /* Return -ENODEV until the TDX module is detected */ > +no_tdx_module: > + tdx_module_status = TDX_MODULE_NONE; > + return -ENODEV; > +} > + > +static int init_tdx_module(void) > +{ > + /* > + * Return -EFAULT until all steps of TDX module > + * initialization are done. > + */ > + return -EFAULT; > +} > + > +static void shutdown_tdx_module(void) > +{ > + /* TODO: Shut down the TDX module */ > + tdx_module_status = TDX_MODULE_SHUTDOWN; > +} > + > +static int __tdx_init(void) > +{ > + int ret; > + > + /* > + * Logical-cpu scope initialization requires calling one SEAMCALL > + * on all logical cpus enabled by BIOS. Shutting down the TDX > + * module also has such requirement. Further more, configuring > + * the key of the global KeyID requires calling one SEAMCALL for > + * each package. For simplicity, disable CPU hotplug in the whole > + * initialization process. > + * > + * It's perhaps better to check whether all BIOS-enabled cpus are > + * online before starting initializing, and return early if not. But you did some of this cpumask checking above. Right? > + * But none of 'possible', 'present' and 'online' CPU masks > + * represents BIOS-enabled cpus. For example, 'possible' mask is > + * impacted by 'nr_cpus' or 'possible_cpus' kernel command line. > + * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are > + * online. > + */ > + cpus_read_lock(); > + > + ret = init_tdx_module(); > + > + /* > + * Shut down the TDX module in case of any error during the > + * initialization process. It's meaningless to leave the TDX > + * module in any middle state of the initialization process. > + */ > + if (ret) > + shutdown_tdx_module(); > + > + cpus_read_unlock(); > + > + return ret; > +} > + > +/** > + * tdx_detect - Detect whether the TDX module has been loaded > + * > + * Detect whether the TDX module has been loaded and ready for > + * initialization. Only call this function when all cpus are > + * already in VMX operation. > + * > + * This function can be called in parallel by multiple callers. > + * > + * Return: > + * > + * * -0: The TDX module has been loaded and ready for > + * initialization. "-0", eh? > + * * -ENODEV: The TDX module is not loaded. > + * * -EPERM: CPU is not in VMX operation. > + * * -EFAULT: Other internal fatal errors. > + */ > +int tdx_detect(void) > +{ > + int ret; > + > + mutex_lock(&tdx_module_lock); > + > + switch (tdx_module_status) { > + case TDX_MODULE_UNKNOWN: > + ret = __tdx_detect(); > + break; > + case TDX_MODULE_NONE: > + ret = -ENODEV; > + break; > + case TDX_MODULE_LOADED: > + case TDX_MODULE_INITIALIZED: > + ret = 0; > + break; > + case TDX_MODULE_SHUTDOWN: > + ret = -EFAULT; > + break; > + default: > + WARN_ON(1); > + ret = -EFAULT; > + } > + > + mutex_unlock(&tdx_module_lock); > + return ret; > +} > +EXPORT_SYMBOL_GPL(tdx_detect); > + > +/** > + * tdx_init - Initialize the TDX module > + * > + * Initialize the TDX module to make it ready to run TD guests. This > + * function should be called after tdx_detect() returns successful. > + * Only call this function when all cpus are online and are in VMX > + * operation. CPU hotplug is temporarily disabled internally. > + * > + * This function can be called in parallel by multiple callers. > + * > + * Return: > + * > + * * -0: The TDX module has been successfully initialized. > + * * -ENODEV: The TDX module is not loaded. > + * * -EPERM: The CPU which does SEAMCALL is not in VMX operation. > + * * -EFAULT: Other internal fatal errors. > + */ > +int tdx_init(void) > +{ > + int ret; > + > + mutex_lock(&tdx_module_lock); > + > + switch (tdx_module_status) { > + case TDX_MODULE_NONE: > + ret = -ENODEV; > + break; > + case TDX_MODULE_LOADED: > + ret = __tdx_init(); > + break; > + case TDX_MODULE_INITIALIZED: > + ret = 0; > + break; > + default: > + ret = -EFAULT; > + break; > + } > + mutex_unlock(&tdx_module_lock); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(tdx_init); Why does this need both a tdx_detect() and a tdx_init()? Shouldn't the interface from outside just be "get TDX up and running, please?"