Re: [PATCH 3/3] KVM: VMX: Initialize TDX during KVM module load

"Huang, Kai" <kai.huang@xxxxxxxxx> · Thu, 31 Oct 2024 11:17:10 +0000

On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> On Tue, Oct 29, 2024, Kai Huang wrote:
> > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > index f9dddb8cb466..fec803aff7ad 100644
> > --- a/arch/x86/kvm/Makefile
> > +++ b/arch/x86/kvm/Makefile
> > @@ -20,6 +20,7 @@ kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
> >  
> >  kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
> >  kvm-intel-$(CONFIG_KVM_HYPERV)	+= vmx/hyperv.o vmx/hyperv_evmcs.o
> > +kvm-intel-$(CONFIG_INTEL_TDX_HOST)	+= vmx/tdx.o
> 
> IMO, INTEL_TDX_HOST should be a KVM Kconfig, e.g. KVM_INTEL_TDX.  Forcing the user
> to bounce between KVM's menu and the generic menu to enable KVM support for TDX is
> kludgy.  Having INTEL_TDX_HOST exist before KVM support came along made sense, as
> it allowed compile-testing a bunch of code, but I don't think it should be the end
> state.
> 
> If others disagree, then we should adjust KVM_AMD_SEV in the opposite direction,
> because doing different things for SEV vs. TDX is confusing and messy.

+ Dave (and Dan for TDX Connect).

Agree SEV/TDX should be in similar way.  But also I find SEV has a dependency on
CRYPTO_DEV_SP_PSP, so perhaps it also reasonable to make an additional
KVM_INTEL_TDX and make it depend on INTEL_TDX_HOST?

We could remove INTEL_TDX_HOST but only keep KVM_INTEL_TDX.  But in the long
term, more kernel components will need to add TDX support (e.g., for TDX
Connect).  I think the question is whether we can safely disable TDX code in ALL
kernel components when KVM_INTEL_TDX is not enabled.

If the answer is yes (seems correct to me, because it seems meaningless to
enable TDX code in _ANY_ kernel components when it's even possible to run TDX 
guest), then I think we can just change the current INTEL_TDX_HOST to
KVM_INTEL_TDX and put it in arch/x86/kvm/Kconfig.

Hi Dave, Dan,

Do you have any comments?

> 
> >  kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
> >  
> > diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> > index 433ecbd90905..053294939eb1 100644
> > --- a/arch/x86/kvm/vmx/main.c
> > +++ b/arch/x86/kvm/vmx/main.c
> > @@ -6,6 +6,7 @@
> >  #include "nested.h"
> >  #include "pmu.h"
> >  #include "posted_intr.h"
> > +#include "tdx.h"
> >  
> >  #define VMX_REQUIRED_APICV_INHIBITS				\
> >  	(BIT(APICV_INHIBIT_REASON_DISABLED) |			\
> > @@ -170,6 +171,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
> >  static void vt_exit(void)
> >  {
> >  	kvm_exit();
> > +	tdx_cleanup();
> >  	vmx_exit();
> >  }
> >  module_exit(vt_exit);
> > @@ -182,6 +184,9 @@ static int __init vt_init(void)
> >  	if (r)
> >  		return r;
> >  
> > +	/* tdx_init() has been taken */
> > +	tdx_bringup();
> 
> tdx_module_init()?  And honestly, even though Linux doesn't currently support
> unloading the TDX module, I think tdx_module_exit() is a perfectly fine name,
> because not being able to unload the TDX module and reclaim all of that memory
> is a flaw that should be addressed at some point.

tdx_module_init()/exit() also work for me.

Or is vt_tdx_init()/exit() better?  We can rename vmx_init()/exit() to
vt_vmx_init()/exit() if needed.

> > +static enum cpuhp_state tdx_cpuhp_state;
> > +
> > +static int tdx_online_cpu(unsigned int cpu)
> > +{
> > +	unsigned long flags;
> > +	int r;
> > +
> > +	/* Sanity check CPU is already in post-VMXON */
> > +	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
> > +
> > +	/* tdx_cpu_enable() must be called with IRQ disabled */
> 
> I don't find this comment helpfu.  If it explained _why_ tdx_cpu_enable() requires
> IRQs to be disabled, then I'd feel differently, but as is, IMO it doesn't add value.

I'll remove the comment.

> 
> > +	local_irq_save(flags);
> > +	r = tdx_cpu_enable();
> > +	local_irq_restore(flags);
> > +
> > +	return r;
> > +}
> > +
> 
> ...
> 
> > +static int __init __do_tdx_bringup(void)
> > +{
> > +	int r;
> > +
> > +	/*
> > +	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
> > +	 * online CPUs before calling tdx_enable(), and on any new
> > +	 * going-online CPU to make sure it is ready for TDX guest.
> > +	 */
> > +	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
> > +					 "kvm/cpu/tdx:online",
> > +					 tdx_online_cpu, NULL);
> > +	if (r < 0)
> > +		return r;
> > +
> > +	tdx_cpuhp_state = r;
> > +
> > +	/* tdx_enable() must be called with cpus_read_lock() */
> 
> This comment is even less valuable, IMO.

Will remove.

> 
> > +	r = tdx_enable();
> > +	if (r)
> > +		__do_tdx_cleanup();
> > +
> > +	return r;
> > +}
> > +
> > 

[...]

> > +void __init tdx_bringup(void)
> > +{
> > +	enable_tdx = enable_tdx && !__tdx_bringup();
> 
> Ah.  I don't love this approach because it mixes "failure" due to an unsupported
> configuration, with failure due to unexpected issues.  E.g. if enabling virtualization
> fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.

Thanks for the comments.

I see your point.  However for "enabling virtualization failure" kvm_init() will
also try to do (default behaviour), so if it fails it will result in module
loading failure eventually.  So while I guess it would be slightly better to
make module loading fail if "enabling virtualization fails" in TDX, it is a nit
issue to me.

I think "enabling virtualization failure" is the only "unexpected issue" that
should result in module loading failure.  For any other TDX-specific
initialization failure (e.g., any memory allocation in future patches) it's
better to only disable TDX.

So I can change to "make loading KVM-the-module fail if enabling virtualization
fails in TDX", but I want to confirm this is what you want?