Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module

On Mon, Apr 22, 2024, Kai Huang wrote:
> On Mon, 2024-04-22 at 09:54 -0700, Sean Christopherson wrote:
> > On Mon, Apr 22, 2024, Kai Huang wrote:
> > > On Fri, 2024-04-19 at 10:23 -0700, Sean Christopherson wrote:
> > > > On Fri, Apr 19, 2024, Kai Huang wrote:
> > > > And tdx_enable() should also do its best to verify that the caller is post-VMXON:
> > > > 
> > > > 	if (WARN_ON_ONCE(!(__read_cr4() & X86_CR4_VMXE)))
> > > > 		return -EINVAL;
> > > 
> > > This won't be helpful, or at least isn't sufficient.
> > > 
> > > tdx_enable() can make SEAMCALLs on all online CPUs, so checking that "the
> > > caller is post-VMXON" isn't enough.  It needs to check that all online
> > > CPUs are post-VMXON with tdx_cpu_enable() having been done.
> > 
> > I'm suggesting adding it in the corresponding code that does the actual SEAMCALL.
> 
> The thing is, to do the check you will need to do additional things like
> making sure no scheduling happens during "check + make SEAMCALL".  That
> doesn't seem worth doing to me.
> 
> The intent of tdx_enable() was that the caller should make sure no new CPU
> can come online (well, this can be relaxed if we move cpuhp_setup_state()
> to hardware_enable_all()), and that all existing online CPUs are post-VMXON
> with tdx_cpu_enable() having been done.
> 
> I think, if we ever need any check, the latter seems to be more
> reasonable.
> 
> But if we allow a new CPU to become online during tdx_enable() (with your
> enhancement to the hardware enabling), then I don't know how to make such a
> check at the beginning of tdx_enable(), because doing
> 
> 	on_each_cpu(check_seamcall_precondition, NULL, 1);
> 
> cannot catch any new CPU that comes online during tdx_enable().

Doh, we're talking past each other, because my initial suggestion was half-baked.

When I initially said "tdx_enable()", I didn't intend to literally mean just the
CPU that calls tdx_enable().  What I was trying to say was: when doing the
per-CPU work to enable TDX, sanity check that the current CPU has CR4.VMXE=1
before doing a SEAMCALL.
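
E.g., purely as a sketch (the helper name below is made up, it isn't something
from the actual patches), the per-CPU path would look roughly like:

	#include <linux/bug.h>			/* WARN_ON_ONCE() */
	#include <linux/errno.h>
	#include <asm/tlbflush.h>		/* __read_cr4() */
	#include <asm/processor-flags.h>	/* X86_CR4_VMXE */

	/* Hypothetical per-CPU helper, sketch only. */
	static int tdx_seamcall_on_this_cpu(void)
	{
		/* SEAMCALL #UDs if this CPU isn't post-VMXON; fail loudly instead. */
		if (WARN_ON_ONCE(!(__read_cr4() & X86_CR4_VMXE)))
			return -EINVAL;

		/* ... do the actual SEAMCALL for this CPU here ... */
		return 0;
	}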

Ah, and now I see where the disconnect is.  I was assuming tdx_enable() also did
TDH_SYS_LP_INIT, but that's in a separate chunk of code that's manually invoked
by KVM.  More below.

> > And the intent isn't to catch every possible problem.  As with many sanity checks,
> > the intent is to detect the most likely failure mode to make triaging and debugging
> > issues a bit easier.
> 
> The SEAMCALL will literally return a unique error code to indicate the CPU
> isn't post-VMXON, or that tdx_cpu_enable() hasn't been done.  I think the
> error code is already clear enough to pinpoint the problem (i.e. that these
> pre-SEAMCALL conditions weren't met).

No, SEAMCALL #UDs if the CPU isn't post-VMXON.  I.e. the CPU doesn't make it to
the TDX Module to provide a unique error code; all KVM will see is a #UD.

> > > Btw, I noticed there's another problem: currently tdx_cpu_enable()
> > > actually requires IRQs to be disabled.  Again, it was implemented on the
> > > assumption that it would be invoked via both on_each_cpu() and
> > > kvm_online_cpu().
> > > 
> > > It was also implemented with the consideration that it could be called by
> > > multiple in-kernel TDX users in parallel via both SMP call and in normal
> > > context, so it simply requires the caller to make sure it is called with
> > > IRQs disabled so that it is IRQ safe (it uses a percpu variable to track
> > > whether TDH.SYS.LP.INIT has been done for the local cpu, similar to the
> > > hardware_enabled percpu variable).
> > 
> > Is this an actual problem, or is it just something that would need to be
> > updated in the TDX code to handle the change in direction?
> 
> For now it isn't, because KVM is the sole user, and in KVM
> hardware_enable_all() and kvm_online_cpu() use the kvm_lock mutex to make
> hardware_enable_nolock() IPI safe.
> 
> I am not sure how TDX/SEAMCALL will be used in TDX Connect.
> 
> However, I needed to consider KVM as a user, so I decided to just require
> that it be called with IRQs disabled so I could know it is IRQ safe.
> 
> Back to the current tdx_enable() and tdx_cpu_enable(), my personal
> preference is, of course, to keep the existing way, that is:
> 
> During module load:
> 
> 	cpus_read_lock();
> 	tdx_enable();
> 	cpus_read_unlock();
> 
> and in kvm_online_cpu():
> 
> 	local_irq_save();
> 	tdx_cpu_enable();
> 	local_irq_restore();
> 
> But given KVM is the sole user now, I am also fine with changing it if you
> believe this is not acceptable.

Looking more closely at the code, tdx_enable() needs to be called under
cpu_hotplug_lock to prevent *unplug*, i.e. to prevent the last CPU on a package
from being offlined.  I.e. that part's not optional.
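
To spell out why (toy names below, nothing real): the module-init path does
per-package SEAMCALLs, e.g. TDH.SYS.KEY.CONFIG on one CPU of each package, and
a package must not lose its last online CPU while that work is in flight:

	#include <linux/cpu.h>	/* cpus_read_lock()/cpus_read_unlock() */

	/* Toy stand-in for the per-package TDH.SYS.KEY.CONFIG work. */
	static int do_per_package_seamcalls(void)
	{
		return 0;
	}

	/* Sketch of why the hotplug lock isn't optional for tdx_enable(). */
	static int enable_with_hotplug_blocked(void)
	{
		int ret;

		cpus_read_lock();	/* no package may lose its last online CPU */
		ret = do_per_package_seamcalls();
		cpus_read_unlock();

		return ret;
	}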

And the root of the problem/confusion is that the APIs provided by the core kernel
are weird, which is really just a polite way of saying they are awful :-)

There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
tdx_enable().  I suspect they got that way because of KVM's unnecessarily complex
code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
to also do TDH_SYS_LP_INIT, so why do two IPIs?

But just because KVM "owns" VMXON doesn't mean the core kernel code should punt
TDX to KVM too.  If KVM relies on the cpuhp code to ensure all online CPUs are
post-VMXON, then the TDX code shapes up nicely and provides a single API to
enable TDX.  And then my CR4.VMXE=1 sanity check makes a _lot_ more sense.
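
In terms of the exported surface, that direction leaves a single entry point.
Roughly (again, just a sketch against some random version of the patches; the
stub mirrors the usual !CONFIG_INTEL_TDX_HOST pattern):

	/* arch/x86/include/asm/tdx.h, sketch of the reduced API surface. */
	#include <linux/errno.h>

	#ifdef CONFIG_INTEL_TDX_HOST
	/* Takes cpus_read_lock() and does the per-CPU init itself. */
	int tdx_enable(void);
	#else
	static inline int tdx_enable(void) { return -ENODEV; }
	#endif

	/* tdx_cpu_enable() is no longer exported at all, see below. */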

Relative to some random version of the TDX patches, this is what I'm thinking:

---
 arch/x86/kvm/vmx/tdx.c      | 46 +++----------------
 arch/x86/virt/vmx/tdx/tdx.c | 89 ++++++++++++++++++-------------------
 2 files changed, 49 insertions(+), 86 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a1d3ae09091c..137d08da43c3 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3322,38 +3322,8 @@ bool tdx_is_vm_type_supported(unsigned long type)
 	return type == KVM_X86_TDX_VM;
 }
 
-struct tdx_enabled {
-	cpumask_var_t enabled;
-	atomic_t err;
-};
-
-static void __init tdx_on(void *_enable)
-{
-	struct tdx_enabled *enable = _enable;
-	int r;
-
-	r = vmx_hardware_enable();
-	if (!r) {
-		cpumask_set_cpu(smp_processor_id(), enable->enabled);
-		r = tdx_cpu_enable();
-	}
-	if (r)
-		atomic_set(&enable->err, r);
-}
-
-static void __init vmx_off(void *_enabled)
-{
-	cpumask_var_t *enabled = (cpumask_var_t *)_enabled;
-
-	if (cpumask_test_cpu(smp_processor_id(), *enabled))
-		vmx_hardware_disable();
-}
-
 int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
 {
-	struct tdx_enabled enable = {
-		.err = ATOMIC_INIT(0),
-	};
 	int max_pkgs;
 	int r = 0;
 	int i;
@@ -3409,17 +3379,11 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
 		goto out;
 	}
 
-	/* tdx_enable() in tdx_module_setup() requires cpus lock. */
-	cpus_read_lock();
-	on_each_cpu(tdx_on, &enable, true); /* TDX requires vmxon. */
-	r = atomic_read(&enable.err);
-	if (!r)
-		r = tdx_module_setup();
-	else
-		r = -EIO;
-	on_each_cpu(vmx_off, &enable.enabled, true);
-	cpus_read_unlock();
-	free_cpumask_var(enable.enabled);
+	r = tdx_enable();
+	if (r)
+		goto out;
+
+	r = tdx_module_setup();
 	if (r)
 		goto out;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index d2b8f079a637..19897f736c47 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -139,49 +139,6 @@ static int try_init_module_global(void)
 	return sysinit_ret;
 }
 
-/**
- * tdx_cpu_enable - Enable TDX on local cpu
- *
- * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
- * global initialization SEAMCALL if not done) on local cpu to make this
- * cpu be ready to run any other SEAMCALLs.
- *
- * Always call this function via IPI function calls.
- *
- * Return 0 on success, otherwise errors.
- */
-int tdx_cpu_enable(void)
-{
-	struct tdx_module_args args = {};
-	int ret;
-
-	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
-		return -ENODEV;
-
-	lockdep_assert_irqs_disabled();
-
-	if (__this_cpu_read(tdx_lp_initialized))
-		return 0;
-
-	/*
-	 * The TDX module global initialization is the very first step
-	 * to enable TDX.  Need to do it first (if hasn't been done)
-	 * before the per-cpu initialization.
-	 */
-	ret = try_init_module_global();
-	if (ret)
-		return ret;
-
-	ret = seamcall_prerr(TDH_SYS_LP_INIT, &args);
-	if (ret)
-		return ret;
-
-	__this_cpu_write(tdx_lp_initialized, true);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(tdx_cpu_enable);
-
 /*
  * Add a memory region as a TDX memory block.  The caller must make sure
  * all memory regions are added in address ascending order and don't
@@ -1201,6 +1158,43 @@ static int init_tdx_module(void)
 	goto out_put_tdxmem;
 }
 
+/**
+ * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
+ * global initialization SEAMCALL if not done) on local cpu to make this
+ * cpu be ready to run any other SEAMCALLs.
+ */
+static void tdx_cpu_enable(void *__err)
+{
+	struct tdx_module_args args = {};
+	atomic_t *err = __err;
+	int ret;
+
+	if (__this_cpu_read(tdx_lp_initialized))
+		return;
+
+	if (WARN_ON_ONCE(!(__read_cr4() & X86_CR4_VMXE)))
+		goto failed;
+
+	/*
+	 * The TDX module global initialization is the very first step
+	 * to enable TDX.  Need to do it first (if hasn't been done)
+	 * before the per-cpu initialization.
+	 */
+	ret = try_init_module_global();
+	if (ret)
+		goto failed;
+
+	ret = seamcall_prerr(TDH_SYS_LP_INIT, &args);
+	if (ret)
+		goto failed;
+
+	__this_cpu_write(tdx_lp_initialized, true);
+	return;
+
+failed:
+	atomic_inc(err);
+}
+
 static int __tdx_enable(void)
 {
 	int ret;
@@ -1234,15 +1228,19 @@ static int __tdx_enable(void)
  */
 int tdx_enable(void)
 {
+	atomic_t err = ATOMIC_INIT(0);
 	int ret;
 
 	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
 		return -ENODEV;
 
-	lockdep_assert_cpus_held();
-
+	cpus_read_lock();
 	mutex_lock(&tdx_module_lock);
 
+	on_each_cpu(tdx_cpu_enable, &err, true);
+	if (atomic_read(&err))
+		tdx_module_status = TDX_MODULE_ERROR;
+
 	switch (tdx_module_status) {
 	case TDX_MODULE_UNINITIALIZED:
 		ret = __tdx_enable();
@@ -1258,6 +1256,7 @@ int tdx_enable(void)
 	}
 
 	mutex_unlock(&tdx_module_lock);
+	cpus_read_unlock();
 
 	return ret;
 }

base-commit: fde917bc1af3e1a440ab0cb0d9364f8da25b9e17
-- 