Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module

"Huang, Kai" <kai.huang@xxxxxxxxx> · Tue, 23 Apr 2024 01:34:10 +0000

> 
> > > And the intent isn't to catch every possible problem.  As with many sanity checks,
> > > the intent is to detect the most likely failure mode to make triaging and debugging
> > > issues a bit easier.
> > 
> > The SEAMCALL will literally return a unique error code to indicate CPU
> > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done.  I think the
> > error code is already clear to pinpoint the problem (due to these pre-
> > SEAMCALL-condition not being met).
> 
> No, SEAMCALL #UDs if the CPU isn't post-VMXON.  I.e. the CPU doesn't make it to
> the TDX Module to provide a unique error code, all KVM will see is a #UD.

#UD is handled by the SEAMCALL assembly code.  Please see TDX_MODULE_CALL
assembly macro:

.Lseamcall_trap\@:
        /*
         * SEAMCALL caused #GP or #UD.  By reaching here RAX contains
         * the trap number.  Convert the trap number to the TDX error
         * code by setting TDX_SW_ERROR to the high 32-bits of RAX.
         *
         * Note cannot OR TDX_SW_ERROR directly to RAX as OR instruction
         * only accepts 32-bit immediate at most.
         */
        movq $TDX_SW_ERROR, %rdi
        orq  %rdi, %rax

	...

	_ASM_EXTABLE_FAULT(.Lseamcall\@, .Lseamcall_trap\@)
.endif  /* \host */

> 
> > > > Btw, I noticed there's another problem, that is currently tdx_cpu_enable()
> > > > actually requires IRQ being disabled.  Again it was implemented based on
> > > > it would be invoked via both on_each_cpu() and kvm_online_cpu().
> > > > 
> > > > It also also implemented with consideration that it could be called by
> > > > multiple in-kernel TDX users in parallel via both SMP call and in normal
> > > > context, so it was implemented to simply request the caller to make sure
> > > > it is called with IRQ disabled so it can be IRQ safe  (it uses a percpu
> > > > variable to track whether TDH.SYS.LP.INIT has been done for local cpu
> > > > similar to the hardware_enabled percpu variable).
> > > 
> > > Is this is an actual problem, or is it just something that would need to be
> > > updated in the TDX code to handle the change in direction?
> > 
> > For now this isn't, because KVM is the solo user, and in KVM
> > hardware_enable_all() and kvm_online_cpu() uses kvm_lock mutex to make
> > hardware_enable_nolock() IPI safe.
> > 
> > I am not sure how TDX/SEAMCALL will be used in TDX Connect.
> > 
> > However I needed to consider KVM as a user, so I decided to just make it
> > must be called with IRQ disabled so I could know it is IRQ safe.
> > 
> > Back to the current tdx_enable() and tdx_cpu_enable(), my personal
> > preference is, of course, to keep the existing way, that is:
> > 
> > During module load:
> > 
> > 	cpus_read_lock();
> > 	tdx_enable();
> > 	cpus_read_unlock();
> > 
> > and in kvm_online_cpu():
> > 
> > 	local_irq_save();
> > 	tdx_cpu_enable();
> > 	local_irq_restore();
> > 
> > But given KVM is the solo user now, I am also fine to change if you
> > believe this is not acceptable.
> 
> Looking more closely at the code, tdx_enable() needs to be called under
> cpu_hotplug_lock to prevent *unplug*, i.e. to prevent the last CPU on a package
> from being offlined.  I.e. that part's not option.

Yeah.  We can say that.  I almost forgot this :-)

> 
> And the root of the problem/confusion is that the APIs provided by the core kernel
> are weird, which is really just a polite way of saying they are awful :-)

Well, apologize for it :-)

> 
> There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
> no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
> tdx_enable().  I suspect they got that way because of KVM's unnecessarily complex
> code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
> to also do TDH_SYS_LP_INIT, so why do two IPIs?

The main reason is we relaxed the TDH.SYS.LP.INIT to be called _after_ TDX
module initialization.  

Previously, the TDH.SYS.LP.INIT must be done on *ALL* CPUs that the
platform has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before
any other SEAMCALLs.  This didn't quite work with (kernel software) CPU
hotplug, and it had problem dealing with things like SMT disable
mitigation:

https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f

So the x86 maintainers requested to change this.  The original proposal
was to eliminate the entire TDH.SYS.INIT and TDH.SYS.LP.INIT:

https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469

But somehow it wasn't feasible, and the result was we relaxed to allow
TDH.SYS.LP.INIT to be called after module initialization.

So we need a separate tdx_cpu_enable() for that.

> 
> But just because KVM "owns" VMXON doesn't mean the core kernel code should punt
> TDX to KVM too.  If KVM relies on the cpuhp code to ensure all online CPUs are
> post-VMXON, then the TDX shapes up nicely and provides a single API to enable
> TDX.  
> 

We could ask tdx_enable() to invoke tdx_cpu_enable() internally, but as
mentioned above we still to have the tdx_cpu_enable() as a separate API to
allow CPU hotplug to call it.

> And then my CR4.VMXE=1 sanity check makes a _lot_ more sense.

As replied above the current SEAMCALL assembly returns a unique error code
for that:

	#define TDX_SEAMCALL_UD		(TDX_SW_ERROR |
X86_TRAP_UD)