On Tue, 2024-04-23 at 08:15 -0700, Sean Christopherson wrote:
> On Tue, Apr 23, 2024, Kai Huang wrote:
> > On Tue, 2024-04-23 at 13:34 +1200, Kai Huang wrote:
> > > > > > And the intent isn't to catch every possible problem.  As with many sanity checks,
> > > > > > the intent is to detect the most likely failure mode to make triaging and debugging
> > > > > > issues a bit easier.
> > > > > 
> > > > > The SEAMCALL will literally return a unique error code to indicate CPU
> > > > > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done.  I think the
> > > > > error code is already clear to pinpoint the problem (due to these pre-
> > > > > SEAMCALL-condition not being met).
> > > > 
> > > > No, SEAMCALL #UDs if the CPU isn't post-VMXON.  I.e. the CPU doesn't make it to
> > > > the TDX Module to provide a unique error code, all KVM will see is a #UD.
> > > 
> > > #UD is handled by the SEAMCALL assembly code.  Please see TDX_MODULE_CALL
> > > assembly macro:
> 
> Right, but that doesn't say why the #UD occurred.  The macro dresses it up in
> TDX_SW_ERROR so that KVM only needs a single parser, but at the end of the day
> KVM is still only going to see that SEAMCALL hit a #UD.

Right.  But is there any problem here?  I thought the point was we can just use
the error code to tell what went wrong.

> > > > There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
> > > > no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
> > > > tdx_enable().  I suspect they got that way because of KVM's unnecessarily complex
> > > > code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
> > > > to also do TDH_SYS_LP_INIT, so why do two IPIs?
> > > 
> > > The main reason is we relaxed the TDH.SYS.LP.INIT to be called _after_ TDX
> > > module initialization.
> > > Previously, the TDH.SYS.LP.INIT must be done on *ALL* CPUs that the
> > > platform has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before
> > > any other SEAMCALLs.  This didn't quite work with (kernel software) CPU
> > > hotplug, and it had problem dealing with things like SMT disable
> > > mitigation:
> > > 
> > > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f
> > > 
> > > So the x86 maintainers requested to change this.  The original proposal
> > > was to eliminate the entire TDH.SYS.INIT and TDH.SYS.LP.INIT:
> > > 
> > > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469
> > > 
> > > But somehow it wasn't feasible, and the result was we relaxed to allow
> > > TDH.SYS.LP.INIT to be called after module initialization.
> > > 
> > > So we need a separate tdx_cpu_enable() for that.
> 
> No, you don't, at least not given the TDX patches I'm looking at.  Allowing
> TDH.SYS.LP.INIT after module initialization makes sense because otherwise the
> kernel would need to online all possible CPUs before initializing TDX.  But that
> doesn't mean that the kernel needs to, or should, punt TDH.SYS.LP.INIT to KVM.
> 
> AFAICT, KVM is NOT doing TDH.SYS.LP.INIT when a CPU is onlined, only when KVM
> is loaded, which means that tdx_enable() can process all online CPUs just as
> easily as KVM.

Hmm.. I assumed kvm_online_cpu() would do VMXON + tdx_cpu_enable().

> Presumably that approach relies on something blocking onlining CPUs when TDX is
> active.  And if that's not the case, the proposed patches are buggy.

The current patch ([PATCH 023/130] KVM: TDX: Initialize the TDX module when
loading the KVM intel kernel module) is indeed buggy, but I don't quite follow
why we would need to block onlining CPUs when TDX is active?  Nothing hard
prevents us from doing so.
KVM just needs to do VMXON + tdx_cpu_enable() inside kvm_online_cpu().

> > Btw, the ideal (or probably the final) plan is to handle tdx_cpu_enable()
> > in TDX's own CPU hotplug callback in the core-kernel and hide it from all
> > other in-kernel TDX users.
> > 
> > Specifically:
> > 
> > 1) that callback, e.g., tdx_online_cpu() will be placed _before_ any in-
> > kernel TDX users like KVM's callback.
> > 2) In tdx_online_cpu(), we do VMXON + tdx_cpu_enable() + VMXOFF, and
> > return error in case of any error to prevent that cpu from going online.
> > 
> > That makes sure that, if TDX is supported by the platform, we basically
> > guarantees all online CPUs are ready to issue SEAMCALL (of course, the in-
> > kernel TDX user still needs to do VMXON for it, but that's TDX user's
> > responsibility).
> > 
> > But that obviously needs to move VMXON to the core-kernel.
> 
> It doesn't strictly have to be core kernel per se, just in code that sits below
> KVM, e.g. in a seperate module called VAC[*] ;-)
> 
> [*] https://lore.kernel.org/all/ZW6FRBnOwYV-UCkY@xxxxxxxxxx

Could you elaborate why vac.ko is necessary?

Being a module, naturally we would need to handle module init and exit, but
TDX cannot be disabled and re-enabled after initialization, so in general
vac.ko doesn't quite fit TDX.  And I am not sure what the fundamental
difference is, from KVM's perspective, between managing the TDX module in a
kernel module vs in the core-kernel.

> > Currently, export tdx_cpu_enable() as a separate API and require KVM to
> > call it explicitly is a temporary solution.
> > 
> > That being said, we could do tdx_cpu_enable() inside tdx_enable(), but I
> > don't see it's a better idea.
> 
> It simplifies the API surface for enabling TDX and eliminates an export.

I was surprised to see that we want to prevent onlining CPUs when TDX is
active.