Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module

"Huang, Kai" <kai.huang@xxxxxxxxx> · Fri, 26 Apr 2024 00:21:46 +0000

> 
> > > The important thing is that they're handled by _one_ entity.  What we have today
> > > is probably the worst setup; VMXON is handled by KVM, but TDX.SYS.LP.INIT is
> > > handled by core kernel (sort of).
> > 
> > I cannot argue against this :-)
> > 
> > But from this point of view, I cannot see difference between tdx_enable()
> > and tdx_cpu_enable(), because they both in core-kernel while depend on KVM
> > to handle VMXON.
> 
> My comments were made under the assumption that the code was NOT buggy, i.e. if
> KVM did NOT need to call tdx_cpu_enable() independent of tdx_enable().
> 
> That said, I do think it makes to have tdx_enable() call an private/inner version,
> e.g. __tdx_cpu_enable(), and then have KVM call a public version.  Alternatively,
> the kernel could register yet another cpuhp hook that runs after KVM's, i.e. does
> TDX.SYS.LP.INIT after KVM has done VMXON (if TDX has been enabled).

We will need to handle tdx_cpu_online() in "some cpuhp callback" anyway,
no matter whether tdx_enable() calls __tdx_cpu_enable() internally or not,
because now tdx_enable() can be done on a subset of cpus that the platform
has.

For the latter (after the "Alternatively" above), by "the kernel" do you
mean the core-kernel but not KVM?

E.g., you mean to register a cpuhp book _inside_ tdx_enable() after TDX is
initialized successfully?

That would have problem like when KVM is not present (e.g., KVM is
unloaded after it enables TDX), the cpuhp book won't work at all.

If we ever want a new TDX-specific cpuhp hook "at this stage", IMHO it's
better to have it done by KVM, i.e., it goes away when KVM is unloaded.

Logically, we have two approaches in terms of how to treat
tdx_cpu_enable():

1) We treat the two cases separately: calling tdx_cpu_enable() for all
online cpus, and calling it when a new CPU tries to go online in some
cpuhp hook.  And we only want to call tdx_cpu_enable() in cpuhp book when
tdx_enable() has done successfully.

That is: 

a) we always call tdx_cpu_enable() (or __tdx_cpu_enable()) inside
tdx_enable() as the first step, or,

b) let the caller (KVM) to make sure of tdx_cpu_enable() has been done for
all online cpus before calling tdx_enable().

Something like this:

	if (enable_tdx) {
		cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, kvm_online_cpu, 
			...);

		cpus_read_lock();
		on_each_cpu(tdx_cpu_enable, ...); /* or do it inside 
						   * in tdx_enable() */
		enable_tdx = tdx_enable();
		if (enable_tdx)
			cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
				tdx_online_cpu, ...);
		cpus_read_unlock();
	}

	static int tdx_online_cpu(unsigned int cpu)
	{
		unsigned long flags;
		int ret;

		if (!enable_tdx)
			return 0;

		local_irq_save(flags);
		ret = tdx_cpu_enable();
		local_irq_restore(flags);

		return ret;
	}

2) We treat tdx_cpu_enable() as a whole by viewing it as the first step to
run any TDX code (SEAMCALL) on any cpu, including the SEAMCALLs involved
in tdx_enable().

That is, we *unconditionally* call tdx_cpu_enable() for all online cpus,
and when a new CPU tries to go online.

This can be handled at once if we do tdx_cpu_enable() inside KVM's cpuhp
hook:

	static int vt_hardware_enable(unsigned int cpu)
	{
		vmx_hardware_enable();

		local_irq_save(flags);
		ret = tdx_cpu_enable();
		local_irq_restore(flags);

		/*
		 * -ENODEV means TDX is not supported by the platform
		 * (TDX not enabled by the hardware or module is
		 * not loaded) or the kernel isn't built with TDX.
		 *
		 * Allow CPU to go online as there's no way kernel
		 * could use TDX in this case.
		 *
		 * Other error codes means TDX is available but something
		 * went wrong.  Prevent this CPU to go online so that
		 * TDX may still work on other online CPUs.
		 */
		if (ret && ret != -ENODEV)
			return ret;

		return ret;
	}

So with your change to always enable virtualization when TDX is enabled
during module load, we can simply have:

	if (enable_tdx)
		cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, kvm_online_cpu, 
			...);

		cpus_read_lock();
		enable_tdx = tdx_enable();
		cpus_read_unlock();
	}

So despite the cpus_read_lock() around tdx_enable() is a little bit silly,
the logic is actually simpler IMHO.

(local_irq_save()/restore() around tdx_cpu_enable() is also silly but that
is a common problem to both above solution and can be changed
independently).

Also, as I mentioned that the final goal is to have a TDX-specific CPUHP
hook in the core-kernel _BEFORE_ any in-kernel TDX user (KVM) to make sure
all online CPUs are TDX-capable.  

When that happens, I can just move the code in vt_hardware_enable() to
tdx_online_cpu() and do additional VMXOFF inside it, with the assumption
that the in-kernel TDX users should manage VMXON/VMXOFF on their own. 
Then all TDX users can remove the handling of tdx_cpu_enable().