Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module



On 18/04/2024 11:35 am, Sean Christopherson wrote:
On Thu, Apr 18, 2024, Kai Huang wrote:
On 18/04/2024 2:40 am, Sean Christopherson wrote:
This way, architectures that aren't saddled with out-of-tree hypervisors can do
the dead simple thing of enabling hardware during their initialization sequence,
and the TDX code is much more sane, e.g. invoke kvm_x86_enable_virtualization()
during late_hardware_setup(), and kvm_x86_disable_virtualization() during module
exit (presumably).

Fine with me. I am not familiar with other ARCHs, so I am assuming that always
enabling virtualization when KVM is present is fine for them. :-)

Two questions below:

+int kvm_x86_enable_virtualization(void)
+	int r;
+	guard(mutex)(&vendor_module_lock);

It's a little bit odd to take the vendor_module_lock mutex.

It is called by kvm_arch_init_vm(), so more reasonably we should still use

I think this should take an x86-specific lock, since it's guarding x86-specific

OK.  This makes sense.

And vendor_module_lock fits the bill perfectly.  Well, except for the
name, and I definitely have no objection to renaming it.

No opinion on renaming. Personally I wouldn't bother to rename. We can add a comment in kvm_x86_enable_virtualization() to explain. Perhaps in the future we will just want to always enable virtualization for x86 too.

Also, if we invoke kvm_x86_enable_virtualization() from
kvm_x86_ops->late_hardware_setup(), then IIUC we will deadlock here because
kvm_x86_vendor_init() already takes the vendor_module_lock?

Ah, yeah.  Oh, duh.  I think the reason I didn't initially suggest late_hardware_setup()
is that I was assuming/hoping TDX setup could be done after kvm_x86_vendor_init().
E.g. in vt_init() or whatever it gets called:

	r = kvm_x86_vendor_init(...);
	if (r)
		return r;

	if (enable_tdx) {
		r = tdx_blah_blah_blah();
		if (r)
			goto vendor_exit;
	}

I assume the reason you introduced the late_hardware_setup() is purely because you want to do:




Anyway, we can also do 'enable_tdx' outside of kvm_x86_vendor_init() as above, given it cannot be done in hardware_setup() anyway.

If we do 'enable_tdx' in late_hardware_setup(), we will need a kvm_x86_enable_virtualization_nolock(), but that's also not a problem to me.

So which way do you prefer?
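
The deadlock and the _nolock workaround discussed above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code: toy_mutex, toy_lock(), toy_unlock(), vendor_init() and both enable_virtualization*() helpers are hypothetical stand-ins, and the toy lock reports -EDEADLK on a second acquisition where a real kernel mutex would simply hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive mutex (hypothetical): re-locking reports -EDEADLK
 * instead of hanging, so the self-deadlock is observable. */
struct toy_mutex {
	bool held;
};

static struct toy_mutex vendor_module_lock;	/* stand-in for KVM's lock */
static int virt_users;				/* successful enable calls */

static int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;
	m->held = true;
	return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* _nolock variant: the caller must already hold vendor_module_lock. */
static int enable_virtualization_nolock(void)
{
	assert(vendor_module_lock.held);
	virt_users++;
	return 0;
}

/* Locked wrapper for callers that do not hold the lock. */
static int enable_virtualization(void)
{
	int r = toy_lock(&vendor_module_lock);

	if (r)
		return r;
	r = enable_virtualization_nolock();
	toy_unlock(&vendor_module_lock);
	return r;
}

/* Models vendor init: holds the lock across the late setup hook. */
static int vendor_init(int (*late_setup)(void))
{
	int r = toy_lock(&vendor_module_lock);

	if (r)
		return r;
	r = late_setup();
	toy_unlock(&vendor_module_lock);
	return r;
}
```

In this model, passing enable_virtualization() as the late setup hook fails because the lock is already held, while passing enable_virtualization_nolock() succeeds. Doing the TDX setup after vendor init returns avoids the question entirely, since the locked wrapper can then be used as-is.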

Btw, with kvm_x86_enable_virtualization(), it seems the compatibility check is lost, which I assume is OK?

Btw2, currently tdx_enable() requires that cpus_read_lock() be called beforehand. If we do an unconditional tdx_cpu_enable() in vt_hardware_enable(), then with your proposal IIUC there's no such requirement anymore, because no task will be scheduled to the new CPU before it reaches CPUHP_AP_ACTIVE. But calling cpus_read_lock()/unlock() around tdx_enable() is also acceptable to me.


+int kvm_enable_virtualization(void)
+	int r;
+	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+			      kvm_online_cpu, kvm_offline_cpu);
+	if (r)
+		return r;
+	register_syscore_ops(&kvm_syscore_ops);
+	/*
+	 * Manually undo virtualization enabling if the system is going down.
+	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
+	 * possible for an in-flight module load to enable virtualization
+	 * after syscore_shutdown() is called, i.e. without kvm_shutdown()
+	 * being invoked.  Note, this relies on system_state being set _before_
+	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
+	 * or this CPU observes the impending shutdown.  Which is why KVM uses
+	 * a syscore ops hook instead of registering a dedicated reboot
+	 * notifier (the latter runs before system_state is updated).
+	 */
+	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
+	    system_state == SYSTEM_RESTART) {
+		unregister_syscore_ops(&kvm_syscore_ops);
+		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+		return -EBUSY;
+	}

Aren't we also supposed to do:

	on_each_cpu(__kvm_enable_virtualization, NULL, 1);


No, cpuhp_setup_state() invokes the callback, kvm_online_cpu(), on each CPU.
I.e. KVM has been doing things the hard way by using cpuhp_setup_state_nocalls().
That's part of the complexity I would like to get rid of.

Ah, right :-)
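
The difference between the two registration flavours mentioned above can be illustrated with a toy userspace model. This is a sketch only: toy_setup_state(), toy_setup_state_nocalls(), toy_online_cpu() and the cpu_online[] table are hypothetical stand-ins for the real cpuhp machinery, and the rollback that the real cpuhp_setup_state() performs on failure is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

typedef int (*toy_cpu_cb)(unsigned int cpu);

/* Hypothetical CPU state: CPUs 0-2 are online, CPU 3 is offline. */
static bool cpu_online[TOY_NR_CPUS] = { true, true, true, false };
static bool virt_enabled[TOY_NR_CPUS];
static toy_cpu_cb registered_cb;

/* Stand-in for kvm_online_cpu(): enable virtualization on one CPU. */
static int toy_online_cpu(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	virt_enabled[cpu] = true;
	return 0;
}

/* Models the _nocalls flavour: register only, never invoke the callback. */
static int toy_setup_state_nocalls(toy_cpu_cb startup)
{
	registered_cb = startup;
	return 0;
}

/* Models the plain flavour: register, then invoke on every online CPU. */
static int toy_setup_state(toy_cpu_cb startup)
{
	registered_cb = startup;
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int r;

		if (!cpu_online[cpu])
			continue;
		r = startup(cpu);
		if (r)
			return r;	/* real code would also roll back */
	}
	return 0;
}

/* Count the CPUs on which the callback has run. */
static unsigned int count_enabled(void)
{
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		n += virt_enabled[cpu];
	return n;
}
```

With the _nocalls model nothing is enabled until the caller walks the CPUs itself, which is the role on_each_cpu() plays in the question above. With the plain model the registration call already covers every online CPU.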

Btw, why couldn't we do the 'system_state' check at the very beginning of this function?
