Re: Q. about KVM and CPU hotplug

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Tue, 30 Nov 2021 10:28:44 +0100

On 11/30/21 09:27, Tian, Kevin wrote:
		r = kvm_arch_hardware_enable();

		if (r) {
			cpumask_clear_cpu(cpu, cpus_hardware_enabled);
			atomic_inc(&hardware_enable_failed);
			pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
		}
	}

Upon error, hardware_enable_failed is incremented. However, this variable
is checked only in hardware_enable_all(), which is called when the first
VM is created.

This implies that KVM may be left in a state where it doesn't know that a
CPU is not ready to host VMX operations.
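
To make the gap concrete, here is a small user-space model of the flow
described above. It is only a sketch and not kernel code: every name
(model_create_vm, model_cpu_hotplug, the cpu_broken[] array) is made up
for illustration, and the control flow is an assumption based on the
description in this thread.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* All state below is hypothetical and only mirrors the names in the thread. */
static bool cpu_online[MODEL_NR_CPUS];
static bool cpu_broken[MODEL_NR_CPUS];     /* enabling virtualization fails here */
static bool cpu_hw_enabled[MODEL_NR_CPUS]; /* models cpus_hardware_enabled */
static int enable_failed;                  /* models hardware_enable_failed */
static int usage_count;                    /* number of live VMs */

/* Models the per-CPU enable step quoted above. */
static void model_enable_cpu(int cpu)
{
	if (cpu_hw_enabled[cpu])
		return;
	cpu_hw_enabled[cpu] = true;
	if (cpu_broken[cpu]) {
		cpu_hw_enabled[cpu] = false;
		enable_failed++;
	}
}

/* Models VM creation: only the first VM looks at the failure counter. */
static int model_create_vm(void)
{
	int cpu;

	usage_count++;
	if (usage_count != 1)
		return 0;

	enable_failed = 0;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu_online[cpu])
			model_enable_cpu(cpu);

	if (enable_failed) {
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			cpu_hw_enabled[cpu] = false;
		usage_count--;
		return -EBUSY;
	}
	return 0;
}

/* Models a CPU coming online later: the counter is bumped but never read. */
static void model_cpu_hotplug(int cpu)
{
	cpu_online[cpu] = true;
	if (usage_count)
		model_enable_cpu(cpu);
}
```

With CPUs 0-2 online and CPU 3 marked broken, creating a VM succeeds,
hotplugging CPU 3 increments enable_failed, and creating a second VM
still succeeds because the counter is not consulted again.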

Then I'm curious what will happen if a vCPU is scheduled to this CPU. Does
KVM indirectly catch it (e.g. through a failed VM entry) and return a
deterministic error to QEMU at some point, or may it lead to undefined
behavior? And is there any method to prevent a vCPU thread from being
scheduled to that CPU?

It should fail the first vmptrld instruction.  It will result in a few 
WARN_ONCE and pr_warn_ratelimited (see vmx_insn_failed).  For VMX this 
should be a pretty bad firmware bug, and it has never been reported. 
KVM did find some undocumented errata but not this one!

I don't think there's any fix other than pinning userspace.  The WARNs 
can be eliminated by calling KVM_BUG_ON in the sched_in notifier, plus 
checking if the VM is bugged before entering the guest or doing a 
VMREAD/VMWRITE (usually the check is done only in an ioctl).  But some 
refactoring is probably needed to make the code more robust in general.
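
The idea of marking the VM as bugged at sched-in time and checking that
flag before guest entry can be modelled in user space as follows. This
is a sketch under stated assumptions, not a patch: the sketch_* names
and structures are hypothetical, the sched-in hook merely stands in for
the place where KVM_BUG_ON would fire, and -EIO is just the error code
chosen for this model.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-ins for the VM and vCPU structures. */
struct sketch_vm {
	bool vm_bugged;
};

struct sketch_vcpu {
	struct sketch_vm *vm;
	int cpu;
};

/* Hypothetical per-CPU "virtualization enabled" state. */
static bool sketch_cpu_virt_enabled[SKETCH_NR_CPUS];

/* Models the sched_in notifier: flag the VM instead of hitting WARNs later. */
static void sketch_sched_in(struct sketch_vcpu *vcpu, int cpu)
{
	vcpu->cpu = cpu;
	if (!sketch_cpu_virt_enabled[cpu])
		vcpu->vm->vm_bugged = true; /* where KVM_BUG_ON would trigger */
}

/* Models the run path: refuse guest entry once the VM is marked bugged. */
static int sketch_vcpu_run(struct sketch_vcpu *vcpu)
{
	if (vcpu->vm->vm_bugged)
		return -EIO;
	/* Guest entry (and any VMREAD/VMWRITE) would happen here. */
	return 0;
}
```

In this model the flag is sticky: once a vCPU has landed on a CPU
without virtualization enabled, later runs keep failing even after the
thread moves back to a good CPU.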

Paolo

By design, the current generation of TDX doesn't support CPU hotplug.
Only boot-time CPUs can be initialized for TDX (and this must be done en
masse, in one go). Attempting a SEAMCALL on a hotplugged CPU simply
fails, so it potentially affects any trusted domain whose vCPUs are
scheduled to the hotplugged CPU.
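
The "pinning userspace" option mentioned earlier could look roughly like
the sketch below, which removes one CPU from the calling thread's
affinity mask using the Linux sched_getaffinity()/sched_setaffinity()
calls. The helper names mask_without_cpu() and avoid_cpu() are
hypothetical; how a VMM would learn which CPU to avoid is outside the
scope of this sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for cpu_set_t, CPU_* macros and sched_setaffinity() */
#endif
#include <errno.h>
#include <sched.h>

/*
 * Copy `in` to `out` with `cpu` removed.
 * Returns the number of CPUs left in `out`.
 */
int mask_without_cpu(const cpu_set_t *in, int cpu, cpu_set_t *out)
{
	*out = *in;
	if (cpu >= 0 && cpu < CPU_SETSIZE)
		CPU_CLR(cpu, out);
	return CPU_COUNT(out);
}

/*
 * Keep the calling thread off `cpu`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int avoid_cpu(int cpu)
{
	cpu_set_t cur, next;

	/* pid 0 means "the calling thread". */
	if (sched_getaffinity(0, sizeof(cur), &cur) != 0)
		return -1;

	/* Refuse to install an empty mask; the kernel would reject it anyway. */
	if (mask_without_cpu(&cur, cpu, &next) == 0) {
		errno = EINVAL;
		return -1;
	}
	return sched_setaffinity(0, sizeof(next), &next);
}
```

Affinity is per thread, so in this approach each vCPU thread would have
to call avoid_cpu() itself (or be placed in a suitable cpuset by a
management layer).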