On Tue, Jan 11, 2022 at 12:46:52AM +0000, Sean Christopherson wrote: >On Mon, Dec 27, 2021, Chao Gao wrote: >> At init time, KVM does compatibility checks to ensure that all online >> CPUs support hardware virtualization and a common set of features. But >> KVM uses hotplugged CPUs without such compatibility checks. On Intel >> CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or >> vmentry failure if the hotplugged CPU doesn't meet minimal feature >> requirements. >> >> Do compatibility checks when onlining a CPU. If any VM is running, >> KVM hotplug callback returns an error to abort onlining incompatible >> CPUs. >> >> But if no VM is running, onlining incompatible CPUs is allowed. Instead, >> KVM is prohibited from creating VMs similar to the policy for init-time >> compatibility checks. > >... > >> --- >> virt/kvm/kvm_main.c | 36 ++++++++++++++++++++++++++++++++++-- >> 1 file changed, 34 insertions(+), 2 deletions(-) >> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index c1054604d1e8..0ff80076d48d 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -106,6 +106,8 @@ LIST_HEAD(vm_list); >> static cpumask_var_t cpus_hardware_enabled; >> static int kvm_usage_count; >> static atomic_t hardware_enable_failed; >> +/* Set if hardware becomes incompatible after CPU hotplug */ >> +static bool hardware_incompatible; >> >> static struct kmem_cache *kvm_vcpu_cache; >> >> @@ -4855,20 +4857,32 @@ static void hardware_enable_nolock(void *junk) >> >> static int kvm_online_cpu(unsigned int cpu) >> { >> - int ret = 0; >> + int ret; >> >> + ret = kvm_arch_check_processor_compat(); >> raw_spin_lock(&kvm_count_lock); >> /* >> * Abort the CPU online process if hardware virtualization cannot >> * be enabled. Otherwise running VMs would encounter unrecoverable >> * errors when scheduled to this CPU. >> */ >> - if (kvm_usage_count) { >> + if (!ret && kvm_usage_count) { >> hardware_enable_nolock(NULL); >> if (atomic_read(&hardware_enable_failed)) { >> ret = -EIO; >> pr_info("kvm: abort onlining CPU%d", cpu); >> } >> + } else if (ret && !kvm_usage_count) { >> + /* >> + * Continue onlining an incompatible CPU if no VM is >> + * running. KVM should reject creating any VM after this >> + * point. Then this CPU can be still used to run non-VM >> + * workload. >> + */ >> + ret = 0; >> + hardware_incompatible = true; > >This has a fairly big flaw in that it prevents KVM from creating VMs even if the >offending CPU is offlined. That seems like a very reasonable thing to do, e.g. >admin sees that hotplugging a CPU broke KVM and removes the CPU to remedy the >problem. And if KVM is built-in, reloading KVM to wipe hardware_incompatible >after offlining the CPU isn't an option. Ideally, yes, creation VMs should be allowed after offending CPUs are offlined. But the problem is kind of foundamental: After kernel tries to online a CPU without VMX, boot_cpu_has(X86_FEATURE_VMX) returns false. So, the current behavior is reloading KVM would fail if kernel *tried* to bring up a CPU without VMX. So, it looks to me that boot_cpu_has() doesn't do feature re-evalution either. Given that, I doubt the value of making KVM able to create VM in this case. > >To make this approach work, I think kvm_offline_cpu() would have to reevaluate >hardware_incompatible if the flag is set. > >And should there be a KVM module param to let the admin opt in/out of this >behavior? E.g. if the primary use case for a system is to run VMs, disabling >KVM just to online a CPU isn't very helpful. > >That said, I'm not convinced that continuing with the hotplug in this scenario >is ever the right thing to do. Either the CPU being hotplugged really is a different >CPU, or it's literally broken. In both cases, odds are very, very good that running >on the dodgy CPU will hose the kernel sooner or later, i.e. KVM's compatibility checks >are just the canary in the coal mine. Ok. Then here are two options: 1. KVM always prevents incompatible CPUs from being brought up regardless of running VMs 2. make "disabling KVM on incompatible CPUs" an opt-in feature. Which one do you think is better? And as said above, even with option 1, KVM reloading would fail due to boot_cpu_has(X86_FEATURE_VMX). I suppose it isn't necessary to be fixed in this series. > >TDX is a different beast as (a) that's purely a security restriction and (b) anyone >trying to run TDX guests darn well better know that TDX doesn't allow hotplug. >In other words, if TDX gets disabled due to hotplug, either someone majorly screwed >up and is going to be unhappy no matter what, or there's no intention of using TDX >and it's a complete don't care. > >> + pr_info("kvm: prohibit VM creation due to incompatible CPU%d", > >pr_info() is a bit weak, this should be at least pr_warn() and maybe even pr_err(). > >> + cpu); > >Eh, I'd omit the newline and let that poke out. Will do. > >> } >> raw_spin_unlock(&kvm_count_lock); >> return ret; >> @@ -4913,8 +4927,24 @@ static int hardware_enable_all(void) >> { >> int r = 0; >> >> + /* >> + * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu() >> + * is called. on_each_cpu() between them includes the CPU. As a result, >> + * hardware_enable_nolock() may get invoked before kvm_online_cpu(). >> + * This would enable hardware virtualization on that cpu without >> + * compatibility checks, which can potentially crash system or break >> + * running VMs. >> + * >> + * Disable CPU hotplug to prevent this case from happening. >> + */ >> + cpus_read_lock(); >> raw_spin_lock(&kvm_count_lock); >> >> + if (hardware_incompatible) { > >Another error message would likely be helpful here. Even better would be if KVM >could provide some way for userspace to query which CPU(s) is bad. If option 1 is chosen, this check will be removed. For option 2, will add an error message. And how about a debugfs tunable to provide the list of bad CPUs?