On Mon, 2022-08-15 at 22:35 +0000, Sean Christopherson wrote: > On Fri, Aug 12, 2022, Huang, Kai wrote: > > On Thu, 2022-08-11 at 17:39 +0000, Sean Christopherson wrote: > > > I've been poking at the "hardware enable" code this week for other reasons, and > > > have come to the conclusion that the current implementation is a mess. > > > > Thanks for the lengthy reply :) > > > > First of all, to clarify, I guess by "current implementation" you mean the > > current upstream KVM code, but not this particular patch? :) > > Yeah, upstream code. > > > > Of course, that path is broken for other reasons too, e.g. needs to prevent CPUs > > > from going on/off-line when KVM is enabling hardware. > > > https://lore.kernel.org/all/20220216031528.92558-7-chao.gao@xxxxxxxxx > > > > If I read correctly, the problem described in above link seems only to be true > > after we move CPUHP_AP_KVM_STARTING from STARTING section to ONLINE section, but > > this hasn't been done yet in the current upstream KVM. Currently, > > CPUHP_AP_KVM_STARTING is still in STARTING section so it is guaranteed it has > > been executed before start_secondary sets itself to online cpu mask. > > The lurking issue is that for_each_online_cpu() can against hotplug, i.e. every > instance of for_each_online_cpu() in KVM is buggy (at least on the x86 side, I > can't tell at a glance whether or not arm pKVM's usage is safe). > > https://lore.kernel.org/all/87bl20aa72.ffs@tglx Yes agreed. for_each_online_cpu() can race with CPU hotplug. But the fact is looks there are many places using for_each_online_cpus() w/o holding cpus_read_lock(). :) > > > Btw I saw v4 of Chao's patchset was sent Feb this year. It seems that series > > indeed improved CPU compatibility check and hotplug handling. Any reason that > > series wasn't merged? > > AFAIK it was just a lack of reviews/acks for the non-KVM patches. > > > Also agreed that kvm_lock should be used. But I am not sure whether > > cpus_read_lock() is needed (whether CPU hotplug should be prevented). In > > current KVM, we don't do CPU compatibility check for hotplug CPU anyway, so when > > KVM does CPU compatibility check using for_each_online_cpu(), if CPU hotplug > > (hot-removal) happens, the worst case is we lose compatibility check on that > > CPU. > > > > Or perhaps I am missing something? > > On a hot-add of an incompatible CPU, KVM would potentially skip the compatibility > check and try to enable hardware on an incompatible/broken CPU. To resolve this, we need to do compatibility check before actually enabling hardware on each cpu, as Chao's series did. I don't see using cpus_read_lock() alone can actually fix anything. > > Another possible bug is the checking of hv_get_vp_assist_page(); hot-adding a > CPU that failed to allocate the VP assist page while vmx_init() is checking online > CPUs could result in a NULL pointer deref due to KVM not rejecting the CPU as it > should. > So we need Chao's series to fix those problems: 1) Do compatibility check before actually enable the hardware for each cpu; 2) allow CPU hotplug to fail; 3) Hold cpus_read_lock() when needed. -- Thanks, -Kai