Sean Christopherson <seanjc@xxxxxxxxxx> writes: > On Fri, Aug 20, 2021, Vitaly Kuznetsov wrote: >> Iterating over set bits in 'vcpu_bitmap' should be faster than going >> through all vCPUs, especially when just a few bits are set. >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> >> --- >> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++++---------------- >> 1 file changed, 32 insertions(+), 17 deletions(-) >> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 3e67c93ca403..0f873c5ed538 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -257,34 +257,49 @@ static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait) >> return true; >> } >> >> +static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu, >> + unsigned int req, cpumask_var_t tmp) >> +{ >> + int cpu = vcpu->cpu; > > This reminds me, syzbot found a data race a while back[*] in kvm_vcpu_kick() > related to reading vcpu->cpu. That race is benign, but legitimate. I believe > this code has a similar race, and I'm not as confident that it's benign. > > If the target vCPU changes vcpu->cpu after it's read by this code, then the IPI > can sent to the wrong pCPU, e.g. this pCPU gets waylaid by an IRQ and the target > vCPU is migrated to a new pCPU. > > The TL;DR is that the race is benign because the target vCPU is still guaranteed > to see the request before entering the guest, even if the IPI goes to the wrong > pCPU. I believe the same holds true for KVM_REQUEST_WAIT, e.g. if the lockless > shadow PTE walk gets migrated to a new pCPU just before setting vcpu->mode to > READING_SHADOW_PAGE_TABLES, this code can use a stale "cpu" for __cpumask_set_cpu(). > The race is benign because the vCPU would have to enter READING_SHADOW_PAGE_TABLES > _after_ the SPTE modifications were made, as vcpu->cpu can't change while the vCPU > is reading SPTEs. The same logic holds true for the case where the vCPU is migrated > after the call to __cpumask_set_cpu(); the goal is to wait for the vCPU to return to > OUTSIDE_GUEST_MODE, which is guaranteed if the vCPU is migrated even if this path > doesn't wait for an ack from the _new_ pCPU. > > I'll send patches to fix the races later today, maybe they can be folded into > v2? Even though the races are benign, I think they're worth fixing, if only to > provide an opportunity to document why it's ok to send IPIs to the > wrong pCPU. You're blazingly fast as usual :-) I'll do v2 on top of your patches. > > [*] On an upstream kernel, but I don't think the bug report was posted to LKML. > >> + >> + kvm_make_request(req, vcpu); >> + >> + if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu)) >> + return; >> + >> + if (tmp != NULL && cpu != -1 && cpu != raw_smp_processor_id() && > > For large VMs, might be worth keeping get_cpu() in the caller in passing in @me? The only reason against was that I've tried keeping the newly introduced kvm_make_vcpu_request()'s interface nicer, like it can be reused some day. Will get back to get_cpu() in v2. > >> + kvm_request_needs_ipi(vcpu, req)) >> + __cpumask_set_cpu(cpu, tmp); >> +} >> + >> bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req, >> struct kvm_vcpu *except, >> unsigned long *vcpu_bitmap, cpumask_var_t tmp) >> { >> - int i, cpu, me; >> + int i; >> struct kvm_vcpu *vcpu; >> bool called; >> >> - me = get_cpu(); >> - >> - kvm_for_each_vcpu(i, vcpu, kvm) { >> - if ((vcpu_bitmap && !test_bit(i, vcpu_bitmap)) || >> - vcpu == except) >> - continue; >> - >> - kvm_make_request(req, vcpu); >> - cpu = vcpu->cpu; >> - >> - if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu)) >> - continue; >> + preempt_disable(); >> >> - if (tmp != NULL && cpu != -1 && cpu != me && >> - kvm_request_needs_ipi(vcpu, req)) >> - __cpumask_set_cpu(cpu, tmp); >> + if (likely(vcpu_bitmap)) { > > I don't think this is actually "likely". kvm_make_all_cpus_request() is by far > the most common caller and does not pass in a vcpu_bitmap. Practically speaking > I highly don't the code organization will matter, but from a documentation > perspective it's wrong. Right, I was thinking more about two other users: IOAPIC and Hyper-V who call kvm_make_vcpus_request_mask() directly but I agree that kvm_make_all_cpus_request() is probably much more common. > >> + for_each_set_bit(i, vcpu_bitmap, KVM_MAX_VCPUS) { >> + vcpu = kvm_get_vcpu(kvm, i); >> + if (!vcpu || vcpu == except) >> + continue; >> + kvm_make_vcpu_request(kvm, vcpu, req, tmp); >> + } >> + } else { >> + kvm_for_each_vcpu(i, vcpu, kvm) { >> + if (vcpu == except) >> + continue; >> + kvm_make_vcpu_request(kvm, vcpu, req, tmp); >> + } >> } >> >> called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT)); >> - put_cpu(); >> + >> + preempt_enable(); >> >> return called; >> } >> -- >> 2.31.1 >> > -- Vitaly