Patch "KVM: Clean up benign vcpu->cpu data races when kicking vCPUs" has been added to the 5.10-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 22 Mar 2023 15:43:43 -0400

This is a note to let you know that I've just added the patch titled

    KVM: Clean up benign vcpu->cpu data races when kicking vCPUs

to the 5.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     kvm-clean-up-benign-vcpu-cpu-data-races-when-kicking.patch
and it can be found in the queue-5.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 0fcae9372e827e726d3deb9d5ff5916d34647f8e
Author: Sean Christopherson <seanjc@xxxxxxxxxx>
Date:   Fri Aug 27 11:25:09 2021 +0200

    KVM: Clean up benign vcpu->cpu data races when kicking vCPUs
    
    [ Upstream commit 85b640450ddcfa09cf72771b69a9c3daf0ddc772 ]
    
    Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu
    is read exactly once, and by ensuring the vCPU is booted from guest mode
    if kvm_arch_vcpu_should_kick() returns true.  Fix a similar race in
    kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if
    kvm_request_needs_ipi() returns true.
    
    Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or
    kvm_request_needs_ipi()) means the target vCPU could get migrated (change
    vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpud and
    reading vcpu->mode.  If that happens, the kick/IPI will be sent to the
    old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs.
    
    Although failing to kick the vCPU is not exactly ideal, practically
    speaking it cannot cause a functional issue unless there is also a bug in
    the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s
    behavior.
    
    The purpose of sending an IPI is purely to get a vCPU into the host (or
    out of reading SPTEs) so that the vCPU can recognize a change in state,
    e.g. a KVM_REQ_* request.  If vCPU's handling of the state change is
    required for correctness, KVM must ensure either the vCPU sees the change
    before entering the guest, or that the sender sees the vCPU as running in
    guest mode.  All architectures handle this by (a) sending the request
    before calling kvm_vcpu_kick() and (b) checking for requests _after_
    setting vcpu->mode.
    
    x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to
    ensure it kicks and waits for vCPUs that started reading SPTEs _before_
    MMU changes were finalized, but any vCPU that starts reading after MMU
    changes were finalized will see the new state and can continue on
    uninterrupted.
    
    For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g.
    x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied
    upon for functional correctness, e.g. in the dirty log case, userspace
    cannot assume it has a 100% complete log if vCPUs are still running.
    
    All that said, eliminate the benign race since the cost of doing so is an
    "extra" atomic cmpxchg() in the case where the target vCPU is loaded by
    the current pCPU or is not loaded at all.  I.e. the kick will be skipped
    due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as
    opposed to the kick being skipped because of the cpu checks.
    
    Keep the "cpu != me" checks even though they appear useless/impossible at
    first glance.  x86 processes guest IPI writes in a fast path that runs in
    IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE.  And
    calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST or
    READING_SHADOW_PAGE_TABLES is perfectly reasonable.
    
    Note, a race with the cpu_online() check in kvm_vcpu_kick() likely
    persists, e.g. the vCPU could exit guest mode and get offlined between
    the cpu_online() check and the sending of smp_send_reschedule().  But,
    the online check appears to exist only to avoid a WARN in x86's
    native_smp_send_reschedule() that fires if the target CPU is not online.
    The reschedule WARN exists because CPU offlining takes the CPU out of the
    scheduling pool, i.e. the WARN is intended to detect the case where the
    kernel attempts to schedule a task on an offline CPU.  The actual sending
    of the IPI is a non-issue as at worst it will simpy be dropped on the
    floor.  In other words, KVM's usurping of the reschedule IPI could
    theoretically trigger a WARN if the stars align, but there will be no
    loss of functionality.
    
    [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a
    
    Cc: Venkatesh Srinivas <venkateshs@xxxxxxxxxx>
    Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
    Fixes: 97222cc83163 ("KVM: Emulate local APIC in kernel")
    Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
    Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
    Message-Id: <20210827092516.1027264-2-vkuznets@xxxxxxxxxx>
    Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
    Stable-dep-of: 2b0128127373 ("KVM: Register /dev/kvm as the _very_ last thing during initialization")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 564d5c145fbe7..b5134f3046483 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -276,14 +276,26 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 			continue;
 
 		kvm_make_request(req, vcpu);
-		cpu = vcpu->cpu;
 
 		if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
 			continue;
 
-		if (tmp != NULL && cpu != -1 && cpu != me &&
-		    kvm_request_needs_ipi(vcpu, req))
-			__cpumask_set_cpu(cpu, tmp);
+		/*
+		 * Note, the vCPU could get migrated to a different pCPU at any
+		 * point after kvm_request_needs_ipi(), which could result in
+		 * sending an IPI to the previous pCPU.  But, that's ok because
+		 * the purpose of the IPI is to ensure the vCPU returns to
+		 * OUTSIDE_GUEST_MODE, which is satisfied if the vCPU migrates.
+		 * Entering READING_SHADOW_PAGE_TABLES after this point is also
+		 * ok, as the requirement is only that KVM wait for vCPUs that
+		 * were reading SPTEs _before_ any changes were finalized.  See
+		 * kvm_vcpu_kick() for more details on handling requests.
+		 */
+		if (tmp != NULL && kvm_request_needs_ipi(vcpu, req)) {
+			cpu = READ_ONCE(vcpu->cpu);
+			if (cpu != -1 && cpu != me)
+				__cpumask_set_cpu(cpu, tmp);
+		}
 	}
 
 	called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
@@ -2937,16 +2949,24 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
  */
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
-	int me;
-	int cpu = vcpu->cpu;
+	int me, cpu;
 
 	if (kvm_vcpu_wake_up(vcpu))
 		return;
 
+	/*
+	 * Note, the vCPU could get migrated to a different pCPU at any point
+	 * after kvm_arch_vcpu_should_kick(), which could result in sending an
+	 * IPI to the previous pCPU.  But, that's ok because the purpose of the
+	 * IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
+	 * vCPU also requires it to leave IN_GUEST_MODE.
+	 */
 	me = get_cpu();
-	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-		if (kvm_arch_vcpu_should_kick(vcpu))
+	if (kvm_arch_vcpu_should_kick(vcpu)) {
+		cpu = READ_ONCE(vcpu->cpu);
+		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 			smp_send_reschedule(cpu);
+	}
 	put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);