Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/10/24 19:48, Sean Christopherson wrote:
On Thu, Oct 10, 2024, Paolo Bonzini wrote:
On 10/9/24 17:04, Sean Christopherson wrote:
Now that KVM loads from vcpu_array if and only if the target index is
valid with respect to online_vcpus, i.e. now that it is safe to erase a
not-fully-onlined vCPU entry, revert to storing into vcpu_array before
success is guaranteed.

If xa_store() fails, which _should_ be impossible, then putting the vCPU's
reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
been installed and owns the vCPU's reference.

This was found by inspection, but forcing the xa_store() to fail
confirms the problem:

   | Unable to handle kernel paging request at virtual address ffff800080ecd960
   | Call trace:
   |  _raw_spin_lock_irq+0x2c/0x70
   |  kvm_irqfd_release+0x24/0xa0
   |  kvm_vm_release+0x1c/0x38
   |  __fput+0x88/0x2ec
   |  ____fput+0x10/0x1c
   |  task_work_run+0xb0/0xd4
   |  do_exit+0x210/0x854
   |  do_group_exit+0x70/0x98
   |  get_signal+0x6b0/0x73c
   |  do_signal+0xa4/0x11e8
   |  do_notify_resume+0x60/0x12c
   |  el0_svc+0x64/0x68
   |  el0t_64_sync_handler+0x84/0xfc
   |  el0t_64_sync+0x190/0x194
   | Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)

Practically speaking, this is a non-issue as xa_store() can't fail, absent
a nasty kernel bug.  But the code is visually jarring and technically
broken.

This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.

Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Michal Luczaj <mhal@xxxxxxx>
Cc: Alexander Potapenko <glider@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Reported-by: Will Deacon <will@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
   virt/kvm/kvm_main.c | 14 +++++---------
   1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fca9f74e9544..f081839521ef 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
   	}
   	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-	r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
+	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+	BUG_ON(r == -EBUSY);
   	if (r)
   		goto unlock_vcpu_destroy;
@@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
   	kvm_get_kvm(kvm);
   	r = create_vcpu_fd(vcpu);
   	if (r < 0)
-		goto kvm_put_xa_release;
-
-	if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
-		r = -EINVAL;
-		goto kvm_put_xa_release;
-	}
+		goto kvm_put_xa_erase;

I also find it a bit jarring though that we have to undo the insertion. This
is a chicken-and-egg situation where you are pick one operation B that will
have to undo operation A if it fails.  But what xa_store is doing, is
breaking this deadlock.

The code is a bit longer, sure, but I don't see the point in complicating
the vcpu_array invariants and letting an entry disappear.

But we only need one rule: vcpu_array[x] is valid if and only if 'x' is less than
online_vcpus.  And that rule is necessary regardless of whether or not vcpu_array[x]
is filled before success is guaranteed.

Even if the invariant is explainable I still find xa_erase to be uglier than xa_release, but maybe it's just me.

The reason I'm not fully convinced by the explanation is that...

I'm not concerned about the code length, it's that we need to do _something_ if
xa_store() fails.  Yeah, it should never happen, but knowingly doing nothing feels
all kinds of wrong.

... it seems to me that this is not just an issue in KVM code; it should apply to other uses of xa_reserve()/xa_store() as well. If xa_store() fails after xa_reserve(), you're pretty much using the xarray API incorrectly... and then, just make it a BUG()? I know that BUG() is frowned upon, but if the API causes invalid memory accesses when used incorrectly, one might as well fail as early as possible and before the invalid memory access becomes exploitable.

I don't like BUG(), because it's obviously very doable to
gracefully handle failure.

Yes, you can by using a different API. But the point is that in the reserve/store case the insert failure becomes a reserve failure, never a store failure.

Maybe there should be an xa_store_reserved() that BUGs on failure, I don't know.

Paolo





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux