On Thu, Jan 05, 2023, Michal Luczaj wrote:
> On 1/3/23 18:17, Sean Christopherson wrote:
> > On Thu, Dec 29, 2022, Michal Luczaj wrote:
> >> Move synchronize_srcu(&kvm->srcu) out of kvm->lock critical section.
> >
> > This needs a much more descriptive changelog, and an update to
> > Documentation/virt/kvm/locking.rst to define the ordering requirements
> > between kvm->srcu and kvm->lock.  And IIUC, there is no deadlock in the
> > current code base, so this really should be a prep patch that's sent along
> > with the Xen series[*] that wants to take kvm->srcu outside of kvm->lock.
> >
> > [*] https://lore.kernel.org/all/20221222203021.1944101-2-mhal@xxxxxxx
>
> I'd be happy to provide a more descriptive changelog, but right now I'm a
> bit confused. I'd be really grateful for some clarifications:
>
> I'm not sure how to understand "no deadlock in the current code base". I've
> run selftests[1] under the up-to-date mainline/master and I do see the
> deadlocks. Is there a branch where kvm_xen_set_evtchn() is not taking
> kvm->lock while inside kvm->srcu?

Ah, no, I'm the one that's confused.  I saw an earlier patch touch SRCU stuff
and assumed it introduced the deadlock.

Actually, it's the KVM Xen code that's confused.  This comment in
kvm_xen_set_evtchn() is a tragicomedy.  It explicitly calls out the exact case
that would be problematic (Xen hypercall), but commit 2fd6df2f2b47 ("KVM:
x86/xen: intercept EVTCHNOP_send from guests") ran right past that.

	/*
	 * For the irqfd workqueue, using the main kvm->lock mutex is
	 * fine since this function is invoked from kvm_set_irq() with
	 * no other lock held, no srcu. In future if it will be called
	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
	 * then it may need to switch to using a leaf-node mutex for
	 * serializing the shared_info mapping.
	 */
	mutex_lock(&kvm->lock);

> Also, is there a consensus as for the lock ordering? IOW, is the state of
> virt/kvm/locking.rst up to date, regardless of the discussion going on[2]?
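[Editor's note: the deadlock the selftests hit is a classic SRCU-vs-mutex
cycle.  The following is an illustrative sketch only, not the literal upstream
call chains:]

```c
/*
 * vCPU thread (Xen hypercall path, EVTCHNOP_send):
 *
 *   srcu_read_lock(&kvm->srcu);
 *     kvm_xen_set_evtchn()
 *       mutex_lock(&kvm->lock);      // blocks if another task holds kvm->lock
 *
 * Another thread, already holding kvm->lock:
 *
 *   mutex_lock(&kvm->lock);
 *     synchronize_srcu(&kvm->srcu);  // waits for all SRCU read-side
 *                                    // critical sections to complete
 *
 * The vCPU thread can't leave its SRCU read section until it acquires
 * kvm->lock, and kvm->lock won't be released until synchronize_srcu()
 * returns: deadlock.
 */
```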
I'm not convinced that allowing kvm->lock to be taken while holding kvm->srcu
is a good idea.  Requiring kvm->lock to be dropped before doing
synchronize_srcu() isn't problematic, and arguably it's a good rule since
holding kvm->lock for longer than necessary is undesirable.

What I don't like is taking kvm->lock inside kvm->srcu.  It's not documented,
but in pretty much every other case except Xen, sleepable locks are taken
outside of kvm->srcu, e.g. vcpu->mutex, slots_lock, and quite often kvm->lock
itself.

Ha!  Case in point.  The aforementioned Xen code blatantly violates KVM's
locking rules:

  - kvm->lock is taken outside vcpu->mutex

In the kvm_xen_hypercall() case, vcpu->mutex is held (KVM_RUN) when
kvm_xen_set_evtchn() is called, i.e. kvm->lock is taken inside vcpu->mutex.
It doesn't cause explosions because KVM x86 only takes vcpu->mutex inside
kvm->lock for SEV, and no one runs Xen+SEV guests, but the Xen code is still
a trainwreck waiting to happen.

In other words, I'm fine with this patch for optimization purposes, but I
don't think we should call it a bug fix.  Commit 2fd6df2f2b47 ("KVM: x86/xen:
intercept EVTCHNOP_send from guests") is the one that's wrong and needs
fixing.
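[Editor's note: the ordering violation described above can be spelled out as
follows.  Again a rough sketch, not verbatim upstream code:]

```c
/*
 * Documented order (Documentation/virt/kvm/locking.rst):
 *
 *   kvm->lock  ->  vcpu->mutex        // kvm->lock taken first, outside
 *
 * What the Xen hypercall path does, roughly:
 *
 *   KVM_RUN ioctl:
 *     mutex_lock(&vcpu->mutex);       // held for the duration of KVM_RUN
 *       ...
 *       kvm_xen_hypercall()
 *         kvm_xen_set_evtchn()
 *           mutex_lock(&kvm->lock);   // kvm->lock taken INSIDE vcpu->mutex
 *
 * That is inverted relative to, e.g., the SEV path, which takes vcpu->mutex
 * while already holding kvm->lock -- an ABBA inversion waiting to happen if
 * both orderings ever run concurrently on the same VM.
 */
```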