On Tue, Jan 03, 2023, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> > Currently only the locking order of SRCU vs kvm->slots_arch_lock
> > and kvm->slots_lock is documented.  Extend this to kvm->lock
> > since Xen emulation got it terribly wrong.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >  Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > index 845a561629f1..a3ca76f9be75 100644
> > --- a/Documentation/virt/kvm/locking.rst
> > +++ b/Documentation/virt/kvm/locking.rst
> > @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
> >  - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
> >    them together is quite rare.
> >  
> > -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> > -  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
> > -  can be taken inside a kvm->srcu read-side critical section,
> > -  while kvm->slots_lock cannot.
> > -
> >  - kvm->mn_active_invalidate_count ensures that pairs of
> >    invalidate_range_start() and invalidate_range_end() callbacks
> >    use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
> >    are taken on the waiting side in install_new_memslots, so MMU notifiers
> >    must not take either kvm->slots_lock or kvm->slots_arch_lock.
> >  
> > +For SRCU:
> > +
> > +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> > +  the kvm->slots_lock critical section, therefore kvm->slots_lock
> > +  cannot be taken inside a kvm->srcu read-side critical section.
> > +  Instead, kvm->slots_arch_lock is released before the call
> > +  to ``synchronize_srcu()`` and _can_ be taken inside a
> > +  kvm->srcu read-side critical section.
> > +
> > +- kvm->lock is taken inside kvm->srcu, therefore
> 
> Prior to the recent Xen change, is this actually true?

I was thinking of a different change, but v5.19 is still kinda recent, so I'll
count it.  Better to be lucky than good :-)

Commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
introduced the only case I can find where kvm->lock is taken inside kvm->srcu.

> There are many instances where kvm->srcu is taken inside kvm->lock, but I
> can't find any existing cases where the reverse is true.  Logically, it makes
> sense to take kvm->lock first since kvm->srcu can be taken deep in helpers,
> e.g. for accessing guest memory.  It's also more consistent to take kvm->lock
> first since kvm->srcu is taken inside vcpu->mutex, and vcpu->mutex is taken
> inside kvm->lock.
> 
> Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't problematic per se,
> but it's going to result in a weird set of rules because synchronize_srcu() can,
> and is, called while holding a variety of other locks.
> 
> In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
> real bug.

I'm doubling down on this.  Taking kvm->srcu outside of kvm->lock is all kinds
of sketchy, and likely indicates a larger problem.  The aforementioned commit
that introduced the problematic kvm->srcu vs. kvm->lock ordering also blatantly
violates the ordering between kvm->lock and vcpu->mutex.  Details in the link[*].

The vast majority of flows that take kvm->srcu will already hold a lock of some
kind, otherwise the task can't safely dereference any VM/vCPU/device data and
thus has no reason to acquire kvm->srcu.  E.g.
taking kvm->srcu to read guest
memory is nonsensical without a stable guest physical address to work with.
There are exceptions, e.g. evtchn_set_fn() and maybe some ioctls(), but in most
cases, taking kvm->lock inside kvm->srcu is just asking for problems.

[*] https://lore.kernel.org/all/Y7dN0Negds7XUbvI@xxxxxxxxxx
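
To make the two orderings being argued about concrete, here is a minimal
illustrative sketch (it is not code from the patch or from the Xen commit; the
function name and the gpa/val parameters are hypothetical, while the locking
and guest-write calls are the usual kernel/KVM APIs):

  #include <linux/kvm_host.h>

  /*
   * Accepted ordering: kvm->lock is the outer lock, and the kvm->srcu
   * read-side critical section is entered inside it, e.g. to reach guest
   * memory through the memslots.
   */
  static int example_write_under_kvm_lock(struct kvm *kvm, gpa_t gpa, u64 val)
  {
          int idx, r;

          mutex_lock(&kvm->lock);
          idx = srcu_read_lock(&kvm->srcu);       /* kvm->srcu inside kvm->lock */

          r = kvm_write_guest(kvm, gpa, &val, sizeof(val));

          srcu_read_unlock(&kvm->srcu, idx);
          mutex_unlock(&kvm->lock);
          return r;
  }

  /*
   * The ordering being objected to looks like this, e.g. when kvm->lock is
   * taken from a path that already holds the SRCU read lock (such as the
   * vCPU run loop):
   *
   *        idx = srcu_read_lock(&kvm->srcu);
   *        mutex_lock(&kvm->lock);               // kvm->lock inside kvm->srcu
   *        ...
   *        mutex_unlock(&kvm->lock);
   *        srcu_read_unlock(&kvm->srcu, idx);
   *
   * Mixing the two patterns deadlocks as soon as something calls
   * synchronize_srcu(&kvm->srcu) while holding kvm->lock: the writer waits
   * for the SRCU reader to finish, and the reader waits for kvm->lock.
   */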