On Tue, Jan 03, 2023, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> > Currently only the locking order of SRCU vs kvm->slots_arch_lock
> > and kvm->slots_lock is documented.  Extend this to kvm->lock
> > since Xen emulation got it terribly wrong.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >  Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> > index 845a561629f1..a3ca76f9be75 100644
> > --- a/Documentation/virt/kvm/locking.rst
> > +++ b/Documentation/virt/kvm/locking.rst
> > @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
> >  - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
> >    them together is quite rare.
> >  
> > -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> > -  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
> > -  can be taken inside a kvm->srcu read-side critical section,
> > -  while kvm->slots_lock cannot.
> > -
> >  - kvm->mn_active_invalidate_count ensures that pairs of
> >    invalidate_range_start() and invalidate_range_end() callbacks
> >    use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
> >    are taken on the waiting side in install_new_memslots, so MMU notifiers
> >    must not take either kvm->slots_lock or kvm->slots_arch_lock.
> >  
> > +For SRCU:
> > +
> > +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> > +  the kvm->slots_lock critical section, therefore kvm->slots_lock
> > +  cannot be taken inside a kvm->srcu read-side critical section.
> > +  Instead, kvm->slots_arch_lock is released before the call
> > +  to ``synchronize_srcu()`` and _can_ be taken inside a
> > +  kvm->srcu read-side critical section.
> > +
> > +- kvm->lock is taken inside kvm->srcu, therefore
> 
> Prior to the recent Xen change, is this actually true?

I was thinking of a different change, but v5.19 is still kinda recent, so I'll
count it.  Better to be lucky than good :-)

Commit 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
introduced the only case I can find where kvm->lock is taken inside kvm->srcu.

> There are many instances where kvm->srcu is taken inside kvm->lock, but I
> can't find any existing cases where the reverse is true.  Logically, it makes
> sense to take kvm->lock first since kvm->srcu can be taken deep in helpers,
> e.g. for accessing guest memory.  It's also more consistent to take kvm->lock
> first since kvm->srcu is taken inside vcpu->mutex, and vcpu->mutex is taken
> inside kvm->lock.
> 
> Disallowing synchronize_srcu(kvm->srcu) inside kvm->lock isn't problematic per se,
> but it's going to result in a weird set of rules because synchronize_srcu() can,
> and is, called while holding a variety of other locks.
> 
> In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
> real bug.

I'm doubling down on this.  Taking kvm->srcu outside of kvm->lock is all kinds
of sketchy, and likely indicates a larger problem.  The aforementioned commit
that introduced the problematic kvm->srcu vs. kvm->lock ordering also blatantly
violates the ordering between kvm->lock and vcpu->mutex.  Details in the link[*].

The vast majority of flows that take kvm->srcu will already hold a lock of some
kind, otherwise the task can't safely dereference any VM/vCPU/device data and
thus has no reason to acquire kvm->srcu.  E.g.
taking kvm->srcu to read guest
memory is nonsensical without a stable guest physical address to work with.
There are exceptions, e.g. evtchn_set_fn() and maybe some ioctls(), but in most
cases, taking kvm->lock inside kvm->srcu is just asking for problems.

[*] https://lore.kernel.org/all/Y7dN0Negds7XUbvI@xxxxxxxxxx
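
To make the two orderings being argued about concrete, here is a minimal
illustrative sketch (it is not code from the patch or from the Xen commit; the
function name and the gpa/val parameters are hypothetical, while the locking
and guest-write calls are the usual kernel/KVM APIs):

  #include <linux/kvm_host.h>

  /*
   * Accepted ordering: kvm->lock is the outer lock, and the kvm->srcu
   * read-side critical section is entered inside it, e.g. to reach guest
   * memory through the memslots.
   */
  static int example_write_under_kvm_lock(struct kvm *kvm, gpa_t gpa, u64 val)
  {
          int idx, r;

          mutex_lock(&kvm->lock);
          idx = srcu_read_lock(&kvm->srcu);       /* kvm->srcu inside kvm->lock */

          r = kvm_write_guest(kvm, gpa, &val, sizeof(val));

          srcu_read_unlock(&kvm->srcu, idx);
          mutex_unlock(&kvm->lock);
          return r;
  }

  /*
   * The ordering being objected to looks like this, e.g. when kvm->lock is
   * taken from a path that already holds the SRCU read lock (such as the
   * vCPU run loop):
   *
   *        idx = srcu_read_lock(&kvm->srcu);
   *        mutex_lock(&kvm->lock);               // kvm->lock inside kvm->srcu
   *        ...
   *        mutex_unlock(&kvm->lock);
   *        srcu_read_unlock(&kvm->srcu, idx);
   *
   * Mixing the two patterns deadlocks as soon as something calls
   * synchronize_srcu(&kvm->srcu) while holding kvm->lock: the writer waits
   * for the SRCU reader to finish, and the reader waits for kvm->lock.
   */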