Re: [PATCH] Documentation: kvm: clarify SRCU locking order

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 3 Jan 2023 17:42:31 +0000

On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> Currently only the locking order of SRCU vs kvm->slots_arch_lock
> and kvm->slots_lock is documented.  Extend this to kvm->lock
> since Xen emulation got it terribly wrong.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
>  Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 845a561629f1..a3ca76f9be75 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
>  - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
>    them together is quite rare.
>  
> -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> -  synchronize_srcu(&kvm->srcu).  Therefore kvm->slots_arch_lock
> -  can be taken inside a kvm->srcu read-side critical section,
> -  while kvm->slots_lock cannot.
> -
>  - kvm->mn_active_invalidate_count ensures that pairs of
>    invalidate_range_start() and invalidate_range_end() callbacks
>    use the same memslots array.  kvm->slots_lock and kvm->slots_arch_lock
>    are taken on the waiting side in install_new_memslots, so MMU notifiers
>    must not take either kvm->slots_lock or kvm->slots_arch_lock.
>  
> +For SRCU:
> +
> +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> +  the kvm->slots_lock critical section, therefore kvm->slots_lock
> +  cannot be taken inside a kvm->srcu read-side critical section.
> +  Instead, kvm->slots_arch_lock is released before the call
> +  to ``synchronize_srcu()`` and _can_ be taken inside a
> +  kvm->srcu read-side critical section.
> +
> +- kvm->lock is taken inside kvm->srcu, therefore

Prior to the recent Xen change, is this actually true?  There are many instances
where kvm->srcu is taken inside kvm->lock, but I can't find any existing cases
where the reverse is true.  Logically, it makes sense to take kvm->lock first since
kvm->srcu can be taken deep in helpers, e.g. for accessing guest memory.  It's also
more consistent to take kvm->lock first since kvm->srcu is taken inside vcpu->mutex,
and vcpu->mutex is taken inside kvm->lock.

Disallowing synchronize_srcu(&kvm->srcu) inside kvm->lock isn't problematic per se,
but it's going to result in a weird set of rules because synchronize_srcu() can be,
and is, called while holding a variety of other locks.

In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
real bug.
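
To make the ordering argument concrete, here is a purely illustrative userspace
model, not kernel code.  The rank enum, struct held_locks, model_acquire() and the
two order_*() helpers are all made up for this sketch.  It only encodes the nesting
described above (kvm->lock outside vcpu->mutex outside kvm->srcu) and checks whether
a given acquisition sequence respects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ranks for this sketch only: the outermost lock gets the
 * lowest rank, following kvm->lock -> vcpu->mutex -> kvm->srcu.
 */
enum lock_rank {
	RANK_KVM_LOCK   = 0,	/* kvm->lock */
	RANK_VCPU_MUTEX = 1,	/* vcpu->mutex */
	RANK_KVM_SRCU   = 2,	/* kvm->srcu read-side critical section */
};

#define MAX_HELD 8

/* Stack of ranks currently "held" by one modelled task. */
struct held_locks {
	enum lock_rank stack[MAX_HELD];
	size_t depth;
};

/*
 * Record that 'next' is being taken.  Returns false if the innermost
 * lock already held has an equal or higher rank, i.e. if taking 'next'
 * would invert the modelled order.
 */
static bool model_acquire(struct held_locks *h, enum lock_rank next)
{
	assert(h->depth < MAX_HELD);

	if (h->depth > 0 && h->stack[h->depth - 1] >= next)
		return false;

	h->stack[h->depth++] = next;
	return true;
}

/* kvm->lock, then vcpu->mutex, then kvm->srcu: the order argued for above. */
bool order_lock_then_srcu(void)
{
	struct held_locks h = { .depth = 0 };

	return model_acquire(&h, RANK_KVM_LOCK) &&
	       model_acquire(&h, RANK_VCPU_MUTEX) &&
	       model_acquire(&h, RANK_KVM_SRCU);
}

/* kvm->srcu first, then kvm->lock: the ordering questioned above. */
bool order_srcu_then_lock(void)
{
	struct held_locks h = { .depth = 0 };

	return model_acquire(&h, RANK_KVM_SRCU) &&
	       model_acquire(&h, RANK_KVM_LOCK);
}
```

Under these assumed ranks, order_lock_then_srcu() is accepted and
order_srcu_then_lock() is rejected.  In the real kernel this kind of check is
lockdep's job; the model is only meant to show which direction I think the rule
should point.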

> +  ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
> +  a kvm->lock critical section.  If you cannot delay the
> +  call until after kvm->lock is released, use ``call_srcu``.
> +
>  On x86:
>  
>  - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
> -- 
> 2.31.1
>