On Thu, Jan 05, 2023, Michal Luczaj wrote:
> On 1/3/23 18:17, Sean Christopherson wrote:
> > On Thu, Dec 29, 2022, Michal Luczaj wrote:
> >> Move synchronize_srcu(&kvm->srcu) out of kvm->lock critical section.
> >
> > This needs a much more descriptive changelog, and an update to
> > Documentation/virt/kvm/locking.rst to define the ordering requirements
> > between kvm->srcu and kvm->lock.  And IIUC, there is no deadlock in the
> > current code base, so this really should be a prep patch that's sent along
> > with the Xen series[*] that wants to take kvm->srcu outside of kvm->lock.
> >
> > [*] https://lore.kernel.org/all/20221222203021.1944101-2-mhal@xxxxxxx
>
> I'd be happy to provide a more descriptive changelog, but right now I'm a
> bit confused. I'd be really grateful for some clarifications:
>
> I'm not sure how to understand "no deadlock in the current code base". I've
> run selftests[1] under the up-to-date mainline/master and I do see the
> deadlocks. Is there a branch where kvm_xen_set_evtchn() is not taking
> kvm->lock while inside kvm->srcu?

Ah, no, I'm the one that's confused.  I saw an earlier patch touch SRCU stuff
and assumed it introduced the deadlock.

Actually, it's the KVM Xen code that's confused.  This comment in
kvm_xen_set_evtchn() is a tragicomedy.  It explicitly calls out the exact case
that would be problematic (Xen hypercall), but commit 2fd6df2f2b47 ("KVM:
x86/xen: intercept EVTCHNOP_send from guests") ran right past that.

	/*
	 * For the irqfd workqueue, using the main kvm->lock mutex is
	 * fine since this function is invoked from kvm_set_irq() with
	 * no other lock held, no srcu. In future if it will be called
	 * directly from a vCPU thread (e.g. on hypercall for an IPI)
	 * then it may need to switch to using a leaf-node mutex for
	 * serializing the shared_info mapping.
	 */
	mutex_lock(&kvm->lock);

> Also, is there a consensus as for the lock ordering? IOW, is the state of
> virt/kvm/locking.rst up to date, regardless of the discussion going on[2]?
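[Editor's note: the deadlock the selftests hit is a classic SRCU-vs-mutex
cycle.  The following is an illustrative sketch only, not the literal upstream
call chains:]

```c
/*
 * vCPU thread (Xen hypercall path, EVTCHNOP_send):
 *
 *   srcu_read_lock(&kvm->srcu);
 *     kvm_xen_set_evtchn()
 *       mutex_lock(&kvm->lock);      // blocks if another task holds kvm->lock
 *
 * Another thread, already holding kvm->lock:
 *
 *   mutex_lock(&kvm->lock);
 *     synchronize_srcu(&kvm->srcu);  // waits for all SRCU read-side
 *                                    // critical sections to complete
 *
 * The vCPU thread can't leave its SRCU read section until it acquires
 * kvm->lock, and kvm->lock won't be released until synchronize_srcu()
 * returns: deadlock.
 */
```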
I'm not convinced that allowing kvm->lock to be taken while holding kvm->srcu
is a good idea.  Requiring kvm->lock to be dropped before doing
synchronize_srcu() isn't problematic, and arguably it's a good rule since
holding kvm->lock for longer than necessary is undesirable.

What I don't like is taking kvm->lock inside kvm->srcu.  It's not documented,
but in pretty much every other case except Xen, sleepable locks are taken
outside of kvm->srcu, e.g. vcpu->mutex, slots_lock, and quite often kvm->lock
itself.

Ha!  Case in point.  The aforementioned Xen code blatantly violates KVM's
locking rules:

  - kvm->lock is taken outside vcpu->mutex

In the kvm_xen_hypercall() case, vcpu->mutex is held (KVM_RUN) when
kvm_xen_set_evtchn() is called, i.e. kvm->lock is taken inside vcpu->mutex.
It doesn't cause explosions because KVM x86 only takes vcpu->mutex inside
kvm->lock for SEV, and no one runs Xen+SEV guests, but the Xen code is still
a trainwreck waiting to happen.

In other words, I'm fine with this patch for optimization purposes, but I
don't think we should call it a bug fix.  Commit 2fd6df2f2b47 ("KVM: x86/xen:
intercept EVTCHNOP_send from guests") is the one that's wrong and needs
fixing.
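[Editor's note: the ordering violation described above can be spelled out as
follows.  Again a rough sketch, not verbatim upstream code:]

```c
/*
 * Documented order (Documentation/virt/kvm/locking.rst):
 *
 *   kvm->lock  ->  vcpu->mutex        // kvm->lock taken first, outside
 *
 * What the Xen hypercall path does, roughly:
 *
 *   KVM_RUN ioctl:
 *     mutex_lock(&vcpu->mutex);       // held for the duration of KVM_RUN
 *       ...
 *       kvm_xen_hypercall()
 *         kvm_xen_set_evtchn()
 *           mutex_lock(&kvm->lock);   // kvm->lock taken INSIDE vcpu->mutex
 *
 * That is inverted relative to, e.g., the SEV path, which takes vcpu->mutex
 * while already holding kvm->lock -- an ABBA inversion waiting to happen if
 * both orderings ever run concurrently on the same VM.
 */
```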