On Wed, 2022-12-28 at 01:21 +0100, Michal Luczaj wrote: > On 12/27/22 12:11, Paolo Bonzini wrote: > > On 12/24/22 12:14, Michal Luczaj wrote: > > > > This lock/unlock pair can cause a deadlock because it's inside the SRCU > > > > read side critical section. Fortunately it's simpler to just use > > > > mutex_lock for the whole function instead of using two small critical > > > > sections, and then SRCU is not needed. > > > Ah, I didn't think using a single mutex_lock would be ok. And maybe that's > > > a silly question, but how can this lock/unlock pair cause a deadlock? My > > > assumption was it's a*sleepable* RCU after all. > > > > > > > This is a pretty simple AB-BA deadlock, just involving SRCU and a mutex > > instead of two mutexes: > > > > Thread 1 Thread 2 > > mutex_lock() > > srcu_read_lock() > > mutex_lock() // stops here > > synchronize_srcu() // stops here > > mutex_unlock() > > mutex_unlock() > > srcu_read_unlock() > > > > Thread 2's synchronize_srcu() is waiting for srcu_read_unlock(), which > > however won't happen until thread 1 can take the mutex. But the mutex > > is taken by thread 2, hence the deadlock. > > Ahh, I see. Thank you for explaining. > > Does it mean kvm_xen_hcall_evtchn_send() deadlocks in the same fashion? > > kvm_xen_eventfd_reset() > mutex_lock() > srcu_read_lock() > kvm_xen_hcall_evtchn_send() > kvm_xen_set_evtchn() > mutex_lock() > synchronize_srcu() > > Michal Good catch. You'll note that above that mutex_lock() in kvm_xen_set_evtchn() there's already a comment saying "if in future it will be called directly from a vCPU thread"... which it is now. /* * For the irqfd workqueue, using the main kvm->lock mutex is * fine since this function is invoked from kvm_set_irq() with * no other lock held, no srcu. In future if it will be called * directly from a vCPU thread (e.g. on hypercall for an IPI) * then it may need to switch to using a leaf-node mutex for * serializing the shared_info mapping. */ mutex_lock(&kvm->lock); I think that if we switch to a new lock rather than piggy-backing on kvm->lock, we can declare that none shall call synchronize_srcu() while holding it, and thus avoid the potential deadlock? We have to fix the one case where the Xen code already does that, in kvm_xen_eventfd_reset(). But fixing that one alone wouldn't solve it for the general case and let us use kvm->lock, I presume.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature