On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 18:58 -0800, Sean Christopherson wrote:
> > On Tue, Feb 06, 2024, David Woodhouse wrote:
> > > On Tue, 2024-02-06 at 10:41 -0800, Sean Christopherson wrote:
> > > >
> > > > This has an obvious-in-hindsight recursive deadlock bug. If KVM actually needs
> > > > to inject a timer IRQ, and the fast path fails, i.e. the gpc is invalid,
> > > > kvm_xen_set_evtchn() will attempt to acquire xen.xen_lock, which is already held
> > >
> > > Hm, right. In fact, kvm_xen_set_evtchn() shouldn't actually *need* the
> > > xen_lock in an ideal world; it's only taking it in order to work around
> > > the fact that the gfn_to_pfn_cache doesn't have its *own*
> > > self-sufficient locking. I have patches for that...
> > >
> > > I think the *simplest* of the "patches for that" approaches is just to
> > > use the gpc->refresh_lock to cover all activate, refresh and deactivate
> > > calls. I was waiting for Paul's series to land before sending that one,
> > > but I'll work on it today, and double-check my belief that we can then
> > > just drop xen_lock from kvm_xen_set_evtchn().
> >
> > While I definitely want to get rid of arch.xen.xen_lock, I don't want to address
> > the deadlock by relying on adding more locking to the gpc code. I want a teeny
> > tiny patch that is easy to review and backport. Y'all are *probably* the only
> > folks that care about Xen emulation, but even so, that's not a valid reason for
> > taking a roundabout way to fixing a deadlock.
>
> I strongly disagree. I get that you're reticent about fixing the gpc
> locking, but what I'm proposing is absolutely *not* a 'roundabout way
> to fixing a deadlock'. The kvm_xen_set_evtchn() function shouldn't
> *need* that lock; it's only taking it because of the underlying problem
> with the gpc itself, which needs its caller to do its locking for it.
>
> The solution is not to do further gymnastics with the xen_lock.

I agree that's the long-term solution, but I am not entirely confident that a
big overhaul is 6.9 material at this point. Squeezing an overhaul into 6.8 (and
if we're being nitpicky, backporting to 6.7) is out of the question.