On Tue, 31 Jul 2012, Don Zickus wrote:

> We ran into an interesting deadlock on RHEL-5 (2.6.18) that I believe
> still applies to the current kernel involving the ehci->lock.
>
> CPU A:
>   submits a bulk transfer urb
>   ehci_urb_enqueue calls submit_async
>   submit_async blocks on ehci->lock with irq disabled (the result
>   of spin_lock_irqsave) for CPU B
>
> CPU B:
>   takes an ehci interrupt
>   locks ehci->lock
>   pre-empted by an IPI handler which spins waiting for CPU C
>
> CPU C:
>   takes an MTRR request
>   sends an IPI to all cpus to block
>   spins waiting for all cpus to block
>
> CPU A never processes the IPI because its interrupts are disabled;
> this creates the 3-way deadlock.
>
> This deadlock is hard to reproduce by our customer, but based on their vmcore
> it seems clear the above is what happened.  I attached a suggested patch
> from a colleague that would seem to resolve the problem.  Because it is
> hard to reproduce, I have not been able to test it to verify it resolves
> the problem.
>
> The patch just turns the spin_locks into spin_lock_irqsaves in the ehci_irq
> function.  This would essentially block the IPI handler and let the interrupt
> handler finish before processing the IPI.  Then CPU A would get a chance to
> finish and process its IPI.
>
> Looking at the code paths in 2.6.18 and 3.5, the locking still seems the same
> which is why I believe the problem still exists.  However, someone in the office
> thought the MTRR code has been re-written, so the problem we are seeing might
> be more difficult to see with the current kernel.
>
> This patch does feel awkward, disabling interrupts in the irq handler.  It seems
> like it would make more sense to remove the locking from the irq handler.  But
> that is probably more work and my knowledge of USB is limited.  I'll start with
> this patch and see where the conversation goes.
>
> Any feedback would be appreciated.

2.6.18 is awfully old -- almost 6 years!
Anyway, commit de85422b94ddb23c021126815ea49414047c13dc (USB: fix
interrupt disabling for HCDs with shared interrupt handlers) took care
of this problem way back in 2.6.26.  You should be able to back-port the
patch.

Alan Stern
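
For readers following the thread, the 3-way deadlock described in the
quoted report can be modelled as a cycle in a wait-for graph.  The
sketch below is a userspace illustration only, not kernel code and not
the patch under discussion.  The names build_scenario(), has_deadlock()
and the handler_irqs_off flag are hypothetical, and the model assumes
each CPU waits on at most one other CPU, which simplifies the "C waits
for all cpus" step down to the one CPU that cannot respond.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three CPUs from the report; NCPUS is just the count. */
enum cpu { CPU_A, CPU_B, CPU_C, NCPUS };

#define NOBODY (-1)

/* waits_for[x] is the CPU that x cannot make progress without. */
struct scenario {
    int waits_for[NCPUS];
};

/*
 * Build the wait-for edges.  handler_irqs_off is a hypothetical flag
 * modelling an interrupt handler that runs with local interrupts
 * disabled, so the IPI cannot preempt it while it holds the lock.
 */
struct scenario build_scenario(bool handler_irqs_off)
{
    struct scenario s;

    /* A spins on ehci->lock (held by B) with its interrupts disabled. */
    s.waits_for[CPU_A] = CPU_B;

    /*
     * B holds ehci->lock inside the interrupt handler.  With interrupts
     * enabled, the IPI handler preempts it and spins waiting for C.
     * With interrupts off, B finishes first and waits on nobody.
     */
    s.waits_for[CPU_B] = handler_irqs_off ? NOBODY : CPU_C;

    /* C waits for every CPU to respond to the IPI; A is the one that
     * cannot while it spins with interrupts disabled. */
    s.waits_for[CPU_C] = CPU_A;

    return s;
}

/* Return true if following the wait-for edges leads back to the start. */
bool has_deadlock(const struct scenario *s)
{
    assert(s != NULL);

    for (int start = 0; start < NCPUS; start++) {
        int cur = s->waits_for[start];

        /* Each CPU has at most one edge, so NCPUS hops is enough. */
        for (int hops = 0; hops < NCPUS && cur != NOBODY; hops++) {
            if (cur == start)
                return true;
            cur = s->waits_for[cur];
        }
    }
    return false;
}
```

Under these assumptions, has_deadlock() reports a cycle (A -> B -> C -> A)
for build_scenario(false) and no cycle for build_scenario(true), since B
then releases the lock before it ever services the IPI.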