On Wed, Aug 01, 2012 at 10:42:37AM -0400, Alan Stern wrote:
> On Tue, 31 Jul 2012, Don Zickus wrote:
>
> > We ran into an interesting deadlock on RHEL-5 (2.6.18) that I believe
> > still applies to the current kernel involving the ehci->lock.
> >
> > CPU A:
> >   submits a bulk transfer urb
> >   ehci_urb_enqueue calls submit_async
> >   submit_async blocks on ehci->lock with irq disabled (the result
> >   of spin_lock_irqsave) for CPU B
> >
> > CPU B:
> >   takes an ehci interrupt
> >   locks ehci->lock
> >   pre-empted by an IPI handler which spins waiting for CPU C
> >
> > CPU C:
> >   takes an MTRR request
> >   sends an IPI to all cpus to block
> >   spins waiting for all cpus to block
> >
> > CPU A never processes the IPI because its interrupts are disabled;
> > this creates the 3-way deadlock.
> >
> > This deadlock is hard for our customer to reproduce, but based on their vmcore
> > it seems clear the above is what happened. I attached a suggested patch
> > from a colleague that would seem to resolve the problem. Because it is
> > hard to reproduce, I have not been able to test it to verify it resolves
> > the problem.
> >
> > The patch just turns the spin_locks into spin_lock_irqsaves in the ehci_irq
> > function. This would essentially block the IPI handler and let the interrupt
> > handler finish before processing the IPI. Then CPU A would get a chance to
> > finish and process its IPI.
> >
> > Looking at the code paths in 2.6.18 and 3.5, the locking still seems the same,
> > which is why I believe the problem still exists. However, someone in the office
> > thought the MTRR code has been re-written, so the problem we are seeing might
> > be more difficult to see with the current kernel.
> >
> > This patch does feel awkward, disabling interrupts in the irq handler. It seems
> > like it would make more sense to remove the locking from the irq handler. But
> > that is probably more work and my knowledge of USB is limited. I'll start with
> > this patch and see where the conversation goes.
> >
> > Any feedback would be appreciated.
>
> 2.6.18 is awfully old -- almost 6 years!

Heh. Yes, that is the world of RHEL and supporting kernels for 10 years. :-(
I only brought up the issue because I thought it was still relevant, but..

>
> Anyway, commit de85422b94ddb23c021126815ea49414047c13dc (USB: fix
> interrupt disabling for HCDs with shared interrupt handlers) took care
> of this problem way back in 2.6.26. You should be able to back-port
> the patch.

I see it was fixed at a higher level. Sorry I missed that. Thanks for
pointing out the commit.

Again, sorry for the noise. Thanks!

Cheers,
Don
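
For anyone reading this thread later, the three-way wait described in the report can be pictured as a small wait-for graph. What follows is only an illustrative user-space sketch, not kernel code and not the patch or commit discussed above. The names build_scenario, has_deadlock and reaches, and the handler_runs_with_irqs_off flag, are hypothetical and exist only for this model. An edge X -> Y means "CPU X cannot make progress until CPU Y does". A cycle in the graph is the deadlock.

```c
#include <stdbool.h>

/* Hypothetical model of the report: 0 = CPU A, 1 = CPU B, 2 = CPU C. */
enum { CPU_A, CPU_B, CPU_C, NCPUS };

/* Depth-first search: can we get from 'from' back to 'target'
 * by following wait-for edges? */
static bool reaches(bool w[NCPUS][NCPUS], int from, int target,
                    bool seen[NCPUS])
{
    for (int next = 0; next < NCPUS; next++) {
        if (!w[from][next])
            continue;
        if (next == target)
            return true;
        if (!seen[next]) {
            seen[next] = true;
            if (reaches(w, next, target, seen))
                return true;
        }
    }
    return false;
}

/* A deadlock exists if any CPU (transitively) waits on itself. */
static bool has_deadlock(bool w[NCPUS][NCPUS])
{
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        bool seen[NCPUS] = { false };
        if (reaches(w, cpu, cpu, seen))
            return true;
    }
    return false;
}

/* Fill in the wait-for edges from the report.
 * handler_runs_with_irqs_off is an assumption of this model: when true,
 * CPU B cannot be interrupted by the IPI while it holds ehci->lock. */
static void build_scenario(bool w[NCPUS][NCPUS],
                           bool handler_runs_with_irqs_off)
{
    for (int i = 0; i < NCPUS; i++)
        for (int j = 0; j < NCPUS; j++)
            w[i][j] = false;

    /* A spins on ehci->lock (irqs off), which B holds. */
    w[CPU_A][CPU_B] = true;

    /* C sent the IPI and spins until A and B have both responded. */
    w[CPU_C][CPU_A] = true;
    w[CPU_C][CPU_B] = true;

    /* If B took the IPI while holding the lock, its IPI handler
     * spins waiting for C, which closes the cycle A -> B -> C -> A. */
    if (!handler_runs_with_irqs_off)
        w[CPU_B][CPU_C] = true;
}
```

In this model, building the scenario with handler_runs_with_irqs_off set to false gives the cycle A -> B -> C -> A, and setting it to true removes the B -> C edge, so B can drop the lock, A can finish, and both can then answer C's IPI.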