We ran into an interesting deadlock on RHEL-5 (2.6.18) that I believe still appiles to the current kernel involving the ehci->lock. CPU A: submits a bulk transfer urb ehci_urb_enqueue calls submit_async submit_async blocks on ehci->lock with irq disabled (the result of spin_lock_irqsave) for CPU B CPU B: takes an ehci interrupt locks ehci->lock pre-empted by an IPI handler which spins waiting for CPU C CPU C: takes an MTRR request sends an IPI to all cpus to block spins waiting for all cpus to block CPU A nevers processes IPI because its interrupts are disabled, this creates the 3-way deadlock. This deadlock is hard to reproduce by our customer, but based on their vmcore it seems clear the above is what happened. I attatched a suggested patch from a colleague that would seem to resolve the problem. Because it is hard to reproduce, I have not been able to test it to verify it resolves the problem. The patch just turns spin_locks in the spin_lock_irqsaves in the ehci_irq function. This would essentially block the IPI handler and let the interrupt handler finish before processing the IPI. Then CPU A would get a chance to finish and process its IPI. Looking at the code paths in 2.6.18 and 3.5, the locking still seems the same which is why I believe the problem still exists. However, someone in the office thought the MTRR code has been re-written, so the problem we are seeing might be more difficult to see with the current kernel. This patch does feel awkward, disabling interrupts in the irq handler. It seems like it would make more sense to remove the locking from the irq handler. But that is probably more work and my knowledge of USB is limited. I'll start with this patch and see where the conversation goes. Any feedback would be appreciated. [dzickus - rewrote the description, ported to v3.5] Signed-off-by: Casey Dahlin <cdahlin@xxxxxxxxxx> Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx> --- drivers/usb/host/ehci-hcd.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c index 800be38..a31d9cc 100644 --- a/drivers/usb/host/ehci-hcd.c +++ b/drivers/usb/host/ehci-hcd.c @@ -849,8 +849,9 @@ static irqreturn_t ehci_irq (struct usb_hcd *hcd) struct ehci_hcd *ehci = hcd_to_ehci (hcd); u32 status, masked_status, pcd_status = 0, cmd; int bh; + unsigned long flags; - spin_lock (&ehci->lock); + spin_lock_irqsave(&ehci->lock, flags); status = ehci_readl(ehci, &ehci->regs->status); @@ -868,7 +869,7 @@ static irqreturn_t ehci_irq (struct usb_hcd *hcd) /* Shared IRQ? */ if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) { - spin_unlock(&ehci->lock); + spin_unlock_irqrestore(&ehci->lock, flags); return IRQ_NONE; } @@ -969,7 +970,7 @@ dead: if (bh) ehci_work (ehci); - spin_unlock (&ehci->lock); + spin_unlock_irqrestore(&ehci->lock, flags); if (pcd_status) usb_hcd_poll_rh_status(hcd); return IRQ_HANDLED; -- 1.7.7.6 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html