Re: [PATCH] usb, ehci: Avoid deadlock of ehci->lock by disabling interrupts

On Wed, Aug 01, 2012 at 10:42:37AM -0400, Alan Stern wrote:
> On Tue, 31 Jul 2012, Don Zickus wrote:
> 
> > We ran into an interesting deadlock on RHEL-5 (2.6.18) that I believe
> > still applies to the current kernel involving the ehci->lock.
> > 
> > CPU A:
> > submits a bulk transfer urb
> > ehci_urb_enqueue calls submit_async
> > submit_async blocks on ehci->lock with irq disabled (the result
> > of spin_lock_irqsave) for CPU B
> > 
> > CPU B:
> > takes an ehci interrupt
> > locks ehci->lock
> > pre-empted by an IPI handler which spins waiting for CPU C
> > 
> > CPU C:
> > takes an MTRR request
> > sends an IPI to all cpus to block
> > spins waiting for all cpus to block
> > 
> > CPU A never processes the IPI because its interrupts are disabled,
> > which creates the 3-way deadlock.
> > 
> > This deadlock is hard for our customer to reproduce, but based on their
> > vmcore it seems clear the above is what happened.  I attached a suggested
> > patch from a colleague that would seem to resolve the problem.  Because it
> > is hard to reproduce, I have not been able to test it to verify it resolves
> > the problem.
> > 
> > The patch just turns the spin_locks into spin_lock_irqsaves in the ehci_irq
> > function.  This would essentially block the IPI handler and let the interrupt
> > handler finish before processing the IPI.  Then CPU A would get a chance to
> > take the lock, finish, and process its IPI.
> > 
> > Looking at the code paths in 2.6.18 and 3.5, the locking still seems the same
> > which is why I believe the problem still exists.  However, someone in the office
> > thought the MTRR code has been re-written, so the problem we are seeing might
> > be more difficult to see with the current kernel.
> > 
> > This patch does feel awkward, disabling interrupts in the irq handler.  It seems
> > like it would make more sense to remove the locking from the irq handler.  But
> > that is probably more work and my knowledge of USB is limited.  I'll start with
> > this patch and see where the conversation goes.
> > 
> > Any feedback would be appreciated.
> 
> 2.6.18 is awfully old -- almost 6 years!

Heh.  Yes, that is the world of RHEL and supporting kernels for 10 years. :-(
I only brought up the issue because I thought it was still relevant, but...

> 
> Anyway, commit de85422b94ddb23c021126815ea49414047c13dc (USB: fix
> interrupt disabling for HCDs with shared interrupt handlers) took care
> of this problem way back in 2.6.26.  You should be able to back-port
> the patch.

I see it was fixed at a higher level.  Sorry I missed that.  Thanks for
pointing out the commit.  Again sorry for the noise.  Thanks!
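
For anyone reading this later, the wait-for cycle from the original report
can be modelled in a few lines of userspace C.  This is only a sketch of
the reasoning, not kernel code: build_wait_graph(), has_deadlock() and the
CPU_* names are made up for illustration, and "handler_masks_irqs" simply
stands for "local interrupts are off while the interrupt handler holds
ehci->lock", whichever layer arranges that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three CPUs in the report; not kernel APIs. */
enum { CPU_A, CPU_B, CPU_C, NCPUS, NOBODY = -1 };

/* Record, for each CPU, which CPU it is blocked on (NOBODY = can run). */
static void build_wait_graph(bool handler_masks_irqs, int waits_on[NCPUS])
{
	/* A spins on ehci->lock, held by B, with local interrupts off. */
	waits_on[CPU_A] = CPU_B;

	/*
	 * B holds ehci->lock inside the interrupt handler.  With interrupts
	 * still enabled, the IPI preempts it and B spins until C lets go.
	 * With interrupts masked, the IPI stays pending and B can finish.
	 */
	waits_on[CPU_B] = handler_masks_irqs ? NOBODY : CPU_C;

	/* C waits for every CPU to take the IPI; A is the one that cannot. */
	waits_on[CPU_C] = CPU_A;
}

/* Follow the wait-for edges from each CPU; a path back to the start is a cycle. */
static bool has_deadlock(const int waits_on[NCPUS])
{
	for (int start = 0; start < NCPUS; start++) {
		int cur = start;

		for (int hops = 0; hops < NCPUS && cur != NOBODY; hops++) {
			cur = waits_on[cur];
			if (cur == start)
				return true;
		}
	}
	return false;
}

/* Usage: compare the two cases discussed in the thread. */
static void check_model(void)
{
	int g[NCPUS];

	build_wait_graph(false, g);	/* handler runs with IRQs enabled */
	assert(has_deadlock(g));	/* A -> B -> C -> A */

	build_wait_graph(true, g);	/* handler runs with IRQs masked */
	assert(!has_deadlock(g));	/* B finishes, then A, then C */
}
```

In the masked case C is still waiting on A, but A is only waiting on B and
B is waiting on nobody, so the chain unwinds instead of looping.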

Cheers,
Don