On Fri, 28 Apr 2017, Alan Stern wrote: > On Fri, 28 Apr 2017, Bart Van Assche wrote: > > > Hello, > > > > Every time I boot a particular development server the complaint shown > > below appears in the system log. I don't know when this behavior has > > been introduced. But I noticed that I can reproduce this with older > > kernel versions, e.g. 4.4.63. Does anyone know what is going on or how > > to fix this? > > > > Thanks, > > > > Bart. > > > > usb 1-8: new low-speed USB device number 2 using xhci_hcd > > usb 3-1: new high-speed USB device number 2 using ehci-pci > > > > ========================================================= > > [ INFO: possible irq lock inversion dependency detected ] > > 4.11.0-rc8-dbg+ #1 Not tainted > > --------------------------------------------------------- > > swapper/7/0 just changed the state of lock: > > �(&(&ehci->lock)->rlock){-.-...}, at: [<ffffffffa0130a69>] ehci_hrtimer_func+0x29/0xc0 [ehci_hcd] > > but this lock took another, HARDIRQ-unsafe lock in the past: > > �(hcd_urb_list_lock){+.....} > > > > > > and interrupts could create inverse lock ordering between them. > > I often find these lockdep splats difficult to understand. In this > case, why does lockdep think hcd_urb_list_lock is hardirq-unsafe? It's > supposed to be hardirq-safe. Peter Zijlstra kindly provided the answer to my question off-list; see below. > > other info that might help us debug this: > > �Possible interrupt unsafe locking scenario: > > > > �������CPU0��������������������CPU1 > > �������----��������������������---- > > � lock(hcd_urb_list_lock); > > �������������������������������local_irq_disable(); > > �������������������������������lock(&(&ehci->lock)->rlock); > > �������������������������������lock(hcd_urb_list_lock); > > � <Interrupt> > > ����lock(&(&ehci->lock)->rlock); > > �*** DEADLOCK *** > > > > no locks held by swapper/7/0. > > the shortest dependencies between 2nd lock and 1st lock: > > �-> (hcd_urb_list_lock){+.....} ops: 252 { > > ����HARDIRQ-ON-W at: > > ����������������������__lock_acquire+0x602/0x1280 > > ����������������������lock_acquire+0xd5/0x1c0 > > ����������������������_raw_spin_lock+0x2f/0x40 > > ����������������������usb_hcd_unlink_urb_from_ep+0x1b/0x60 [usbcore] > > ����������������������xhci_giveback_urb_in_irq.isra.45+0x70/0x1b0 [xhci_hcd] > > ����������������������finish_td.constprop.60+0x1d8/0x2e0 [xhci_hcd] > > ����������������������xhci_irq+0xdd6/0x1fa0 [xhci_hcd] > > ����������������������usb_hcd_irq+0x26/0x40 [usbcore] > > ����������������������irq_forced_thread_fn+0x2f/0x70 > > ����������������������irq_thread+0x149/0x1d0 > > ����������������������kthread+0x113/0x150 > > ����������������������ret_from_fork+0x2e/0x40 He wrote: > There, that's where we hold the lock with interrupts enabled. > > So someone forced interrupt threading and the code assumes the interrupt > code has IRQs disabled or something along those lines. So we should be able to fix the problem by changing spin_lock/unlock in xhci_irq() to spin_lock_irqsave/restore. Bart, please try out the patch below. Alan Stern Index: usb-4.x/drivers/usb/host/xhci-ring.c =================================================================== --- usb-4.x.orig/drivers/usb/host/xhci-ring.c +++ usb-4.x/drivers/usb/host/xhci-ring.c @@ -2628,26 +2628,27 @@ static int xhci_handle_event(struct xhci irqreturn_t xhci_irq(struct usb_hcd *hcd) { struct xhci_hcd *xhci = hcd_to_xhci(hcd); + unsigned long flags; u32 status; u64 temp_64; union xhci_trb *event_ring_deq; dma_addr_t deq; - spin_lock(&xhci->lock); + spin_lock_irqsave(&xhci->lock, flags); /* Check if the xHC generated the interrupt, or the irq is shared */ status = readl(&xhci->op_regs->status); if (status == 0xffffffff) goto hw_died; if (!(status & STS_EINT)) { - spin_unlock(&xhci->lock); + spin_unlock_irqrestore(&xhci->lock, flags); return IRQ_NONE; } if (status & STS_FATAL) { xhci_warn(xhci, "WARNING: Host System Error\n"); xhci_halt(xhci); hw_died: - spin_unlock(&xhci->lock); + spin_unlock_irqrestore(&xhci->lock, flags); return IRQ_HANDLED; } @@ -2679,7 +2680,7 @@ hw_died: temp_64 = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue); xhci_write_64(xhci, temp_64 | ERST_EHB, &xhci->ir_set->erst_dequeue); - spin_unlock(&xhci->lock); + spin_unlock_irqrestore(&xhci->lock, flags); return IRQ_HANDLED; } @@ -2707,7 +2708,7 @@ hw_died: temp_64 |= ERST_EHB; xhci_write_64(xhci, temp_64, &xhci->ir_set->erst_dequeue); - spin_unlock(&xhci->lock); + spin_unlock_irqrestore(&xhci->lock, flags); return IRQ_HANDLED; } -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html