Re: [PATCH v2] xhci: Cleanup only when releasing primary hcd.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Joe,

On 27/04/16 23:41, Joe Lawrence wrote:
> Hello Mathias, Roger, Gabriel
> 
> I've been chasing strange MSI / legacy IRQ behavior from xHCI for a
> couple days and wanted to report a few things that may be effected by
> Gabriel's recent "xhci: Cleanup only when releasing primary hcd" patch
> (more on this at the bottom).
> 
> After 8c24d6d7b09d "usb: xhci: stop everything on the first call to
> xhci_stop" (v4.3+), there exists a device removal window that can later
> leak interrupts and other device resources ordinarily cleaned up by
> xhci_stop.
> 
> The quick summary is that if the xhci_handshake call from xhci_halt
> fails, it will set xhci->xhc_state |= XHCI_STATE_HALTED.  Later when the
> device is removed, xhci_stop keys off the xhc_state and will exit early
> if XHCI_STATE_HALTED is set.  
> 
> This manifested itself on my Stratus machine while running repeated
> device hotplug removals at various timing intervals during
> initialization (for example 3, 6 or 12 seconds after hotplug add).  I
> didn't grab a backtrace of xhci_halt, so I'm not exactly sure why it was
> called after xhci_run, but note that it is not only called from
> xhci_stop, but also xhci_handle_command_timeout and others.
> 
> The result was that a later device initialization couldn't allocate its
> MSI interrupts -- after much instrumentation I finally traced it back to
> the previous initialization / removal failing to free the MSI irqs it
> requested. 
> 
> Here's the relevent snippet from added trace logging.
> (ftmod_ioboard_event and ioboard_bringdown are functions from an
> out-of-tree PCI hotplug driver for Stratus hardware platforms.  These
> routines will call into PCI API to add or remove a set of PCI devices so
> I left them here to help mark device lifetimes.)
> 
> Last successful MSI allocation
> ==============================
> 
>             java-17858 [002] ....  3005.763238: ftmod_ioboard_event: adding bus 03
>             java-17858 [009] ....  3009.780359: usb_add_hcd: request_irq(irq=10, desc(ffff88085f41bba8), hcd(ffff8808518e53d8)) return=0
>             java-17858 [004] ....  3009.808979: usb_add_hcd: request_irq(irq=11, desc(ffff88085f41bef8), hcd(ffff880482d992a8)) return=0
>             java-17858 [008] ....  3009.828263: xhci_halt: xhci(ffff88083f5a8258)
>             java-17858 [008] ....  3009.828269: xhci_quiesce: xhci(ffff88083f5a8258) halted=1
>             java-17858 [008] ....  3009.828273: xhci_quiesce: xhci(ffff88083f5a8258) cmd=0
>             java-17858 [008] ....  3009.829614: xhci_setup_msix: xhci=ffff88083f5a8258
>             java-17858 [008] ....  3009.829627: xhci_setup_msix: xhci=ffff88083f5a8258 Failed to enable MSI-X, ret=-22
>             java-17858 [008] ....  3009.829635: xhci_run: xhci=ffff88083f5a8258
>             java-17858 [008] ....  3009.830069: xhci_run: request_irq(119, ...) MSI desc=ffff88083ac32e68 ret=0
>             java-17858 [007] ....  3010.009977: xhci_halt: xhci(ffff88083f5a8258)
> 
> xhci_halt is called, perhaps from
> xhci_handle_command_timeout?
> 
>             java-17858 [007] ....  3010.009985: xhci_quiesce: xhci(ffff88083f5a8258) halted=1
>             java-17858 [007] ....  3010.009986: xhci_quiesce: xhci(ffff88083f5a8258) cmd=ffffffff
> 
> cmd = ~0, so the hardware is gone!  subsequent
> call to xhci_handshake will detect this and return
> -ENODEV, xhci_halt will then set 
> xhci->xhc_state |= XHCI_STATE_HALTED
> 
>             java-17858 [007] d...  3010.010381: xhci_irq: hcd(ffff88083f5a8008)->irq=0 (depth=0, count=300, unhandled=0), status==~0
> 
> xhci_irq notices the missing hardware, too.
> 
>             java-17858 [007] ....  3010.014476: xhci_stop: xhci(ffff88083f5a8258)->xhc_state=2
> 
> xhci_stop is called and exits early because
> (xhci->xhc_state & XHCI_STATE_HALTED)
> Notice no entries for xhci_free_irq!
> 
>             java-17858 [007] ....  3010.029101: ftmod_ioboard_event: adding bus 79
>    kworker/u97:3-974   [009] ....  3013.264132: ioboard_bringdown: board 0
>    kworker/u97:3-974   [009] ....  3013.264145: ioboard_bringdown: sbus=0, removing bus 03
>    kworker/u97:3-974   [005] ....  3016.551113: ioboard_bringdown: sbus=1, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551119: ioboard_bringdown: sbus=2, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551119: ioboard_bringdown: sbus=3, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551119: ioboard_bringdown: sbus=4, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551120: ioboard_bringdown: sbus=5, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551120: ioboard_bringdown: sbus=6, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551120: ioboard_bringdown: sbus=7, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551120: ioboard_bringdown: sbus=8, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551120: ioboard_bringdown: sbus=9, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551124: ioboard_bringdown: sbus=10, pdev =           (null)
>    kworker/u97:3-974   [005] ....  3016.551125: ioboard_bringdown: sbus=11, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551126: ioboard_bringdown: sbus=12, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551126: ioboard_bringdown: sbus=13, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551126: ioboard_bringdown: sbus=14, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551127: ioboard_bringdown: sbus=15, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551127: ioboard_bringdown: sbus=16, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551127: ioboard_bringdown: sbus=17, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551127: ioboard_bringdown: sbus=18, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551127: ioboard_bringdown: sbus=19, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551128: ioboard_bringdown: sbus=20, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551128: ioboard_bringdown: sbus=21, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551128: ioboard_bringdown: sbus=22, bs->TopLevelBridge[sbus] = 0
>    kworker/u97:3-974   [005] ....  3016.551128: ioboard_bringdown: sbus=23, bs->TopLevelBridge[sbus] = 0
> 
> First failed MSI allocation
> ===========================
> 
>             java-17858 [011] ....  3084.238393: ftmod_ioboard_event: adding bus 03
>             java-17858 [005] ....  3088.071479: usb_add_hcd: request_irq(irq=10, desc(ffff88085f41bba8), hcd(ffff8804f5cc2548)) return=0
>             java-17858 [005] ....  3088.096260: usb_add_hcd: request_irq(irq=11, desc(ffff88085f41bef8), hcd(ffff8804f5cc6678)) return=0
>             java-17858 [000] .N..  3088.121585: xhci_halt: xhci(ffff880857730258)
>             java-17858 [000] .N..  3088.121593: xhci_quiesce: xhci(ffff880857730258) halted=1
>             java-17858 [000] .N..  3088.121597: xhci_quiesce: xhci(ffff880857730258) cmd=0
>             java-17858 [000] ....  3088.123387: xhci_setup_msix: xhci=ffff880857730258
>             java-17858 [000] ....  3088.123400: xhci_setup_msix: xhci=ffff880857730258 Failed to enable MSI-X, ret=-22
>             java-17858 [000] ....  3088.123408: xhci_run: xhci=ffff880857730258
>             java-17858 [000] ....  3088.123486: msi_domain_alloc_irqs: virq = -17
> 
> #define EEXIST          17      /* File exists */
> 
>             java-17858 [000] ....  3088.123495: pci_enable_msi_range: pci_msi_setup_msi_irqs ret=-28
>             java-17858 [000] ....  3088.123497: xhci_run: xhci=ffff880857730258 failed to allocate MSI entry, ret=-28
> 
> #define ENOSPC          28      /* No space left on device */
> 
>             java-17858 [000] ....  3088.123548: xhci_run: request_irq(10, ...) legacy desc=ffff88085f41bba8 ret=0
>           <idle>-0     [000] d.H.  3088.388279: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=1, unhandled=0), status=0 (!STS_EINT)
>           <idle>-0     [000] d.H.  3088.396057: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=2, unhandled=0), status=0 (!STS_EINT)
>           <idle>-0     [000] d.H.  3088.396424: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=3, unhandled=0), status=0 (!STS_EINT)
>      ksoftirqd/0-3     [000] d.H.  3088.491807: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=4, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.H.  3088.495645: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=5, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.H.  3088.496136: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=6, unhandled=0), status=0 (!STS_EINT)
>           <idle>-0     [000] dNH.  3088.508039: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=8, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.H.  3088.511258: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=9, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.H.  3088.515058: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=10, unhandled=0), status=0 (!STS_EINT)
>      ksoftirqd/0-3     [000] d.h.  3088.528961: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=11, unhandled=0), status=0 (!STS_EINT)
>      ksoftirqd/0-3     [000] d.H.  3088.529620: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=12, unhandled=0), status=0 (!STS_EINT)
>      ksoftirqd/0-3     [000] d.H.  3088.532033: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=13, unhandled=0), status=0 (!STS_EINT)
>      kworker/0:2-28078 [000] d.H.  3088.538757: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=14, unhandled=0), status=0 (!STS_EINT)
>      kworker/0:2-28078 [000] d.H.  3088.539227: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=15, unhandled=0), status=0 (!STS_EINT)
>      kworker/0:2-28078 [000] dNH.  3088.542054: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=16, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.h.  3088.548702: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=17, unhandled=0), status=0 (!STS_EINT)
>     kworker/0:1H-984   [000] d.h.  3088.549322: xhci_irq: hcd(ffff880857730008)->irq=10 (depth=0, count=18, unhandled=0), status=0 (!STS_EINT)
> 
> This continues until the device is removed later.
> 
> I'm currently testing with reverting 8c24d6d7b09d "usb: xhci: stop
> everything on the first call to xhci_stop" to see how that fares.
> 
> In the meantime I was browsing recent linux-scsi archives and noticed
> Gabriel's [PATCH v2] xhci: Cleanup only when releasing primary hcd:
> 
>   http://marc.info/?l=linux-usb&m=146134802313366&w=2
> 
> which looks like it modifies xhci_stop() to continue on to
> xhci_cleanup_msix even if XHCI_STATE_HALTED (for primary host
> controller's anyway).
> 
> I think this may avoid the issue that I've been seeing and was wondering
> if it would be safe to test this patch on top of 4.6.0-rc3?  (An older
> RC because my repro is very timing dependent, so moving the kernel
> forward / backwards or adding excessive trace logging can make it hard
> to debug.)

Yes, please test it on top of v4.6-rc and I'm hoping it will
be sent for stable as well.

cheers,
-roger
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux