Hi Peter, On Fri, Sep 21, 2018 at 09:48:43AM +0800, Peter Chen wrote: > Type-C-to-A cable, and the USB3 HCD has already been NULL at that time. > The oops log like below: > > [681.782288] xhci-hcd xhci-hcd.1.auto: remove, state 1 > [681.787490] usb usb4: USB disconnect, device number 1 > [681.792808] usb 4-1: USB disconnect, device number 2 > [681.818089] xhci-hcd xhci-hcd.1.auto: USB bus 4 deregistered > [681.823803] Unable to handle kernel NULL pointer dereference at virtual address 000000a0 > [681.823806] Mem abort info: > [681.823809] Exception class = DABT (current EL), IL = 32 bits > [681.823811] SET = 0, FnV = 0 > [681.823813] EA = 0, S1PTW = 0 > [681.823814] Data abort info: > [681.823816] ISV = 0, ISS = 0x00000004 > [681.823818] CM = 0, WnR = 0 > [681.823822] user pgtable: 4k pages, 48-bit VAs, pgd = ffff8000ae3fd000 > [681.823824] [00000000000000a0] *pgd=0000000000000000 > [681.823829] Internal error: Oops: 96000004 [#1] PREEMPT SMP > [681.823832] Modules linked in: 8021q garp stp mrp crc32_ce qca6174(O) crct10dif_ce galcore(O) > [681.823849] CPU: 0 PID: 94 Comm: kworker/0:1 Tainted: G O 4.14.62-imx_4.14.y+gcd63def #1 > [681.823851] Hardware name: Freescale i.MX8MQ EVK (DT) > [681.823862] Workqueue: events_freezable __dwc3_set_mode > [681.823865] task: ffff8000b8a18000 task.stack: ffff00000a010000 > [681.823872] PC is at xhci_irq+0x5fc/0x14b8 > [681.823875] LR is at xhci_irq+0x3c/0x14b8 <snip> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index f0a99aa0ac58..2dc5176b79d0 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -2680,7 +2680,8 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd) > } > > if (xhci->xhc_state & XHCI_STATE_DYING || > - xhci->xhc_state & XHCI_STATE_HALTED) { > + xhci->xhc_state & XHCI_STATE_HALTED || > + xhci->xhc_state & XHCI_STATE_REMOVING) { > xhci_dbg(xhci, "xHCI dying, ignoring interrupt. " > "Shouldn't IRQs be disabled?\n"); > /* Clear the event handler busy flag (RW1C); We also noticed the same crash as you found, and tried to fix it in a similar way, but noticed that this still allows a memory leak to happen. It seems from commit fe190ed0d602a ("xhci: Do not halt the host until both HCD have disconnected their devices.") this was added to xhci_stop(), and is the reason we encounter the NULL pointer in xhci_irq() when it tries to access xhci->shared_hcd. + /* Only halt host and free memory after both hcds are removed */ if (!usb_hcd_is_primary_hcd(hcd)) { + /* usb core will free this hcd shortly, unset pointer */ + xhci->shared_hcd = NULL; mutex_unlock(&xhci->mutex); return; } While your fix will simply abort the xhci_irq() function if it encounters XHCI_STATE_REMOVING, during xhci_plat_remove(): static int xhci_plat_remove(struct platform_device *dev) { struct usb_hcd *hcd = platform_get_drvdata(dev); struct xhci_hcd *xhci = hcd_to_xhci(hcd); struct clk *clk = xhci->clk; struct clk *reg_clk = xhci->reg_clk; xhci->xhc_state |= XHCI_STATE_REMOVING; usb_remove_hcd(xhci->shared_hcd); ^^^^^^^^^ calls xhci_stop() and sets shared_hcd=NULL usb_phy_shutdown(hcd->usb_phy); usb_remove_hcd(hcd); usb_put_hcd(xhci->shared_hcd); ^^^^^^^^^^^ shared_hcd==NULL, so this is a no-op Since usb_put_hcd() doesn't get called for shared_hcd, we end up with one additional kref count and hence a leak. Wondering if we need to also remove the xhci->shared_hcd = NULL from xhci_stop(), in addition to your patches. Thoughts? Thanks, Jack -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project