Re: Bug 153551: Kernel panic on Nexus 5X USB unplug while tethering

On 08/23/2016 06:36 AM, Mathias Nyman wrote:
On 23.08.2016 14:26, Mathias Nyman wrote:
On 23.08.2016 13:54, Mathias Nyman wrote:
On 23.08.2016 02:21, Jose Marino wrote:
I'm using my phone (Nexus 5X running Android) to tether a USB
connection to my laptop (XPS 15 9550). I plug the phone in through the
USB-C connection and select USB tethering on the phone. Initially
things look normal: a usb0 network interface appears on the laptop
and it tries to get an IP with dhcp. However, I observe two
different behaviors depending on whether it's a fresh boot or I
have suspended and resumed the laptop. After a fresh boot everything
works fine: I get an IP and the connection works as expected. If I
unplug the phone, everything also works as expected.

However, after a suspend/resume cycle, I plug the phone in but the
laptop never connects to it. The usb0 interface still appears, but
the dhcp daemon is unable to get any response and finally times out.
The fun part happens when I unplug the phone. I consistently get a
kernel panic.

...
Anyways, I'll look at that panic in more detail as well


<6>[  178.693631] xhci_hcd 0000:3e:00.0: USB bus 4 deregistered
<6>[  178.693642] xhci_hcd 0000:3e:00.0: remove, state 1
<6>[  178.693648] usb usb3: USB disconnect, device number 1
<4>[  183.634994] xhci_hcd 0000:3e:00.0: xHCI host not responding to stop endpoint command.
<4>[  183.635001] xhci_hcd 0000:3e:00.0: Assuming host is dying, halting host.
<4>[  183.635019] xhci_hcd 0000:3e:00.0: Host not halted after 16000 microseconds.
<4>[  183.635022] xhci_hcd 0000:3e:00.0: Non-responsive xHCI host is not halting.
<4>[  183.635025] xhci_hcd 0000:3e:00.0: Completing active URBs anyway.
<1>[  183.635116] BUG: unable to handle kernel NULL pointer dereference at           (null)
<1>[  183.635402] IP: [<ffffffffa006d196>] usb_hc_died+0x16/0xc0 [usbcore]


Looks like the 5 second command timeout timer for stop endpoint
commands causes this.
The timer (stop_cmd_timer) will call
xhci_stop_endpoint_command_watchdog(), which calls
   usb_hc_died(xhci_to_hcd(xhci)->primary_hcd)

but the hcds are probably already freed and the pointers set to NULL
-> NULL pointer dereference.

The timer should be synchronously deleted when the device is freed,
unless xhci_free_dev() returns early.

So either hub_free_dev() is not called for this device at hcd removal,
or xhci_free_dev() returns early.
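
To make the two cases concrete, here is a small userspace sketch of the
teardown ordering. It is not driver code: struct mock_ep, struct mock_dev,
mock_free_dev(), mock_timer_can_fire() and the slot_valid flag are all made
up for illustration, and slot_valid only stands in for whatever condition
makes the real function return early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NUM_EPS 4  /* arbitrary; chosen only for the sketch */

/* Hypothetical model of an endpoint with one stop-command watchdog. */
struct mock_ep {
    bool timer_pending;
};

/* Hypothetical model of a device slot. */
struct mock_dev {
    struct mock_ep eps[MOCK_NUM_EPS];
    bool slot_valid;  /* false stands in for the early-return condition */
};

/*
 * Model of freeing a device: cancel every pending watchdog, unless we
 * bail out early. Returns the number of timers cancelled, or -1 on the
 * early-return path (timers are left armed in that case).
 */
int mock_free_dev(struct mock_dev *dev)
{
    int cancelled = 0;

    assert(dev != NULL);
    if (!dev->slot_valid)
        return -1;

    for (size_t i = 0; i < MOCK_NUM_EPS; i++) {
        if (dev->eps[i].timer_pending) {
            dev->eps[i].timer_pending = false;
            cancelled++;
        }
    }
    return cancelled;
}

/* True if at least one watchdog could still fire after teardown. */
bool mock_timer_can_fire(const struct mock_dev *dev)
{
    for (size_t i = 0; i < MOCK_NUM_EPS; i++) {
        if (dev->eps[i].timer_pending)
            return true;
    }
    return false;
}
```

In this model a watchdog can only outlive teardown if mock_free_dev() is
never called or takes the early-return path, which are the two
possibilities listed above.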


Or else this happens:
(I'll call the hcds usb2_hcd and usb3_hcd to keep track of them,
usb2_hcd is the primary_hcd)

to begin with:
usb2_hcd->primary_hcd = usb2_hcd
usb2_hcd->shared_hcd  = usb3_hcd

usb3_hcd->primary_hcd = usb2_hcd
usb3_hcd->shared_hcd  = usb2_hcd


usb3_hcd is removed first:
xhci_pci_remove(struct pci_dev *dev)
  usb_remove_hcd(xhci->shared_hcd);  // remove usb3_hcd
  usb_put_hcd(xhci->shared_hcd)
    hcd_release(..)
      if (hcd->shared_hcd) {        // true
                struct usb_hcd *peer = hcd->shared_hcd;  // peer is now usb2_hcd
                peer->shared_hcd = NULL;     // sets usb2_hcd->shared_hcd to NULL
                peer->primary_hcd = NULL;    // sets usb2_hcd->primary_hcd to NULL.  Why do we do this??

stop_cmd_timer triggers before usb2_hcd is removed:
-> xhci_stop_endpoint_command_watchdog()
     usb_hc_died(xhci_to_hcd(xhci)->primary_hcd)  // xhci_to_hcd will get usb2_hcd, usb2_hcd->primary_hcd is already NULL here.
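
The sequence above can be replayed in a few lines of userspace C. This is
only a model under stated assumptions: struct mock_hcd and the mock_* and
died_target_* helpers are hypothetical names, and only the two link
pointers from the trace are modeled, with no refcounting, locking or
freeing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct usb_hcd; only the link fields exist. */
struct mock_hcd {
    struct mock_hcd *primary_hcd;
    struct mock_hcd *shared_hcd;
};

/* Link the two hcds the way the trace starts out. */
void mock_pair(struct mock_hcd *usb2, struct mock_hcd *usb3)
{
    assert(usb2 != NULL && usb3 != NULL);
    usb2->primary_hcd = usb2;
    usb2->shared_hcd  = usb3;
    usb3->primary_hcd = usb2;
    usb3->shared_hcd  = usb2;
}

/* Mirrors only the peer-clearing branch of hcd_release() quoted above. */
void mock_release(struct mock_hcd *hcd)
{
    if (hcd->shared_hcd) {
        struct mock_hcd *peer = hcd->shared_hcd;

        peer->shared_hcd = NULL;
        peer->primary_hcd = NULL;
    }
}

/* Pointer the watchdog hands to usb_hc_died() via the primary_hcd link. */
struct mock_hcd *died_target_old(struct mock_hcd *main_hcd)
{
    return main_hcd->primary_hcd;
}

/* Pointer it would hand over if it used the main hcd directly. */
struct mock_hcd *died_target_new(struct mock_hcd *main_hcd)
{
    return main_hcd;
}
```

After mock_pair(&usb2, &usb3) and mock_release(&usb3),
died_target_old(&usb2) is NULL while died_target_new(&usb2) is still
&usb2, which matches the NULL dereference in usb_hc_died() seen in the
log.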


Does something like this help?

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index fd9fd12..797137e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -850,6 +850,10 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
        spin_lock_irqsave(&xhci->lock, flags);

        ep->stop_cmds_pending--;
+       if (xhci->xhc_state & XHCI_STATE_REMOVING) {
+               spin_unlock_irqrestore(&xhci->lock, flags);
+               return;
+       }
        if (xhci->xhc_state & XHCI_STATE_DYING) {
                xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
                                "Stop EP timer ran, but another timer marked "
@@ -903,7 +907,7 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
        spin_unlock_irqrestore(&xhci->lock, flags);
        xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
                        "Calling usb_hc_died()");
-       usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
+       usb_hc_died(xhci_to_hcd(xhci));
        xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
                        "xHCI host controller is dead.");
 }


The patch did not apply on top of 4.7.2. I applied this patch instead, which I hope is equivalent:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d7d5025..20b1b18 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -840,6 +840,10 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
 	spin_lock_irqsave(&xhci->lock, flags);

 	ep->stop_cmds_pending--;
+	if (xhci->xhc_state & XHCI_STATE_REMOVING) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		return;
+	}
 	if (xhci->xhc_state & XHCI_STATE_DYING) {
 		xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
 				"Stop EP timer ran, but another timer marked "
@@ -893,7 +897,7 @@ void xhci_stop_endpoint_command_watchdog(unsigned long arg)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 	xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
 			"Calling usb_hc_died()");
-	usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
+	usb_hc_died(xhci_to_hcd(xhci));
 	xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
 			"xHCI host controller is dead.");
 }


So, I applied the patch, rebooted, did a suspend/resume cycle, plugged in the phone and told it to tether. The dhcp client is still unable to communicate and times out. However, the patch seems to have avoided the NULL dereference: the computer did not panic, although my X session stopped responding. I switched to a virtual console and recorded a dmesg (attached).

Attachment: dmesg-Nyman-patch.log.gz
Description: application/gzip

