Re: Oops in xhci_endpoint_reset

On 30.7.2019 18.49, Enric Balletbo Serra wrote:
Hi,

Message from Bob Gleitsmann <rjgleits@xxxxxxxxxxxxx> on Sat, 27 Jul
2019 at 23:39:

OK, here's the result of the bisection:

ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
commit ef513be0a9057cc6baf5d29566aaaefa214ba344
Author: Jim Lin <jilin@xxxxxxxxxx>
Date:   Mon Jun 3 18:53:44 2019 +0800

    usb: xhci: Add Clear_TT_Buffer

I want to confirm that I get the same oops on a Samsung Chromebook
Plus (rk3399) and that reverting the above commit fixes the issue.

If it helps, there is a decoded stack trace below (I need to gain some
USB knowledge to deal with this); others probably have a better
idea of what is happening.

[   75.613254] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000030

[   75.769594] pc : xhci_endpoint_reset
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3096)

Thanks. My guess is that host_ep->hcpriv used to be cleared after an endpoint was dropped,
which in normal cases would cause xhci_endpoint_reset() to return early.

3074	static void xhci_endpoint_reset(struct usb_hcd *hcd,
3075			struct usb_host_endpoint *host_ep)
3076	{
3077		struct xhci_hcd *xhci;
3078		struct usb_device *udev;
3079		struct xhci_virt_device *vdev;
3080		struct xhci_virt_ep *ep;
3081		struct xhci_input_control_ctx *ctrl_ctx;
3082		struct xhci_command *stop_cmd, *cfg_cmd;
3083		unsigned int ep_index;
3084		unsigned long flags;
3085		u32 ep_flag;
3086	
3087		xhci = hcd_to_xhci(hcd);
3088		if (!host_ep->hcpriv)
3089			return;
3090		udev = (struct usb_device *) host_ep->hcpriv;
3091		vdev = xhci->devs[udev->slot_id];
3092		ep_index = xhci_get_endpoint_index(&host_ep->desc);
3093		ep = &vdev->eps[ep_index];
3094	
3095		/* Bail out if toggle is already being cleared by a endpoint reset */
3096		if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
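
The reported fault address is 0x30, a small offset rather than 0. One way such an address could arise, assuming the base pointer vdev were NULL at this point (an assumption, not something the trace proves), is that &vdev->eps[ep_index] would then be a small non-NULL value, and reading ep_state adds the field offset on top. The sketch below uses hypothetical mock structures whose members and sizes are invented and do not match the real xhci layout. It only computes the offset with offsetof and never dereferences anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT the real xhci structures or their sizes. */
#define MOCK_EP_COUNT 31

struct mock_virt_ep {
	void *ring;             /* placeholder member */
	unsigned int ep_state;  /* stands in for the field read at xhci.c:3096 */
};

struct mock_virt_device {
	void *udev;                              /* placeholder member */
	struct mock_virt_ep eps[MOCK_EP_COUNT];
};

/*
 * Byte offset of eps[ep_index].ep_state from the start of the device
 * structure. If the base pointer were NULL, this offset is the address
 * a read of ep_state would touch.
 */
size_t mock_ep_state_offset(unsigned int ep_index)
{
	assert(ep_index < MOCK_EP_COUNT);
	return offsetof(struct mock_virt_device, eps)
		+ ep_index * sizeof(struct mock_virt_ep)
		+ offsetof(struct mock_virt_ep, ep_state);
}
```

The exact number depends entirely on the structure layout, so the mock value is not expected to match 0x30. The point is only that the result is nonzero for every endpoint index.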

Commit ef513be ("usb: xhci: Add Clear_TT_Buffer") sets hcpriv again when handling a halted endpoint behind a TT hub.
If the event for the stalled endpoint is handled late, it's possible we set a stale value in ep->hcpriv,
which should just be cleared.

+static void xhci_clear_hub_tt_buffer(struct xhci_hcd *xhci, struct xhci_td *td,
+               struct xhci_virt_ep *ep)
+{
+       /*
+        * As part of low/full-speed endpoint-halt processing
+        * we must clear the TT buffer (USB 2.0 specification 11.17.5).
+        */
+       if (td->urb->dev->tt && !usb_pipeint(td->urb->pipe) &&
+           (td->urb->dev->tt->hub != xhci_to_hcd(xhci)->self.root_hub) &&
+           !(ep->ep_state & EP_CLEARING_TT)) {
+               ep->ep_state |= EP_CLEARING_TT;
+               td->urb->ep->hcpriv = td->urb->dev;
+               if (usb_hub_clear_tt_buffer(td->urb))
+                       ep->ep_state &= ~EP_CLEARING_TT;
+       }
+}
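
The suspected ordering can be modelled in user space. This is a minimal sketch of the guess above, not kernel code: the mock_* names and structures are hypothetical, and the only line taken from the commit is the assignment of the device pointer to hcpriv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for usb_device and usb_host_endpoint. */
struct mock_device { int slot_id; };
struct mock_host_ep { void *hcpriv; };

/* Dropping the endpoint clears hcpriv, as guessed in this reply. */
void mock_drop_endpoint(struct mock_host_ep *host_ep)
{
	assert(host_ep != NULL);
	host_ep->hcpriv = NULL;
}

/* Mirrors "td->urb->ep->hcpriv = td->urb->dev" from the quoted commit. */
void mock_clear_tt_buffer(struct mock_host_ep *host_ep, struct mock_device *dev)
{
	assert(host_ep != NULL);
	host_ep->hcpriv = dev;
}

/* True when a reset would get past the "!host_ep->hcpriv" early return. */
bool mock_reset_proceeds(const struct mock_host_ep *host_ep)
{
	assert(host_ep != NULL);
	return host_ep->hcpriv != NULL;
}
```

Calling mock_drop_endpoint() and then mock_reset_proceeds() gives false, the early-return case. If mock_clear_tt_buffer() runs in between, as a late stall event might, mock_reset_proceeds() gives true and the reset would carry on with a pointer that should have stayed cleared.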

Still just a guess.
Does the below code fix your issue?

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 248cd7a..a0984aa 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3092,6 +3092,10 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
        ep_index = xhci_get_endpoint_index(&host_ep->desc);
        ep = &vdev->eps[ep_index];
+       if (!ep) {
+               xhci_err(xhci, "Mathias: No ep for endpoint reset, bail out\n");
+               return;
+       }
        /* Bail out if toggle is already being cleared by a endpoint reset */
        if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
                ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;


Also logs and traces would better show the root cause:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< suspend/resume >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace

-Mathias


[   75.774741] lr : xhci_endpoint_reset
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.h:1913
/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3087)
[   75.779797] sp : ffff000011b6b930
[   75.783494] x29: ffff000011b6b930 x28: 00000000ffffff95
[   75.789426] x27: ffff8000ef657e00 x26: 0000000000000000
[   75.795358] x25: ffff8000efafeb80 x24: 0000000000000000
[   75.801289] x23: ffff8000efa4a250 x22: 0000000000000001
[   75.807212] x21: ffff8000efafe800 x20: ffff8000efa4a000
[   75.813143] x19: ffff8000efafe850 x18: 0000000000000000
[   75.819074] x17: 0000000000000000 x16: 0000000000000000
[   75.824997] x15: 0000000000000000 x14: 0000000000000000
[   75.830920] x13: ffff8000ef5ff180 x12: 0000000034d4d91d
[   75.836851] x11: 0000000000000000 x10: 0000000000000990
[   75.842773] x9 : ffff8000efa3d000 x8 : 0000000000000004
[   75.848695] x7 : ffff8000f55b8340 x6 : ffff8000ef65e700
[   75.854618] x5 : ffff8000efe844c0 x4 : 0000000000000000
[   75.860549] x3 : 0000000000000000 x2 : 0000000000000000
[   75.866471] x1 : 0000000000000000 x0 : 0000000000000000
[   75.872394] Call trace:
[   75.875122] xhci_endpoint_reset
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3096)
[   75.879889] usb_hcd_reset_endpoint
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hcd.c:2090)
[   75.884753] usb_enable_endpoint
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/message.c:1294)
[   75.889324] usb_ep0_reinit
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:4423)
[   75.893402] usb_reset_and_verify_device
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:5716)
[   75.898848] usb_port_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:3379
/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:3579)
[   75.903217] generic_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/generic.c:277)
[   75.907304] usb_resume_both
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1182
/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1406)
[   75.911584] usb_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1501)
[   75.915281] usb_dev_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/usb.c:471)
[   75.919361] dpm_run_callback.isra.6
(/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:458)
[   75.924322] device_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:999)
[   75.928408] dpm_resume
(/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:1055)
[   75.932203] dpm_resume_end
(/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:1171)

Thanks,
~ Enric


    USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
    processing for full-/low-speed endpoints connected via a TT, the host
    software must use the Clear_TT_Buffer request to the TT to ensure
    that the buffer is not in the busy state".

    In our case, a full-speed speaker (ConferenceCam) is behind a high-
    speed hub (ConferenceCam Connect), sometimes once we get STALL on a
    request we may continue to get STALL with the following requests,
    like Set_Interface.

    Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
    request to the hub of the device for the following Set_Interface
    requests to the device to get ACK successfully.

    Signed-off-by: Jim Lin <jilin@xxxxxxxxxx>
    Acked-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
    Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

 drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
 drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
 drivers/usb/host/xhci.h      |  5 +++++
 3 files changed, 52 insertions(+), 1 deletion(-)


On 7/27/19 6:59 AM, Greg KH wrote:
On Fri, Jul 26, 2019 at 11:15:46PM -0400, Bob Gleitsmann wrote:
Hello,


I have seen kernel oopses on waking from suspend to memory. I got this
twice, one dmesg with backtrace attached. The other one had the failure
in the same place in the code.


This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
driver. Have not had the problem with earlier kernels. Using Gentoo
linux, amd64, but git kernel.

Any chance you can run 'git bisect' to track down the offending commit?

thanks,

greg k-h




