On Tue, 2020-02-04 at 17:44 +0800, Mathias Nyman wrote:
> On 1.2.2020 13.20, Macpaul Lin wrote:
> > On Fri, 2020-01-31 at 16:50 +0200, Mathias Nyman wrote:
> >> On 17.1.2020 9.41, Macpaul Lin wrote:
> >>> According to NULL pointer fix: https://tinyurl.com/uqft5ra
> >>> xhci: Fix NULL pointer dereference with xhci_irq() for shared_hcd
> >>> The similar issue has also been found in QC activities in Mediatek.
> >>>
> >>> Here quote the description from the referenced patch as follows.
> >>> "Commit ("f068090426ea xhci: Fix leaking USB3 shared_hcd
> >>> at xhci removal") sets xhci_shared_hcd to NULL without
> >>> stopping xhci host. This results into a race condition
> >>> where shared_hcd (super speed roothub) related interrupts
> >>> are being handled with xhci_irq happens when the
> >>> xhci_plat_remove is called and shared_hcd is set to NULL.
> >>> Fix this by setting the shared_hcd to NULL only after the
> >>> controller is halted and no interrupts are generated."
> >>>
> >>> Signed-off-by: Sriharsha Allenki <sallenki@xxxxxxxxxxxxxx>
> >>> Signed-off-by: Macpaul Lin <macpaul.lin@xxxxxxxxxxxx>
> >>> ---
> >>>  drivers/usb/host/xhci-mtk.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> >>> index b18a6baef204..c227c67f5dc5 100644
> >>> --- a/drivers/usb/host/xhci-mtk.c
> >>> +++ b/drivers/usb/host/xhci-mtk.c
> >>> @@ -593,11 +593,11 @@ static int xhci_mtk_remove(struct platform_device *dev)
> >>>  	struct usb_hcd *shared_hcd = xhci->shared_hcd;
> >>>
> >>>  	usb_remove_hcd(shared_hcd);
> >>> -	xhci->shared_hcd = NULL;
> >>>  	device_init_wakeup(&dev->dev, false);
> >>>
> >>>  	usb_remove_hcd(hcd);
> >>>  	usb_put_hcd(shared_hcd);
> >>> +	xhci->shared_hcd = NULL;
> >>>  	usb_put_hcd(hcd);
> >>>  	xhci_mtk_sch_exit(mtk);
> >>>  	xhci_mtk_clks_disable(mtk);
> >>>
> >>
> >> Could you share details of the NULL pointer dereference, (backtrace).
> >
> > This bug was found by our QA staff while doing 500 times plug-in and
> > plug-out devices. The backtrace I have was recorded by QA and I didn't
> > reproduce this issue on my own environment. However, after applied this
> > patch the issue seems resolve. Here is the backtrace:
> >
> > Exception Class: Kernel (KE)
> > PC is at [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > LR is at [<ffffff8008ccc788>] xhci_irq+0x2f0/0x2364
> >
> > Current Executing Process:
> > [iptables, 859][netdagent, 770]
> >
> > Backtrace:
> > [<ffffff80080ead58>] __atomic_notifier_call_chain+0xa8/0x130
> > [<ffffff80080eb6d4>] notify_die+0x84/0xac
> > [<ffffff800808e874>] die+0x1d8/0x3b8
> > [<ffffff80080a89b0>] __do_kernel_fault+0x178/0x188
> > [<ffffff80080a81b4>] do_page_fault+0x44/0x3b0
> > [<ffffff80080a811c>] do_translation_fault+0x44/0x98
> > [<ffffff8008080e08>] do_mem_abort+0x4c/0x128
> > [<ffffff80080832d0>] el1_da+0x24/0x3c
> > [<ffffff8008cccbc0>] xhci_irq+0x728/0x2364
> > [<ffffff8008c98804>] usb_hcd_irq+0x2c/0x44
> > [<ffffff8008179bb0>] __handle_irq_event_percpu+0x26c/0x4a4
> > [<ffffff8008179ec8>] handle_irq_event+0x5c/0xd0
> > [<ffffff800817e3c0>] handle_fasteoi_irq+0x10c/0x1e0
> > [<ffffff80081787b0>] __handle_domain_irq+0x32c/0x738
> > [<ffffff800808159c>] gic_handle_irq+0x174/0x1c4
> > [<ffffff8008083cf8>] el0_irq_naked+0x50/0x5c
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > Thanks,
> Could you help me find out which line of code xhci_irq+0x728 is in your case.
>
> As Guenter pointed out there is a risk of turning the NULL pointer dereference
> into a use after free if we just solve this by setting xhci->shared_hcd = NULL
> later.
>
> If you still have that kernel around, and xhci is compiled in:
> gdb vmlinux
> gdb li *(xhci_irq+0x728)
>

Sorry that I couldn't get back to you sooner.
The internal code version for this issue is really old, and it was a bit
difficult to rewind to that version.
However, I think the following dump might be correct for the code base.

(gdb) li *(xhci_irq+0x728)
0xffffff8008cc8634 is in xhci_irq (*stripped* kernel-4.14/drivers/usb/host/xhci.h:1694).
1689	 */
1690	#define XHCI_MAX_REXIT_TIMEOUT_MS	20
1691
1692	static inline unsigned int hcd_index(struct usb_hcd *hcd)
1693	{
1694		if (hcd->speed >= HCD_USB3)
1695			return 0;
1696		else
1697			return 1;
1698	}
(gdb)

Thanks
Macpaul Lin
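
If the dump above matches the crashing build, the faulting access would be
the hcd->speed read at xhci.h:1694, which is consistent with hcd_index()
being called on a shared_hcd pointer that the remove path had already set to
NULL. Below is a minimal user-space sketch of that situation. It is only an
illustration under assumptions: struct usb_hcd and struct xhci_hcd are
simplified stand-ins rather than the kernel definitions, and
hcd_index_checked() and demo_removed_roothub() are hypothetical names that
do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration only; not the kernel definitions. */
#define HCD_USB2	0x0020
#define HCD_USB3	0x0040

struct usb_hcd {
	int speed;
};

struct xhci_hcd {
	struct usb_hcd *main_hcd;
	struct usb_hcd *shared_hcd;
};

/* Same logic as the hcd_index() in the dump: reads hcd->speed unconditionally. */
static inline unsigned int hcd_index(struct usb_hcd *hcd)
{
	if (hcd->speed >= HCD_USB3)
		return 0;
	else
		return 1;
}

/*
 * Hypothetical guarded variant: returns -1 instead of faulting when the
 * roothub pointer has already been cleared by the remove path.
 */
static inline int hcd_index_checked(struct usb_hcd *hcd)
{
	if (!hcd)
		return -1;
	return (int)hcd_index(hcd);
}

/* Usage: model the remove path clearing shared_hcd before the IRQ path reads it. */
static void demo_removed_roothub(void)
{
	struct usb_hcd usb3 = { .speed = HCD_USB3 };
	struct xhci_hcd xhci = { .main_hcd = NULL, .shared_hcd = &usb3 };

	assert(hcd_index_checked(xhci.shared_hcd) == 0);
	xhci.shared_hcd = NULL;	/* what the remove path does to the pointer */
	assert(hcd_index_checked(xhci.shared_hcd) == -1);
}
```

Note that a NULL check like this only shows where the fault occurs. It does
not close the race, and it would not help against the use after free that
Guenter pointed out, since a stale non-NULL pointer passes the check.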