Re: [RFT PATCH] xhci: Fix memory leak when caching protocol extended capability PSI tables

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Tue, 11 Feb 2020 04:23:16 -0800

On Tue, Feb 11, 2020 at 11:56:12AM +0100, Marek Szyprowski wrote:
> Hi
> 
> On 08.01.2020 16:17, Mathias Nyman wrote:
> > xhci driver assumed that xHC controllers have at most one custom
> > supported speed table (PSI) for all usb 3.x ports.
> > Memory was allocated for one PSI table under the xhci hub structure.
> >
> > Turns out this is not the case, some controllers have a separate
> > "supported protocol capability" entry with a PSI table for each port.
> > This means each usb3 port can in theory support different custom speeds.
> >
> > To solve this cache all supported protocol capabilities with their PSI
> > tables in an array, and add pointers to the xhci port structure so that
> > every port points to its capability entry in the array.
> >
> > When creating the SuperSpeedPlus USB Device Capability BOS descriptor
> > for the xhci USB 3.1 roothub we for now will use only data from the
> > first USB 3.1 capable protocol capability entry in the array.
> > This could be improved later, this patch focuses resolving
> > the memory leak.
> >
> > Reported-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> > Reported-by: Sajja Venkateswara Rao <VenkateswaraRao.Sajja@xxxxxxx>
> > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
> 
> This patch landed in today's linux-next (20200211) and causes NULL 
> pointer dereference during second suspend/resume cycle on Samsung 
> Exynos5422-based (arm 32bit) Odroid XU3lite board:
> 
> # time rtcwake -s10 -mmem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:51:43 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.012 seconds
> Freezing user space processes ... (elapsed 0.010 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
> smsc95xx 1-1.1:1.0 eth0: entering SUSPEND2 mode
> wake enabled for irq 153
> wake enabled for irq 158
> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
> mask: 0xffffffe7
> Disabling non-boot CPUs ...
> IRQ 51: no longer affine to CPU1
> IRQ 52: no longer affine to CPU2
> s3c2410-wdt 101d0000.watchdog: watchdog disabled
> wake disabled for irq 158
> usb usb1: root hub lost power or was reset
> usb usb2: root hub lost power or was reset
> wake disabled for irq 153
> exynos-tmu 10060000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10060000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10064000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10064000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 10068000.tmu: More trip points than supported by this TMU.
> exynos-tmu 10068000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 1006c000.tmu: More trip points than supported by this TMU.
> exynos-tmu 1006c000.tmu: 2 trip points should be configured in polling mode.
> exynos-tmu 100a0000.tmu: More trip points than supported by this TMU.
> exynos-tmu 100a0000.tmu: 6 trip points should be configured in polling mode.
> usb usb3: root hub lost power or was reset
> s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
> usb usb4: root hub lost power or was reset
> xhci-hcd xhci-hcd.8.auto: No ports on the roothubs?
> PM: dpm_run_callback(): platform_pm_resume+0x0/0x44 returns -12
> PM: Device xhci-hcd.8.auto failed to resume async: error -12
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> usb 1-1: reset high-speed USB device number 2 using exynos-ehci
> usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
> OOM killer enabled.
> Restarting tasks ... done.
> 
> real    0m11.890s
> user    0m0.001s
> sys     0m0.679s
> root@target:~# PM: suspend exit
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, 
> actual 396825HZ div = 63)
> mmc_host mmc0: Bus speed (slot 0) = 200000000Hz (slot req 200000000Hz, 
> actual 200000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, 
> actual 50000000HZ div = 0)
> mmc_host mmc0: Bus speed (slot 0) = 400000000Hz (slot req 200000000Hz, 
> actual 200000000HZ div = 1)
> smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
> 
> root@target:~#
> root@target:~# time rtcwake -s10 -mmem[   35.451572] vdd_ldo12: disabling
> 
> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Feb 11 10:52:02 2020
> PM: suspend entry (deep)
> Filesystems sync: 0.004 seconds
> Freezing user space processes ... (elapsed 0.006 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.070 seconds) done.
> hub 4-0:1.0: hub_ext_port_status failed (err = -32)
> hub 3-0:1.0: hub_ext_port_status failed (err = -32)
> 8<--- cut here ---
> Unable to handle kernel NULL pointer dereference at virtual address 00000014
> pgd = 4c26b54b
> [00000014] *pgd=00000000
> Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> Modules linked in:
> CPU: 3 PID: 1468 Comm: kworker/u16:23 Not tainted 
> 5.6.0-rc1-next-20200211 #268
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> PC is at xhci_suspend+0x12c/0x520
> LR is at 0xa6aa9898
> pc : [<c0724c90>]    lr : [<a6aa9898>]    psr: 60000093
> sp : ec401df8  ip : 0000001a  fp : c12e7864
> r10: 00000000  r9 : ecfb87b0  r8 : ecfb8220
> r7 : 00000000  r6 : 00000000  r5 : 00000004  r4 : ecfb81f0
> r3 : 00007d00  r2 : 00000001  r1 : 00000001  r0 : 00000000
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 6bd4006a  DAC: 00000051
> Process kworker/u16:23 (pid: 1468, stack limit = 0x6e4b6fba)
> Stack: (0xec401df8 to 0xec402000)
> ...
> [<c0724c90>] (xhci_suspend) from [<c061b4f4>] (dpm_run_callback+0xb4/0x3fc)
> [<c061b4f4>] (dpm_run_callback) from [<c061bd5c>] 
> (__device_suspend+0x134/0x7e8)
> [<c061bd5c>] (__device_suspend) from [<c061c42c>] (async_suspend+0x1c/0x94)
> [<c061c42c>] (async_suspend) from [<c0154bd0>] 
> (async_run_entry_fn+0x48/0x1b8)
> [<c0154bd0>] (async_run_entry_fn) from [<c0149b38>] 
> (process_one_work+0x230/0x7bc)
> [<c0149b38>] (process_one_work) from [<c014a108>] (worker_thread+0x44/0x524)
> [<c014a108>] (worker_thread) from [<c01511fc>] (kthread+0x130/0x164)
> [<c01511fc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xec401fb0 to 0xec401ff8)
> ...
> ---[ end trace c72caf6487666442 ]---
> note: kworker/u16:23[1468] exited with preempt_count 1
> 
> Reverting it fixes the NULL pointer issue. I can provide more 
> information or do some other tests. Just let me know what will help to 
> fix it.
> 
>  > ...

Ugh.  Mathias, should I just revert this for now?

thanks,

greg k-h