Hi, On 7/16/2020 4:33 PM, Marek Szyprowski wrote: > Hi Minas, > > On 16.07.2020 14:19, Minas Harutyunyan wrote: >> On 7/16/2020 4:09 PM, Marek Szyprowski wrote: >>> When gadget registration fails, one should not call usb_del_gadget_udc(). >>> Ensure this by setting gadget->udc to NULL. Also in case of a failure >>> there is no need to disable low-level hardware, so return immiedetly >>> instead of jumping to error_init label. >>> >>> This fixes the following kernel NULL ptr dereference on gadget failure >>> (can be easily triggered with g_mass_storage without any module >>> parameters): >>> >>> dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter besl=1 >>> dwc2 12480000.hsotg: dwc2_check_params: Invalid parameter g_np_tx_fifo_size=1024 >>> dwc2 12480000.hsotg: EPs: 16, dedicated fifos, 7808 entries in SPRAM >>> Mass Storage Function, version: 2009/09/11 >>> LUN: removable file: (no medium) >>> no file given for LUN0 >>> g_mass_storage 12480000.hsotg: failed to start g_mass_storage: -22 >>> 8<--- cut here --- >>> Unable to handle kernel NULL pointer dereference at virtual address 00000104 >>> pgd = (ptrval) >>> [00000104] *pgd=00000000 >>> Internal error: Oops: 805 [#1] PREEMPT SMP ARM >>> Modules linked in: >>> CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.8.0-rc5 #3133 >>> Hardware name: Samsung Exynos (Flattened Device Tree) >>> Workqueue: events deferred_probe_work_func >>> PC is at usb_del_gadget_udc+0x38/0xc4 >>> LR is at __mutex_lock+0x31c/0xb18 >>> ... >>> Process kworker/0:1 (pid: 12, stack limit = 0x(ptrval)) >>> Stack: (0xef121db0 to 0xef122000) >>> ... >>> [<c076bf3c>] (usb_del_gadget_udc) from [<c0726bec>] (dwc2_hsotg_remove+0x10/0x20) >>> [<c0726bec>] (dwc2_hsotg_remove) from [<c0711208>] (dwc2_driver_probe+0x57c/0x69c) >>> [<c0711208>] (dwc2_driver_probe) from [<c06247c0>] (platform_drv_probe+0x6c/0xa4) >>> [<c06247c0>] (platform_drv_probe) from [<c0621df4>] (really_probe+0x200/0x48c) >>> [<c0621df4>] (really_probe) from [<c06221e8>] (driver_probe_device+0x78/0x1fc) >>> [<c06221e8>] (driver_probe_device) from [<c061fcd4>] (bus_for_each_drv+0x74/0xb8) >>> [<c061fcd4>] (bus_for_each_drv) from [<c0621b54>] (__device_attach+0xd4/0x16c) >>> [<c0621b54>] (__device_attach) from [<c0620c98>] (bus_probe_device+0x88/0x90) >>> [<c0620c98>] (bus_probe_device) from [<c06211b0>] (deferred_probe_work_func+0x3c/0xd0) >>> [<c06211b0>] (deferred_probe_work_func) from [<c0149280>] (process_one_work+0x234/0x7dc) >>> [<c0149280>] (process_one_work) from [<c014986c>] (worker_thread+0x44/0x51c) >>> [<c014986c>] (worker_thread) from [<c0150b1c>] (kthread+0x158/0x1a0) >>> [<c0150b1c>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20) >>> Exception stack(0xef121fb0 to 0xef121ff8) >>> ... >>> ---[ end trace 9724c2fc7cc9c982 ]--- >>> >>> While fixing this also fix the double call to dwc2_lowlevel_hw_disable() >>> if dr_mode is set to USB_DR_MODE_PERIPHERAL. In such case low-level >>> hardware is already disabled before calling usb_add_gadget_udc(). That >>> function correctly preserves low-level hardware state, there is no need >>> for the second unconditional dwc2_lowlevel_hw_disable() call. >>> >>> Fixes: 207324a321a8 ("usb: dwc2: Postponed gadget registration to the udc class driver") >>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> Acked-by: Minas Harutyunyan <hminas@xxxxxxxxxxxx> >>> --- >>> drivers/usb/dwc2/platform.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c >>> index cb8ddbd53718..db9fd4bd1a38 100644 >>> --- a/drivers/usb/dwc2/platform.c >>> +++ b/drivers/usb/dwc2/platform.c >>> @@ -582,6 +582,7 @@ static int dwc2_driver_probe(struct platform_device *dev) >>> if (hsotg->gadget_enabled) { >>> retval = usb_add_gadget_udc(hsotg->dev, &hsotg->gadget); >>> if (retval) { >>> + hsotg->gadget.udc = NULL; >> Consider your recently sent patch "[PATCH] usb: gadget: udc: Flush >> pending work also in error path", more probably it's not required, >> because root cause of observed dwc2 issue comes from udc. >> Am I wrong? > > They are two independent issues, both fatal on my test system. > > The first one (the lack of NULLing gadget->udc) can be easily triggered > on any system. The latter one (lack of flush in UDC core) depends on the > luck and memory layout on the test system. > > Best regards >