Sorry for the late reply.

On Fri, Feb 09, 2024, Marek Szyprowski wrote:
> On 08.02.2024 23:54, Thinh Nguyen wrote:
> > On Wed, Feb 07, 2024, Marek Szyprowski wrote:
> >> On 19.01.2024 10:48, Uttkarsh Aggarwal wrote:
> >>> In current scenario if Plug-out and Plug-In performed continuously
> >>> there could be a chance while checking for dwc->gadget_driver in
> >>> dwc3_gadget_suspend, a NULL pointer dereference may occur.
> >>>
> >>> Call Stack:
> >>>
> >>> CPU1:                           CPU2:
> >>> gadget_unbind_driver            dwc3_suspend_common
> >>> dwc3_gadget_stop                dwc3_gadget_suspend
> >>>                                 dwc3_disconnect_gadget
> >>>
> >>> CPU1 basically clears the variable and CPU2 checks the variable.
> >>> Consider CPU1 is running and right before gadget_driver is cleared
> >>> and in parallel CPU2 executes dwc3_gadget_suspend where it finds
> >>> dwc->gadget_driver which is not NULL and resumes execution and then
> >>> CPU1 completes execution. CPU2 executes dwc3_disconnect_gadget where
> >>> it checks dwc->gadget_driver is already NULL because of which the
> >>> NULL pointer deference occur.
> >>>
> >>> Cc: <stable@xxxxxxxxxxxxxxx>
> >>> Fixes: 9772b47a4c29 ("usb: dwc3: gadget: Fix suspend/resume during device mode")
> >>> Acked-by: Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx>
> >>> Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@xxxxxxxxxxx>
> >> This patch landed some time ago in linux-next as commit 61a348857e86
> >> ("usb: dwc3: gadget: Fix NULL pointer dereference in
> >> dwc3_gadget_suspend").
> >> Recently I found that it causes the following
> >> warning when no USB gadget is bound to the DWC3 driver and a system
> >> suspend/resume cycle is performed:
> >>
> >> dwc3 12400000.usb: wait for SETUP phase timed out
> >> dwc3 12400000.usb: failed to set STALL on ep0out
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 4 PID: 604 at drivers/usb/dwc3/ep0.c:289
> >> dwc3_ep0_out_start+0xc8/0xcc
> >> Modules linked in:
> >> CPU: 4 PID: 604 Comm: rtcwake Not tainted 6.8.0-rc3-next-20240207 #7979
> >> Hardware name: Samsung Exynos (Flattened Device Tree)
> >> unwind_backtrace from show_stack+0x10/0x14
> >> show_stack from dump_stack_lvl+0x58/0x70
> >> dump_stack_lvl from __warn+0x7c/0x1bc
> >> __warn from warn_slowpath_fmt+0x1a0/0x1a8
> >> warn_slowpath_fmt from dwc3_ep0_out_start+0xc8/0xcc
> >> dwc3_ep0_out_start from dwc3_gadget_soft_disconnect+0x16c/0x230
> >> dwc3_gadget_soft_disconnect from dwc3_gadget_suspend+0xc/0x90
> >> dwc3_gadget_suspend from dwc3_suspend_common+0x44/0x30c
> >> dwc3_suspend_common from dwc3_suspend+0x14/0x2c
> >> dwc3_suspend from dpm_run_callback+0x94/0x288
> >> dpm_run_callback from device_suspend+0x130/0x6d0
> >> device_suspend from dpm_suspend+0x124/0x35c
> >> dpm_suspend from dpm_suspend_start+0x64/0x6c
> >> dpm_suspend_start from suspend_devices_and_enter+0x134/0xbd8
> >> suspend_devices_and_enter from pm_suspend+0x2ec/0x380
> >> pm_suspend from state_store+0x68/0xc8
> >> state_store from kernfs_fop_write_iter+0x110/0x1d4
> >> kernfs_fop_write_iter from vfs_write+0x2e8/0x430
> >> vfs_write from ksys_write+0x5c/0xd4
> >> ksys_write from ret_fast_syscall+0x0/0x1c
> >> Exception stack(0xf1421fa8 to 0xf1421ff0)
> >> ...
> >> irq event stamp: 14304
> >> hardirqs last enabled at (14303): [<c01a599c>] console_unlock+0x108/0x114
> >> hardirqs last disabled at (14304): [<c0c229d8>]
> >> _raw_spin_lock_irqsave+0x64/0x68
> >> softirqs last enabled at (13030): [<c010163c>] __do_softirq+0x318/0x4f4
> >> softirqs last disabled at (13025): [<c012dd40>] __irq_exit_rcu+0x130/0x184
> >> ---[ end trace 0000000000000000 ]---
> >>
> >> IMHO dwc3_gadget_soft_disconnect() requires some kind of a check if
> >> dwc->gadget_driver is present or not, as it really makes no sense to do
> > I don't think checking that is sufficient, and I don't think that's the
> > case here.
> >
> >> any ep0 related operations if there is no gadget driver at all.
> >>
> > If there's indeed no gadget_driver present, then we wouldn't get this
> > stack trace. (ie. dwc3_ep0_out_start should occurs when gadget_driver is
> > present). This is a race happened between binding + suspend.
>
> I have no gadget compiled into the kernel and no such created via
> configfs, so how can this be caused by a race?

Ah... In that case, we got through the incomplete/wrong check in
dwc3_gadget_soft_disconnect():

	if (dwc->ep0state != EP0_SETUP_PHASE)

Since there's no gadget driver, the controller never started and the
ep0state defaults to EP0_UNCONNECTED, which explains why it hit the
timeout condition above and incorrectly attempted to start the control
transfer.

> >
> > I think something like this should be sufficient. Would you mind giving
> > it a try?
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 564976b3e2b9..1990d6371066 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -2656,6 +2656,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
> >  	int ret;
> >
> >  	spin_lock_irqsave(&dwc->lock, flags);
> > +	if (!dwc->pullups_connected) {
> > +		spin_unlock_irqrestore(&dwc->lock, flags);
> > +		return 0;
> > +	}
> > +
> >  	dwc->connected = false;
> >
> >  	/*
> >
> This patch fixes the reported issue. Feel free to add:
>
> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>

Thanks for the report and Tested-by! I'll send a fix patch soon.

BR,
Thinh
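
For anyone following along, the failure path discussed above can be
sketched as a small user-space model. This is not the driver code: the
struct dwc3_model, enum disconnect_path, and soft_disconnect_model()
names are hypothetical, the enum values only borrow the driver's state
names, and the assumption that pullups_connected is false when no gadget
driver was ever bound is inferred from the tested patch. It only shows
why the ep0state check alone lets an idle controller fall into the
"wait for SETUP phase" path, and how the pullups_connected guard avoids
that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names borrowed from the driver; numeric values are illustrative. */
enum ep0_state_model {
	EP0_UNCONNECTED = 0,
	EP0_SETUP_PHASE,
	EP0_DATA_PHASE,
	EP0_STATUS_PHASE,
};

/* Hypothetical, heavily reduced stand-in for struct dwc3. */
struct dwc3_model {
	bool pullups_connected;
	bool connected;
	enum ep0_state_model ep0state;
};

/* Which branch the modelled soft disconnect would take. */
enum disconnect_path {
	PATH_EARLY_RETURN,	/* guard hit, controller left untouched */
	PATH_NO_WAIT,		/* ep0 already in SETUP phase */
	PATH_WAIT_FOR_SETUP,	/* would wait for SETUP and may time out */
};

/*
 * Model of the decision made at the top of the soft disconnect.
 * with_guard selects whether the pullups_connected check from the
 * proposed patch is applied.
 */
static enum disconnect_path
soft_disconnect_model(struct dwc3_model *dwc, bool with_guard)
{
	assert(dwc != NULL);

	/* Proposed guard: nothing to disconnect if pullups were never on. */
	if (with_guard && !dwc->pullups_connected)
		return PATH_EARLY_RETURN;

	dwc->connected = false;

	/*
	 * Original check: any state other than SETUP phase, including
	 * EP0_UNCONNECTED on a never-started controller, takes the wait path.
	 */
	if (dwc->ep0state != EP0_SETUP_PHASE)
		return PATH_WAIT_FOR_SETUP;

	return PATH_NO_WAIT;
}
```

With a zero-initialized model (no pullups, ep0state == EP0_UNCONNECTED),
calling soft_disconnect_model() without the guard takes the wait path,
which corresponds to the reported timeout; with the guard it returns
early.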