On Wed, 27 Feb 2019 guido@xxxxxxxxxxxxxxxxxx wrote: > Zitat von Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>: > > > On Tue, 19 Feb 2019 guido@xxxxxxxxxxxxxxxxxx wrote: > > > >> Hi Alan, > >> > >> My last proposal "udc: net2280: Fix overrun of OUT messages" is still > >> under investigation. > >> > >> During the random stress tests I found a new rare problem: > >> > >> When a request must be dequeued with net2280_dequeue() e.g. due > >> to a device clear action and the same request is finished by the > >> function scan_dma_completions(): > >> > >> https://elixir.bootlin.com/linux/v5.0-rc7/source/drivers/usb/gadget/udc/net2280.c#L1269 > >> > >> then the following search loop does not find the request and the function > >> returns the error -EINVAL without restoring the state ep->stopped = stopped! > >> > >> https://elixir.bootlin.com/linux/v5.0-rc7/source/drivers/usb/gadget/udc/net2280.c#L1269 > >> > >> Thus the endpoint keeps blocked and does not receive any data. > >> When we insert ep->stopped = stopped here: > >> > >> if (&req->req != _req) { > >> ep->stopped = stopped; // <<<<<---- > >> spin_unlock_irqrestore(&ep->dev->lock, flags); > >> dev_err(&ep->dev->pdev->dev, "%s: Request mismatch\n", > >> __func__); > >> return -EINVAL; > >> } > >> > >> then the driver continues as expected, although the driver issues the > >> false error message "Request mismatch". > >> > >> What do you think? It's labor-intensive to fix the error message here, > >> or shall we leave it as it is, and only insert the "ep->stopped = stopped;" > > > > Yes, this is a bug. I think the dev_err() should be changed to > > ep_dbg(). It's not a real error, after all -- it's just a race. > > I agree. ep_dbg() is better here. I see the same bug in the sibling driver > net2272_dequeue(): > https://elixir.bootlin.com/linux/v5.0-rc8/source/drivers/usb/gadget/udc/net2272.c#L948 > > The bug is quite obvious to me, however I'm not aware that we use this driver > in any of our instruments. I cannot test it. What do you recommend? > - Shall I fix it within the same commit. > - Shall I fix it with another commit. > - Don't care about it You can fix it in a separate commit, and in the commit log say that it was only compile-tested because you don't have the right hardware for a full test. > > Do that and insert the "ep->stopped = stopped;" line. Also, consider > > changing the -EINVAL return code here to -EIDRM. This is analogous to > > what usb_unlink_urb() does on the host side. > > EIDRM sounds better. However I see that all udc drivers calling the > <udc>_dequeue() function returns -EINVAL when a request cannot be dequeued. > I do not dare to break this tradition. I guess it's too late to change this now. A pity... Alan Stern > BTW we tested my last proposal: > https://patchwork.kernel.org/patch/10796161/ > > ... I see that we can simplify this patch when we just insert the > stop_out_nacking(ep) between line 909 and 910: > > https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L909 > > and remove the stop_out_naking() call in start_queue()... > > I will add this improvement to the new patch sequence. > > Guido