Zitat von Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:
On Tue, 19 Feb 2019 guido@xxxxxxxxxxxxxxxxxx wrote:
Hi Alan,
My last proposal "udc: net2280: Fix overrun of OUT messages" is still
under investigation.
During the random stress tests I found a new rare problem:
When a request must be dequeued with net2280_dequeue() e.g. due
to a device clear action and the same request is finished by the
function scan_dma_completions():
https://elixir.bootlin.com/linux/v5.0-rc7/source/drivers/usb/gadget/udc/net2280.c#L1269
then the following search loop does not find the request and the function
returns the error -EINVAL without restoring the state ep->stopped = stopped!
https://elixir.bootlin.com/linux/v5.0-rc7/source/drivers/usb/gadget/udc/net2280.c#L1269
Thus the endpoint keeps blocked and does not receive any data.
When we insert ep->stopped = stopped here:
if (&req->req != _req) {
ep->stopped = stopped; // <<<<<----
spin_unlock_irqrestore(&ep->dev->lock, flags);
dev_err(&ep->dev->pdev->dev, "%s: Request mismatch\n",
__func__);
return -EINVAL;
}
then the driver continues as expected, although the driver issues the
false error message "Request mismatch".
What do you think? It's labor-intensive to fix the error message here,
or shall we leave it as it is, and only insert the "ep->stopped = stopped;"
Yes, this is a bug. I think the dev_err() should be changed to
ep_dbg(). It's not a real error, after all -- it's just a race.
I agree. ep_dbg() is better here. I see the same bug in the sibling driver
net2272_dequeue():
https://elixir.bootlin.com/linux/v5.0-rc8/source/drivers/usb/gadget/udc/net2272.c#L948
The bug is quite obvious to me, however I'm not aware that we use this driver
in any of our instruments. I cannot test it. What do you recommend?
- Shall I fix it within the same commit.
- Shall I fix it with another commit.
- Don't care about it
Do that and insert the "ep->stopped = stopped;" line. Also, consider
changing the -EINVAL return code here to -EIDRM. This is analogous to
what usb_unlink_urb() does on the host side.
EIDRM sounds better. However I see that all udc drivers calling the
<udc>_dequeue() function returns -EINVAL when a request cannot be dequeued.
I do not dare to break this tradition.
BTW we tested my last proposal:
https://patchwork.kernel.org/patch/10796161/
... I see that we can simplify this patch when we just insert the
stop_out_nacking(ep) between line 909 and 910:
https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L909
and remove the stop_out_naking() call in start_queue()...
I will add this improvement to the new patch sequence.
Guido