On Wed, 6 Feb 2019 guido@xxxxxxxxxxxxxxxxxx wrote: > Alan, > > I have inserted a bug detection and can now explain the race: > > Ween receiving a long transfer the gadget driver queues a new request with > net2280_queue() and calls start_dma(). In the meantime the OUT FIFO is still > receiving data and the short packet is not detected yet. So the start_dma() > function does not use the shortcut in line: > > https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L905 > > Thus start_queue() is called at the end of the function which calls > stop_out_naking(). This will allow receiving new data, even when a short > packet is arrived in the meantime. I'm confused. A transfer is in progress, let's say for request R1. Before R1 finishes, the gadget driver queues a new request, R2. Since R1 is still running, ep->queue is not empty, so net2280_queue() will not call start_dma() -- it will call queue_dma() instead. But this is not what your stack dump shows. Maybe you meant that R1 completes and then afterward the gadget driver queues R2. Then there is a race: start_dma() sees that no short packet has arrived, but a short packet could arrive before stop_out_naking() is called. > Indeed, I'm not sure now whether my patch only minimizes the race or > whether we need to rethink the logic here and call stop_out_naking() when > the request is really done and delivered to the gadget driver. > See function done(). The right time to call stop_out_naking is immediately after we complete a short transfer (that is, a transfer which ended with a short packet). start_queue() is definitely the wrong place. Right at the end of done() is probably the right place. But do it only if the last packet of the transfer was a short one. And note that this does not necessarily mean the transfer got an error: If the request was for only 15 bytes then it could complete successfully but the (only) packet would still be short. Alan Stern