Re: [PATCH 2/3] udc: net2280: Fix overrun of OUT messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Zitat von Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:

On Wed, 6 Feb 2019 guido@xxxxxxxxxxxxxxxxxx wrote:

Alan,

I have inserted a bug detection and can now explain the race:

Ween receiving a long transfer the gadget driver queues a new request with
net2280_queue() and calls start_dma(). In the meantime the OUT FIFO is still
receiving data and the short packet is not detected yet. So the start_dma()
function does not use the shortcut in line:

https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L905

Thus start_queue() is called at the end of the function which calls
stop_out_naking(). This will allow receiving new data, even when a short
packet is arrived in the meantime.

I'm confused.  A transfer is in progress, let's say for request R1.
Before R1 finishes, the gadget driver queues a new request, R2.  Since
R1 is still running, ep->queue is not empty, so net2280_queue() will
not call start_dma() -- it will call queue_dma() instead.  But this is
not what your stack dump shows.

Maybe you meant that R1 completes and then afterward the gadget driver
queues R2.  Then there is a race: start_dma() sees that no short packet
has arrived, but a short packet could arrive before stop_out_naking()
is called.

Yes, I mean the second case. Currently we put only one request into the
OUT queue. Using 2-3 requests would even improve bandwidth for long transfers,
however it does not help to avoid the bug when the application input is
blocked.

Indeed, I'm not sure now whether my patch only minimizes the race or
whether we need to rethink the logic here and call stop_out_naking() when
the request is really done and delivered to the gadget driver.
See function done().

The right time to call stop_out_naking is immediately after we complete
a short transfer (that is, a transfer which ended with a short packet).
start_queue() is definitely the wrong place.

Right at the end of done() is probably the right place.  But do it only
if the last packet of the transfer was a short one.  And note that this
does not necessarily mean the transfer got an error: If the request was
for only 15 bytes then it could complete successfully but the (only)
packet would still be short.

I tried to call stop_out_naking() at the end of done(). However I have
overseen that the author of the driver only wants to receive data when there
is an OUT request available. E.g. see:
https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L304

So I think my patch was not so "bad" and I assume we can prevent the race
with the following patch (or easier see below):

static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma)
{
	struct net2280_dma_regs	__iomem *dma = ep->dma;
	unsigned int tmp = BIT(VALID_BIT) | (ep->is_in << DMA_DIRECTION);

	if (!(ep->dev->quirks & PLX_2280))
		tmp |= BIT(END_OF_CHAIN);

	writel(tmp, &dma->dmacount);
	writel(readl(&dma->dmastat), &dma->dmastat);

	writel(td_dma, &dma->dmadesc);
	if (ep->dev->quirks & PLX_PCIE)
		dmactl |= BIT(DMA_REQUEST_OUTSTANDING);
	writel(dmactl, &dma->dmactl);

	/* erratum 0116 workaround part 3:  pci arbiter away from net2280 */
	(void) readl(&ep->dev->pci->pcimstctl);

	if (!ep->is_in &&
			(readl(&ep->regs->ep_stat) & BIT(NAK_OUT_PACKETS)) &&
			(readl(&ep->regs->ep_avail) == 0)) {
		stop_out_naking(ep);
	}
	writel(BIT(DMA_START), &dma->dmastat);
}

start_queue() is only called by start_dma(). I see that we can simplify this patch
when we just insert the stop_out_nacking(ep) between line 909 and 910:

https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L909

and remove the stop_out_naking() call in start_queue(). I will test this over weekend.

Regards,

Guido






[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux