Re: Chipidea USB controller hangs in peripheral mode under high memory bus pressure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peter,

Thank you for your quick reply.

On Wednesday 03 May 2017 11:10:55 Peter Chen wrote:
> On Tue, May 02, 2017 at 03:07:03PM +0300, Laurent Pinchart wrote:
> > Hello,
> > 
> > I ran into an issue with a Xilinx Zynq XC7Z010 system. The system acts as
> > a USB peripheral, using the UVC gadget driver.
> > 
> > When transferring high bandwidth data over USB in isochronous mode,
> > complete system freezes are occasionally noticed. The problem was traced
> > to the following code from _hardware_enqueue() in
> > drivers/usb/chipidea/udc.c.
> >
> > 		wmb();
> > 		if (hw_read(ci, OP_ENDPTPRIME, BIT(n)))
> > 			goto done;
> > 		do {
> > 			hw_write(ci, OP_USBCMD, USBCMD_ATDTW, USBCMD_ATDTW);
> > 			tmp_stat = hw_read(ci, OP_ENDPTSTAT, BIT(n));
> > 		} while (!hw_read(ci, OP_USBCMD, USBCMD_ATDTW));
> > 		hw_write(ci, OP_USBCMD, USBCMD_ATDTW, 0);
> > 		if (tmp_stat)
> > 			goto done;
> > 
> > The do ... while loop loops forever, and as the function is called under a
> > spin_lock_irqsave(), the system doesn't appreciate. Adding a maximum
> > number of iterations to exit the loop is easy (I'll try to submit a patch
> > after finding the root cause of the problem). That fixes the system hang,
> > but USB transfers are still broken.
> > 
> > I've checked the code and unfortunately it seems to match the procedure
> > documented in the datasheet :-/
> > 
> > The MTBF is several hours, but running 'memtester -M100'
> > (http://pyropus.ca/software/memtester/) in parallel to UVC video transfer
> > over USB brings the MTBF to a few minutes. The problem thus seems to be
> > related to memory bus pressure.
> > 
> > Has anyone run into this problem before ? Is this a known issue ? I don't
> > mind getting my hands dirty debugging, but as I'm not familiar with the
> > chipidea USB controller pointers to what I should check in priority would
> > be appreciated.
> 
> There was no one reported this problem before, but from the description,
> it seems an IC issue which is triggered at high loading memory bus,
> controller may not get time to visit memory at limited time.

That's my guess too. I was expecting the USB controller's bus master interface 
to get stalled but eventually perform the access (or retry it, I'm not sure 
what kind of bus it sits on), but there might be a hardware bug that messes up 
the controller's state machine. I won't rule out the possibility of a software 
issue yet, it might be possible to detect this condition and retry the 
transfer.

> At Xilinx Zynq, its tx buffer is small, and less than 512 bytes
> (84bc70f94d81, "usb: chipidea: add xilinx zynq platform data"), and your
> throughput may be > (512 * 3) bytes/SoF, you can't use non-stream mode by
> reducing max packet size.

My throughput is actually 1*1024 bytes / SOF.

> I think you may observe many under-run at bus analyzer during the test.

I'll try to get this checked (as I don't have a USB analyzer here).

> As a workaround, you may try to do below things:
> 
> 1. Link more TDs before the UVC run

Do you mean I should have more requests queued to avoid hitting the 
software/hardware race that the loop handles ?

> 2. Comment out the code, you are stuck in, it is only useful for protect
> last td status which is handling or will be handled soon by hardware.
> 
> 		/*
> 		do {
> 			hw_write(ci, OP_USBCMD, USBCMD_ATDTW, USBCMD_ATDTW);
> 			tmp_stat = hw_read(ci, OP_ENDPTSTAT, BIT(n));
> 		} while (!hw_read(ci, OP_USBCMD, USBCMD_ATDTW));
> 		hw_write(ci, OP_USBCMD, USBCMD_ATDTW, 0);
> 		if (tmp_stat)
> 			goto done;
> 		*/

I'm not familiar with this driver, but if I understand things correctly, the 
code (and the "TD tripwire" hardware feature) is here to handle a race 
condition where the hardware could look at the last TD's next pointer at the 
same time it gets updated by the software. If I comment the code out, won't 
the endpoint will be primed unconditionally, even if it's currently executing 
transfers ? Won't that be a problem ?

> 3. If it can let your test run more time, try change code like below:
> 
> if (remaining_td_num > 2)
> 	don't do hardware check;
> else
> 	do hardware check.

I'll test that, but I assume I'm hitting the problem when the number of 
remaining TDs is exactly one.

> Besides, I can try your test if you could show me the detail test steps.

I need to create an easy test case, at the moment it's a complex application 
and not every part of it can be published. I'll try to provide an open-source 
test case implementation.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux