Re: Chipidea USB controller hangs in peripheral mode under high memory bus pressure

Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> · Tue, 16 May 2017 01:54:46 +0300

Hi Thomas,

On Monday 08 May 2017 04:42:59 Thomas Entner / EE wrote:
> On 08.05.2017 at 03:56, Peter Chen wrote:
> > On Fri, May 05, 2017 at 04:10:14PM +0200, Thomas Entner / EE wrote:
> >> On 05.05.2017 at 14:56, Thomas Entner / EE wrote:
> >>> Hi, I am Thomas who contacted Laurent regarding this issue. We have some
> >>> further observations:
> >>> 
> >>> On 04.05.2017 at 08:45, Peter Chen wrote:
> >>>> On Wed, May 03, 2017 at 01:32:28PM +0300, Laurent Pinchart wrote:
> >>>>>> Nobody has reported this problem before, but from the description
> >>>>>> it seems to be an IC issue triggered by a heavily loaded memory
> >>>>>> bus: the controller may not get to access memory within the
> >>>>>> time available.
> >>>>> 
> >>>>> That's my guess too. I was expecting the USB controller's bus master
> >>>>> interface to get stalled but eventually perform the access (or retry
> >>>>> it, I'm not sure what kind of bus it sits on), but there might be a
> >>>>> hardware bug that messes up the controller's state machine. I won't
> >>>>> rule out the possibility of a software issue yet, it might be possible
> >>>>> to detect this condition and retry the transfer.
> >>>> 
> >>>> I am not sure if it can be recovered, you can call ->fifo_flush, and
> >>>> ->ep_disable and ->ep_enable if it returns -ETIME, and re-submit this
> >>>> request.
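
To make the suggested sequence concrete, here is a minimal userland
sketch of it. Everything in it is a stand-in: struct mock_ep and the
mock_* helpers are hypothetical and only count calls, loosely mirroring
the gadget operations named above (->fifo_flush, ->ep_disable,
->ep_enable, then re-queueing). It also assumes that -ETIME is what the
queue operation returns, which is not spelled out in this thread, and it
says nothing about whether the hardware actually recovers.

```c
#include <errno.h>

/* Hypothetical userland stand-in for a gadget endpoint; not a kernel type. */
struct mock_ep {
    int queue_failures_left; /* how many queue attempts will report -ETIME */
    int flushes;
    int disables;
    int enables;
    int queues;
};

/* The mock operations only count how often they were called. */
static void mock_fifo_flush(struct mock_ep *ep) { ep->flushes++; }
static int mock_ep_disable(struct mock_ep *ep) { ep->disables++; return 0; }
static int mock_ep_enable(struct mock_ep *ep) { ep->enables++; return 0; }

static int mock_ep_queue(struct mock_ep *ep)
{
    ep->queues++;
    if (ep->queue_failures_left > 0) {
        ep->queue_failures_left--;
        return -ETIME; /* assumption: the queue call is what reports -ETIME */
    }
    return 0;
}

/*
 * Queue once; on -ETIME flush the FIFO, disable and re-enable the
 * endpoint, then re-submit, at most max_retries times.
 */
static int queue_with_recovery(struct mock_ep *ep, int max_retries)
{
    int ret = mock_ep_queue(ep);

    while (ret == -ETIME && max_retries-- > 0) {
        mock_fifo_flush(ep);
        ret = mock_ep_disable(ep);
        if (ret)
            break;
        ret = mock_ep_enable(ep);
        if (ret)
            break;
        ret = mock_ep_queue(ep);
    }
    return ret;
}
```

With an endpoint that fails once, queue_with_recovery(&ep, 3) queues
twice and flushes once. The retry bound is there so that the loop gives
up if the controller never comes back.
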
> >>>> 
> >>>>>> At Xilinx Zynq, its tx buffer is small, and less than 512 bytes
> >>>>>> (84bc70f94d81, "usb: chipidea: add xilinx zynq platform data"), and
> >>>>>> your throughput may be > (512 * 3) bytes/SoF, you can't use non-
> >>>>>> stream mode by reducing max packet size.
> >>>>> 
> >>>>> My throughput is actually 1*1024 bytes / SOF.
> >>>>> 
> >>>> From previous discussion, the tx fifo size is 341.33 bytes for xilinx
> >>>> zynq, you can set max packet size as 341 and mult as 3, then you can
> >>>> transfer 1023 bytes / SoF for non-stream mode, assumed the non-stream
> >>>> mode can fix your problem.
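
For reference, the arithmetic behind the 341 x 3 suggestion can be
sketched as below. This is only an illustration: the helper names are
made up, and it assumes a high-speed link, where an SOF arrives every
125 us (8000 microframes per second) and wMaxPacketSize carries the
packet size in bits 10..0 and the number of additional transactions per
microframe (mult - 1) in bits 12..11.

```c
#include <stdint.h>

/* Assumption: USB 2.0 high speed, one microframe (SOF) every 125 us. */
#define MICROFRAMES_PER_SECOND 8000u

/* Payload bytes per microframe for `mult` transactions of `maxpacket` bytes. */
static uint32_t bytes_per_microframe(uint32_t maxpacket, uint32_t mult)
{
    return maxpacket * mult;
}

/* Resulting payload rate in bytes per second. */
static uint32_t bytes_per_second(uint32_t maxpacket, uint32_t mult)
{
    return bytes_per_microframe(maxpacket, mult) * MICROFRAMES_PER_SECOND;
}

/*
 * Endpoint descriptor wMaxPacketSize field:
 * bits 10..0  = max packet size,
 * bits 12..11 = additional transactions per microframe (mult - 1).
 */
static uint16_t encode_wmaxpacketsize(uint32_t maxpacket, uint32_t mult)
{
    return (uint16_t)(((mult - 1u) << 11) | (maxpacket & 0x7ffu));
}
```

bytes_per_microframe(341, 3) gives 1023, one byte per microframe short
of the 1 x 1024 configuration, and encode_wmaxpacketsize(341, 3) gives
0x1155.
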
> >>> 
> >>> I am not sure if this is totally correct: according to Xilinx UG585,
> >>> page 1815, register HWTXBUF (TXADD), the TxBuffer size of the Xilinx
> >>> controller is only 768 bytes. However, I think you are correct that
> >>> the issue is related to this topic:
> >>> 
> >>> When we reduce the max packet size to 512, we no longer get the full
> >>> lock-ups, but UVC streaming still stops after some time (under heavy
> >>> memory traffic).
> >>> 
> >>> When we reduce the max packet size to 256, things appear to work stably
> >>> (but very slowly).
> >>> 
> >>> When we use a max packet size of 720 we still see lock-ups (which is a
> >>> bit of a surprise to me, I would have expected something close to 768 to
> >>> be the magical limit).
> >> 
> >> We have further debugged the issue now with a USB bus analyzer and made a
> >> (for us) surprising observation:
> >> 
> >> Both the lock-ups (packet size = 1024) and the stop of streaming without
> >> lock-up (packet size = 512) always happen at the end of a UVC frame (i.e.
> >> end of image). We can see the payload header (e.g. 02 83) with the EOF
> >> bit set, but that packet has a CRC error (and the end of the packet is
> >> not FF D9, as it should be for our MJPEG payload), I assume because the
> >> Tx-buffer did underrun.
> >> 
> >> My present understanding was that the DMA of the Chipidea IP was not able
> >> to refill the Tx buffer fast enough, but then I would expect this to
> >> also happen e.g. in the middle of the image and not only at the last
> >> transfer?
> >
> > From the below commit:
> > 84bc70f94d81 ("usb: chipidea: add xilinx zynq platform data"), the tx
> > fifo is less than 512 Bytes, you may calculate it through [1]
> > 
> > 
> > [1] https://www.spinics.net/lists/linux-usb/msg129116.html
> 
> Just very brief, should go to bed now...:
> We have an interim workaround:
> - We use 3072 packet size (3x1024), but always queue only one buffer.
> This prevents the lock-up issue. (We want to revisit this later for a
> better workaround.)
> - However, we still see CRC errors. Could this be related to this very
> small buffer? As the CRC errors are rare, we can accept them. But it
> would be very interesting to understand the root cause of these CRC
> errors (I think this CRC error also triggers the bad ATDTW behavior).
> - But when there is a CRC error, streaming still stopped. The reason
> was that in case of an -EILSEQ (caused by the CRC error) the request
> complete callback was not called. I have patched this in the driver.
> - I can provide you the patch later if you are interested, but I am not
> sure if it is good enough for all situations where the Chipidea IP is used...

I would appreciate it if you could provide the patch, as I'm interested in
integrating that fix in the upstream uvcvideo driver.

-- 
Regards,

Laurent Pinchart
