Re: USB gadget lockup on MX53 udc

Michael McTernan <Michael.McTernan.2001@xxxxxxxxxxxxx> · Tue, 22 Jan 2013 08:14:19 +0000

On 01/21/2013 11:58 PM, Peter Chen wrote:
On Mon, Jan 21, 2013 at 01:34:51PM +0000, Michael McTernan wrote:
Hi,

I think I've found a debug in fsl_udc_core.c and hope for some review of the fix.  It's pretty complex so apologies in advance if my explanation misses anything.

The background is that I started with the 3.0.59 stable branch (which shows the problem), then merged down the specific changes around fsl_usb2_udc from /balbi/usb.git.  The problem can be demonstrated on the Freescale START-R board (iMX53 running at 1GHz) acting as a USB gadget with g_ether (EEM) loaded and connected to a Linux PC, though I normally work on a derivative board.

The problem I see is that if I hammer data over the USB Ethernet towards the iMX53, the USB Ethernet 'jams' after a random amount of time between 10 seconds and a couple of minutes.  I currently use iperf to generate lots of packets - a bit more than the link can really carry e.g. iperf -c 10.10.10.10 -u -b 150M -i 1 -t 6000.  When 'jammed' endpoint 0 still runs but endpoint 1 no longer transfers data.  /proc/interrupts show that no interrupts are being created for ep1 (ep0 will increase the count).

I beefed up /proc/driver/fsl_usb2_udc a bit, and it appears the queue on ep1out has gone wrong:

Endpoint Setup Status Reg: SETUP on ep 0x00

EP Ctrl Reg [0x00] =    [0x00800080]
EP Ctrl Reg [0x01] =    [0x00880088]
EP Ctrl Reg [0x02] =    [0x00080008]
EP Ctrl Reg [0x03] =    [0x00080008]
EP Ctrl Reg [0x04] =    [0x00080008]
EP Ctrl Reg [0x05] =    [0x00080008]
EP Ctrl Reg [0x06] =    [0x00080008]
EP Ctrl Reg [0x07] =    [0x00080008]
EP Prime Reg      = [0x00000000]
EP Complete Reg   = [0x00000000]
EP Status         = [0x00000000]

For ep0 Maxpkt is 0x00000040 index is 0x00000000
its req queue is empty

Gadget endpoints

For ep1out Maxpkt is 0x00000200 index is 0x00000001
curr_dtd_ptr     0x8f017080
next_dtd_ptr     0x1
size_ioc_int_sts 0x128000
req df9a9c80 actual 0x00000600 length 0x00000600  buf dfb47802 no_interrupt 0 status -115
   td_dma=0x8f0170c0 size_ioc_sts=0x06008080 next_td_ptr=0x8f017100
req df9a9f80 actual 0x00000600 length 0x00000600  buf df05c802 no_interrupt 0 status -115
   td_dma=0x8f017100 size_ioc_sts=0x06008080 next_td_ptr=0x8f017140
req df9a9600 actual 0x00000600 length 0x00000600  buf df05d802 no_interrupt 0 status -115
   td_dma=0x8f017140 size_ioc_sts=0x06008080 next_td_ptr=0x8f017180
req df9a9980 actual 0x00000600 length 0x00000600  buf df05e802 no_interrupt 0 status -115
   td_dma=0x8f017180 size_ioc_sts=0x06008080 next_td_ptr=0x8f0171c0
req df9a9e80 actual 0x00000600 length 0x00000600  buf df05f802 no_interrupt 0 status -115
   td_dma=0x8f0171c0 size_ioc_sts=0x06008080 next_td_ptr=0x8f017200
req df9a9e00 actual 0x00000600 length 0x00000600  buf df0bc802 no_interrupt 0 status -115
   td_dma=0x8f017200 size_ioc_sts=0x06008080 next_td_ptr=0x8f017240
req df9a9b80 actual 0x00000600 length 0x00000600  buf df0bd802 no_interrupt 0 status -115
   td_dma=0x8f017240 size_ioc_sts=0x06008080 next_td_ptr=0x8f017000
req df9a9580 actual 0x00000600 length 0x00000600  buf df0be802 no_interrupt 0 status -115
   td_dma=0x8f017000 size_ioc_sts=0x06008080 next_td_ptr=0x8f017040
req df9a9f00 actual 0x00000600 length 0x00000600  buf df0bf802 no_interrupt 0 status -115
   td_dma=0x8f017040 size_ioc_sts=0x06008080 next_td_ptr=0x8f017080
req df9a9b00 actual 0x00000600 length 0x00000600  buf df0cc802 no_interrupt 0 status -115
   td_dma=0x8f017080 size_ioc_sts=0x00128000 next_td_ptr=0x00000001

For ep1in Maxpkt is 0x00000200 index is 0x00000001
curr_dtd_ptr     0x8f017300
next_dtd_ptr     0x1
size_ioc_int_sts 0x8000
its req queue is empty

This output shows that hardware arrived on the last queue entry for ep1out but somehow skipped the other items in the list (size_ioc_sts & 0x8 still shows 0x8=active for them).  I believe this is due to an incorrect re-priming of the hardware, so I added some debug to fsl_queue_td().  I found that the error occurs when the request list is not empty, tmp_stat is not set after the ATDTW procedure, AND the queue head next_dtd_ptr != 1 (i.e. not terminate but a dTD pointer).  This last condition is unchecked in the software (checking the next_dtd_prr is not mentioned in the iMX53 Reference Manual), but from observation the next_dtd_ptr is normally always 1 when we have to re-prime and conversely when next_dtd_ptr contains a dTD address we see the jamming.  This observation is not however the primary cause, but explains how we end up with the broken queue and jam I see.

Working back, I found that jamming problem appears after the following [abbreviated] call sequence occurs:

fsl_udc_irq()
-> dtd_complete_irq()
   -> done()
     -> dma_pool_free(udc->td_pool, ...)
     -> req->req.complete()
        ....
       -> fsl_ep_queue()
         -> fsl_req_to_dtd()
           -> fsl_build_dtd()
             -> dma_pool_alloc(udc->td_pool, ...)
         -> fsl_queue_td()


Specifically when there is a problem, the DMA free and subsequent alloc use the same dTD address i.e. the pool element is reused straight away.  This should not be a problem, but when that occurs we see the above incorrect re-prime and subsequent endpoint jamming.

It looks like the loop in dtd_complete_irq() can catch a dTD just after it has been marked inactive by the hardware, but I suspect the hardware isn't quite done with the dTD at that point, perhaps re-reading the next_dtd_ptr a few cycles later (noting that the USB peripheral doesn't run at 1GHz like the CPU, but is maybe 60MHz or some such multiple).  Therefore if we quickly free and reuse the descriptor, it seems that we can corrupt the hardware's list traversal.  If I insert some delay into the code path between checking and freeing a dTD (e.g a printk), the problem isn't seen.  Similarly enabling VERBOSE in the driver prevents the problem due to the VDBG in dtd_complete_irq().  Also interestingly if DMAPOOL_DEBUG is set (defined automatically if CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON), the problem isn't seen; again I think because it sufficiently slows down the dTD checking loop.  Also breaking the loop so we have 1 interrupt per dTD 'fixes' the problem, though is sub-o
  ptimal.

The workaround I've arrived at is contained in the following patch.  With debug I frequently see "postponed" output for small counts of 1, 2, 3 or 4.  The highest value I've ever seen is 8.

I'm not sure the patch is entirely optimal, but since it's taken me a couple of weeks to track this down, I would at least like to share it for review.  Perhaps someone with more knowledge of the underlying hardware could confirm my findings?

Kind Regards,

Mike


Signed-off-by: Michael McTernan <Michael.McTernan,2001@xxxxxxxxxxxxx>

diff --git a/orig/fsl_udc_core.c b/mmct/fsl_udc_core.c
index 667275c..55978a3 100644
--- a/orig/fsl_udc_core.c
+++ b/mmct/fsl_udc_core.c
@@ -1607,6 +1607,32 @@ static void tripwire_handler(struct fsl_udc *udc, u8 ep_num, u8 *buffer_ptr)
  	fsl_writel(temp & ~USB_CMD_SUTW, &dr_regs->usbcmd);
  }

+
+/* fsl_dtd_wait_active(): check if some dTD is likely to be in use by hardware
+ *	and wait a short time if it is.  If still in use, we return allowing the
+ *	caller to defer until an interrupt.
+ */
+static bool fsl_dtd_wait_active(const struct ep_queue_head *curr_qh,
+								const struct ep_td_struct *curr_td)
+{
+	const volatile u32 *curr = &curr_qh->curr_dtd_ptr;
+	const volatile u32 *next = &curr_qh->next_dtd_ptr;
+	unsigned int loops = 100;
+
+	while(*next != 1 && *curr == curr_td->td_dma && loops > 0) {
+		loops--;
+	}
+
+	if (loops == 0) {
+		pr_warning("%s(): wait timeout\n", __func__);
+		return true;
+	} else if(loops != 100) {
+		pr_debug(KERN_INFO "%s(): postponed %u\n", __func__, 100 - loops);
+		return false;
+	}
+}
+
+
  /* process-ep_req(): free the completed Tds for this req */
  static int process_ep_req(struct fsl_udc *udc, int pipe,
  		struct fsl_req *curr_req)
@@ -1653,8 +1679,8 @@ static int process_ep_req(struct fsl_udc *udc, int pipe,
  				ERR("Unknown error has occurred (0x%x)!\n",
  					errors);

-		} else if (hc32_to_cpu(curr_td->size_ioc_sts)
-				& DTD_STATUS_ACTIVE) {
+		} else if ((hc32_to_cpu(curr_td->size_ioc_sts) & DTD_STATUS_ACTIVE) ||
+					fsl_dtd_wait_active(curr_qh, curr_td)) {
  			VDBG("Request not complete");
  			status = REQ_UNCOMPLETE;
  			return status;



Hi, your problem is much like below:
http://www.spinics.net/lists/linux-usb/msg64229.html

Ah - that is pretty much it.  The iMX53 chip errata doesn't mention this 
problem, which is something of an omission - but I see you previously 
said all i.MX and PowerPC chips are afflicted.

The patch proposed by Christoph Fritz is functionally better since it 
doesn't waste CPU cycles, though holds onto the DMA memory for a bit 
longer.  Probably a better trade.

However, it doesn't look like his patch made it in - could the 
maintainer pick this up?

Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html