Re: USB client crash on Vybrid with USB gadget RNDIS connection

Peter Chen <peter.chen@xxxxxxxxxxxxx> · Tue, 22 Sep 2015 07:36:01 +0800

On Mon, Sep 21, 2015 at 06:56:34PM +0530, maitysanchayan@xxxxxxxxx wrote:
> On 15-09-21 14:50:18, Peter Chen wrote:
> > On Fri, Sep 18, 2015 at 04:01:50PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > On 15-09-18 13:39:11, Peter Chen wrote:
> > > > On Wed, Sep 16, 2015 at 02:48:50PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > > > On 15-09-16 15:54:21, Peter Chen wrote:
> > > > > > On Wed, Sep 16, 2015 at 02:18:49PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > > > > > Hello Peter,
> > > > > > > 
> > > > > > > > 
> > > > > > > > > Enable CONFIG_DEBUG_LIST; you can find it at the following
> > > > > > > > > location if you run make menuconfig:
> > > > > > > > Kernel hacking  --->
> > > > > > > > [*] Debug linked list manipulation  
> > > > > > > > 
> > > > > > > 
> > > > > > > Sorry for the delay. When I enabled this config the first time my test
> > > > > > > application ran for 24 hours or so and I did not get any stack traces.
> > > > > > > 
> > > > > > > I restarted the test again and finally got the trace below. You were
> > > > > > > > spot on, it's a list corruption issue. I modified the trace a bit after
> > > > > > > copying to remove the sprinkled debug messages throughout the trace
> > > > > > > from my test application.
> > > > > > > 
> > > > > > > [  622.204134] WARNING: CPU: 0 PID: 0 at lib/list_debug.c:59 __list_del_entry+0xc4/0xe8()
> > > > > > > [  622.212870] list_del corruption. prev->next should be 8db63600, but was 36008db6
> > > > > > 
> > > > > > > You see the higher 16 bits were swapped with the lower 16 bits, and the
> > > > > > > virtual memory address should begin with 0x8xxxxxxx, right?
> > > > > 
> > > > > Yes, I saw that but beats me how this happens.
> > > > > 
> > > > > > 
> > > > > > Check with Vybrid errata to see if all ARM/memory system have applied.
> > > > > 
> > > > > What do you mean by "all ARM/memory system have applied" ? I checked with the Vybrid errata
> > > > > and I do not see anything related.
> > > > > 
> > > > 
> > > > Just system level errata, like ARM Cortex A5, memory (L1/L2 Cache), etc.
> > > > 
> > > > Would you please do more tests to see if the error pattern is always
> > > > the same?
> > > 
> > > I got more or less the same logs as below the last five times I tried today
> > > and this time I got the crashes quickly enough somehow. Did not have to wait
> > > for more than half an hour.
> > > 
> > > > And print the address where prev->next is stored.
> > > 
> > > Isn't that what's given by list_del corruption info?
> > 
> > It only prints the content of prev->next, not the address of
> > prev->next. I just want to make sure this address is dword aligned.
> 
> Ok.
> 
> > 
> > [  476.880749] list_del corruption. prev->next should be 8daf74c0, but was 74c08daf
> > 
> > > 
> > > Interesting that at least one more person, Felipe Tonello, sees the same issue.
> > > 
> > > Felipe mentions a DMA issue, I saw a DMA error message from ci_hdrc once in the
> > > last five times I tried but mistakenly I did not take that one down. The message
> > > was something along the lines "ci_hdrc: ci_hdrc bad dma alloc" or similar.
> > 
> > Check whether you really see dma_pool_alloc fail; it may not be the same
> > problem.
> 
> That message was exactly
> 
> [ 1186.114496] ci_hdrc ci_hdrc.0: dma_pool_free ci_hw_td,   (null)/8d3c1e6c (bad dma)
> 

Does the above message occur close to the linked list corruption,
or does it occur during an otherwise correct transfer?
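
For reference, the two checks discussed earlier in this thread (whether the
bad value is the expected pointer with its 16-bit halves swapped, and whether
the address holding prev->next is dword aligned) can be sketched in user
space as below. This is only an illustration, not kernel code: the helper
names swap_halves(), is_dword_aligned() and check_reported_values() are
hypothetical, and "dword" is assumed to mean 4 bytes on this 32-bit system.
The constants are the values from the two list_del corruption messages
quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: exchange the upper and lower 16 bits of a 32-bit value. */
uint32_t swap_halves(uint32_t v)
{
	return (v << 16) | (v >> 16);
}

/* Hypothetical helper: assumes "dword aligned" means a multiple of 4 bytes. */
int is_dword_aligned(uintptr_t addr)
{
	return (addr & 0x3u) == 0;
}

/*
 * Check that both "but was" values from the quoted logs are the
 * "should be" values with their 16-bit halves exchanged.
 */
void check_reported_values(void)
{
	assert(swap_halves(0x8db63600u) == 0x36008db6u);
	assert(swap_halves(0x8daf74c0u) == 0x74c08dafu);
}
```

In the kernel, the address to test would be &prev->next, printed next to the
existing corruption message; the exact place to add that print is left open
here.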

-- 

Best Regards,
Peter Chen