On Mon, Sep 21, 2015 at 06:56:34PM +0530, maitysanchayan@xxxxxxxxx wrote:
> On 15-09-21 14:50:18, Peter Chen wrote:
> > On Fri, Sep 18, 2015 at 04:01:50PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > On 15-09-18 13:39:11, Peter Chen wrote:
> > > > On Wed, Sep 16, 2015 at 02:48:50PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > > > On 15-09-16 15:54:21, Peter Chen wrote:
> > > > > > On Wed, Sep 16, 2015 at 02:18:49PM +0530, maitysanchayan@xxxxxxxxx wrote:
> > > > > > > Hello Peter,
> > > > > > >
> > > > > > > > Enable CONFIG_DEBUG_LIST, it has below position if you
> > > > > > > > run make menuconfig
> > > > > > > > Kernel hacking --->
> > > > > > > >     [*] Debug linked list manipulation
> > > > > > > >
> > > > > > >
> > > > > > > Sorry for the delay. When I enabled this config the first time my test
> > > > > > > application ran for 24 hours or so and I did not get any stack traces.
> > > > > > >
> > > > > > > I restarted the test again and finally got the trace below. You were
> > > > > > > spot on, it's a list corruption issue. I modified the trace a bit after
> > > > > > > copying to remove the sprinkled debug messages throughout the trace
> > > > > > > from my test application.
> > > > > > >
> > > > > > > [ 622.204134] WARNING: CPU: 0 PID: 0 at lib/list_debug.c:59 __list_del_entry+0xc4/0xe8()
> > > > > > > [ 622.212870] list_del corruption. prev->next should be 8db63600, but was 36008db6
> > > > > >
> > > > > > You see the higher 16 bits were swapped with lower 16 bits, and the
> > > > > > virtual memory address should begin from 0x8xxxxxxxx, right?
> > > > >
> > > > > Yes, I saw that but beats me how this happens.
> > > > >
> > > > > >
> > > > > > Check with Vybrid errata to see if all ARM/memory system have applied.
> > > > >
> > > > > What do you mean by "all ARM/memory system have applied"? I checked with the Vybrid errata
> > > > > and I do not see anything related.
> > > > >
> > > >
> > > > Just system level errata, like ARM Cortex A5, memory (L1/L2 Cache), etc.
> > > >
> > > > Would you please do more tests to see if the error pattern is always
> > > > the same?
> > >
> > > I got more or less the same logs as below the last five times I tried today
> > > and this time I got the crashes quickly enough somehow. Did not have to wait
> > > for more than half an hour.
> > >
> > > > And print the address to store prev->next.
> > >
> > > Isn't that what's given by list_del corruption info?
> >
> > It only prints the content of prev->next, not the address of
> > prev->next. I just want to make sure this address is dword aligned.
>
> Ok.
>
> > > [ 476.880749] list_del corruption. prev->next should be 8daf74c0, but was 74c08daf
> > > >
> > > > Interesting that at least one more person, Felipe Tonello, sees the same issue.
> > >
> > > Felipe mentions a DMA issue. I saw a DMA error message from ci_hdrc once in the
> > > last five times I tried, but mistakenly I did not take that one down. The message
> > > was something along the lines "ci_hdrc: ci_hdrc bad dma alloc" or similar.
> >
> > Make sure you really see dma_pool_alloc fail or not; it may not be the same
> > problem.
>
> That message was exactly
>
> [ 1186.114496] ci_hdrc ci_hdrc.0: dma_pool_free ci_hw_td, (null)/8d3c1e6c (bad dma)
>

Does the above message appear close to the linked list corruption, or during
an otherwise correct transfer?

--
Best Regards,
Peter Chen