Hi Santosh,

Thanks for the comments! If you do not think the GPMC can be a factor here, that's great news. I'd certainly love to take something off the list.

Unfortunately, we've run "memtester 200M" (we only have 256MB) in constant cycles for weekends at a time, and have not had it crash. We did this back when we thought perhaps our RAM timings were wrong, since the NANYA chip was not mentioned anywhere in the source. If we were having memory errors, I would expect that to have caught them, correct? Should we run it with a larger number than 200? We didn't want to dig into our operating system's space too much.

Thanks again!

----- Original Message -----
From: "Shilimkar, Santosh" <santosh.shilimkar@xxxxxx>
To: CF Adad <cfadad@xxxxxxxxxxxxxx>
Cc: Tony Lindgren <tony@xxxxxxxxxxx>; "linux-omap@xxxxxxxxxxxxxxx" <linux-omap@xxxxxxxxxxxxxxx>
Sent: Wednesday, June 6, 2012 2:36 AM
Subject: Re: Please help! AM35xx mm/slab.c BUG

On Wed, Jun 6, 2012 at 11:44 AM, CF Adad <cfadad@xxxxxxxxxxxxxx> wrote:
> All,
>
> We've learned a few more things:
>
> 1.) We have found a way to get it to happen pretty consistently. We simply run iperf in a loop using the EMAC port to some other device.
>
> 2.) The crash ONLY happens on our custom board, not on the Twister dev kit. This is true despite the fact that I ported our latest linux-omap 3.4-rc6 over there. We're still running Technexion's default x-loader and u-boot to handle proper configs on that board. So, that's a substantial bit of code that is different between our boxes. The kernel is altered only in that the few pinmux changes I left in Linux have been removed to avoid configuration differences between the two boards.
>
> This suggests that either:
> A) We have a hardware problem on our board. Seems unlikely. Can anyone think of anything hardware related that would manifest itself with these sorts of errors?
>
> B) We have an issue in our bootloader code somewhere.
> I hesitated to overwrite the bootloaders for this test on the Twister baseboard just because I did not want to have to mess with getting the pinmuxes and the like put back.
>
> Presuming something in those bootloaders is our problem, I wonder what EMAC-related stuff there really is. For a long time we ran with our bootloaders NOT initializing either of the Eths. This was Technexion's default. They left that work to Linux. We've recently done work to enable them in u-boot, but we were crashing like this long before that. Once in Linux, we're just using the standard drivers and calls from within the board file to the SMSC911x and the Davinci EMAC drivers. I am using the patches that allow the e-fused MAC to be pulled from the AM35xx for the EMAC, but I can't see how that would cause this.
>
> Assuming the EMAC is perhaps an innocent bystander that just happens to trigger this, the place I would suspect the most in our bootloaders would be the GPMC settings. We've done a good bit of tweaking in there since we switched chips. *Could a GPMC timing issue account for these types of errors???* The reason I bring it up is that the GPMC has been one of those things that we've really struggled to understand. What should the timings *really* be? We've done the best we can to try to guess our way through it. BUT, we could certainly be very wrong. If a GPMC setting could cause these types of bugs, please let me know. I'll be happy to post more info on how we're setting that up now. If not, I'll save the electrons and not spam it here.
>

I don't know the AMXX architecture that well, but looking at the crash log, I am not sure the GPMC should play a role here. What I think is that it is mostly memory corruption, which can be caused by many things, as Tony outlined. To make sure your memory is in a good state, can you run memtester for a long duration and check that you are not getting any memory failures? Try to give the maximum memory size as an input to memtester.
You can download one from [1]

Regards
Santosh

[1] http://pyropus.ca/software/memtester/
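
On the question of how far above 200M it is safe to go: one possible way to pick the size is to read MemFree from /proc/meminfo on the running board and subtract some headroom for the OS. The following is only a minimal sketch of that idea. The helper name suggest_memtester_mb and the 32 MB default headroom are assumptions made for illustration; they come neither from this thread nor from memtester itself, and MemFree does not count reclaimable page cache, so the result is on the conservative side.

```python
def suggest_memtester_mb(meminfo_text, headroom_mb=32):
    """Suggest a memtester size in MB from /proc/meminfo contents.

    Takes the MemFree value (reported in kB), converts it to whole MB,
    and subtracts headroom_mb for the running system. The headroom
    default is an arbitrary assumption, not a recommended value.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            free_kb = int(line.split()[1])
            return max(free_kb // 1024 - headroom_mb, 0)
    raise ValueError("MemFree not found in meminfo text")


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            # Print a command line of the same form used earlier in the thread.
            print(f"memtester {suggest_memtester_mb(f.read())}M")
    except OSError:
        print("could not read /proc/meminfo")
```

Run on the target just before the test, this prints a command of the same form as "memtester 200M". The headroom can be lowered step by step if the board stays responsive during a run.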