Re: Please help! AM35xx mm/slab.c BUG

jean-philippe francois <jp.francois@xxxxxxxxxx> · Thu, 14 Jun 2012 21:10:12 +0200

2012/6/14 CF Adad <cfadad@xxxxxxxxxxxxxx>:
> An update:
>
> *LAN9221 and GPMC off the hook?*
>
> I believe we've isolated the GPMC from this by disabling the LAN9221 in both our bootloaders and the kernel and by booting everything off the SD cards only.  By removing its initialization code from the respective board files, I *hope* we've basically removed it from contention.  Obviously the chip is still wired up, but I don't expect the bootloaders or kernel to be trying to talk to it.  Likewise, the NAND is being initialized, but we're not mounting or using it at all.
>
> With these changes we're still seeing these crashes, albeit with the same incredible lack of frequency.
>
> *EMAC now _partially_ on the hook?*
> I posted a separate thread on what I think may be a related subject, potential Davinci EMAC problems, here:  http://www.spinics.net/lists/linux-omap/msg71833.html.
>
> As you can see from the crashes posted there, there seems to be a bit of whining from the EMAC driver.  Since performance in the EMAC <=> EMAC case has always been questionable anyway, any chance there is a tiny memory leak or something similar that could be contributing?
>
> What about configuring this EMAC from within u-boot?  Could that initialization do something bad when we get into Linux?  I've not touched these drivers.  I've simply called them like other boards in the family are doing.
>
> Just this morning, I upgraded to the latest linux-omap 3.5-rc2, but still saw one of these crashes pretty quickly...
>
> *Power stability?*
>
> We're learning through all of this that our boards do appear to have some funny transients running through the power circuits every so often.  The ones we've captured on the scope have not caused crashes or hard lockups, but they are there.  This could be a dumb question, but could power issues create a slab error like this???  I guess I'm more accustomed to seeing power issues result in more hard lock ups than a nicely worded dump with the kernel sometimes still somewhat functioning.
>
>
I am following this bug with interest, because we often go the "custom
hardware" way, and have faced situations like this.
In my opinion, random memory corruption is more often than not the sign of a
hardware design issue. The EMAC here is perhaps only a symptom, because
it produces the right memory bandwidth and power consumption pattern to
trigger the glitch.

How many boards do you have? Are some more stable than others?
Can you solder additional caps on top of your power decoupling caps?
Can you tweak the voltages?
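
To compare boards, or to check a board before and after adding caps or
tweaking voltages, a crude userspace pattern test can help. What follows is
only a minimal sketch, assuming Python is available on the target rootfs; the
function names are made up for illustration, and a dedicated tool such as
memtester is far more thorough. It fills a buffer with a pseudo-random
pattern, regenerates the pattern on readback, and reports any offsets that
differ.

```python
import sys

def lcg_bytes(seed, count):
    """Yield `count` pseudo-random bytes from a simple 32-bit LCG."""
    state = seed & 0xFFFFFFFF
    for _ in range(count):
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        yield state >> 24  # top 8 bits, always in 0..255

def fill(buf, seed):
    """Write the pattern for `seed` into every byte of `buf`."""
    for i, value in enumerate(lcg_bytes(seed, len(buf))):
        buf[i] = value

def verify(buf, seed):
    """Return the offsets where `buf` differs from the pattern for `seed`."""
    return [i for i, value in enumerate(lcg_bytes(seed, len(buf)))
            if buf[i] != value]

def run(size, passes):
    """Fill and verify `passes` times; return the total mismatch count."""
    buf = bytearray(size)
    total = 0
    for seed in range(passes):
        fill(buf, seed)
        bad = verify(buf, seed)
        for offset in bad:
            print("pass %d: mismatch at offset %d" % (seed, offset))
        total += len(bad)
    return total

if __name__ == "__main__":
    # Hypothetical defaults: 1 MiB buffer, 4 passes.
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 1 << 20
    passes = int(sys.argv[2]) if len(sys.argv) > 2 else 4
    errors = run(size, passes)
    print("%d mismatches in %d passes" % (errors, passes))
    sys.exit(1 if errors else 0)
```

Running it in a loop while the EMAC is under load, and again with the EMAC
idle, might show whether mismatches follow the traffic pattern. A clean run
proves little, since a test like this mostly exercises the caches unless the
buffer is much larger than they are.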

> Can anyone suggest to me anything I may not have tried to get more information out of these crashes when they occur?
>
> Thanks again to all!
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html