On 02/01/2014 03:00 AM, Andre Heider wrote:
> On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
>> On 01/31/2014 11:12 AM, Andre Heider wrote:
>>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
>>>> The DWC2 driver should now be in good enough shape to move out of
>>>> staging. I have stress tested it overnight on RPI running mass
>>>> storage and Ethernet transfers in parallel, and for several days
>>>> on our proprietary PCI-based platform.
>> ...
>>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
>>> With today's Linus' master the dwc2 controller seems to initialize fine,
>>> but I get this upon boot:
>>>
>>> [ 1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
>>> [ 1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
...
>> This is due to the following code:
...
>> What ends up happening, simply due to memory allocation order, is that
>> the memory writes inside usb_settoggle() end up setting the SDHCI struct
>> platform_device's num_resources to 0, so that it's call to
>> platform_get_resource() fails.
>>
>> With the DWC2 move patch reverted, some other random piece of memory is
>> being corrupted, which just happens not to cause any visible problem.
>> Likely it's some other struct platform_device that's already had its
>> resources read by the time DWC2 probes and corrupts them.
>>
>> (Yes, this was hard to find!)
>
> Nice work, but how did you pinpoint this? Am I missing some option/tool
> or did I just not stare for long enough?

Well, there was a clear place where an issue was present; the resource
lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
into that function to dump out the data platform_get_resource() used.
This clearly pointed at num_resources==0 being the problem. Next, I
dumped the same data from the code in drivers/of that sets it up, and it
was OK there, so I knew it was getting over-written somewhere.
I then basically added hundreds of calls to the same data dumping
function throughout kernel functions like really_probe() to track down
the location of the problem. Luckily, the behaviour was stable, so I
wasn't chasing a race/timing condition. Eventually I narrowed the window
to the few lines of code I mentioned in _dwc2_hcd_endpoint_reset().

It would have been much harder if it had been e.g. the USB HW DMAing to
memory that caused the corruption, so I was lucky :-)
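
The narrowing approach described above can be sketched roughly as follows.
This is a userspace simulation, not the code that was actually used: the
struct, the checkpoint labels, and the helper names are all hypothetical,
and the memset() merely stands in for the stray write. In the kernel the
dump would go through printk()/pr_info() rather than fprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the one field being watched; in the thread it
 * was num_resources in the SDHCI struct platform_device. */
struct watched_dev {
    unsigned int num_resources;
};

/* Label of the last checkpoint that saw the expected value, and of the
 * first one that did not (NULL until corruption is observed). */
static const char *last_good;
static const char *first_bad;

/* Dump the watched field at a named call site and record where it first
 * deviates from the expected value. Returns 1 if the value is intact. */
static int checkpoint(const struct watched_dev *dev, unsigned int expected,
                      const char *label)
{
    assert(dev != NULL && label != NULL);
    fprintf(stderr, "%s: num_resources=%u\n", label, dev->num_resources);

    if (dev->num_resources == expected) {
        if (first_bad == NULL)
            last_good = label;
        return 1;
    }
    if (first_bad == NULL)
        first_bad = label;
    return 0;
}

/* Simulated probe sequence (labels are made up for illustration). */
static void run_probe_sequence(struct watched_dev *dev)
{
    checkpoint(dev, 2, "after OF setup");
    checkpoint(dev, 2, "before USB probe");

    /* Simulated corruption: zero the field, as the stray write did. */
    memset(&dev->num_resources, 0, sizeof(dev->num_resources));

    checkpoint(dev, 2, "after USB probe");
    checkpoint(dev, 2, "before SDHCI probe");
}
```

After a run, last_good and first_bad bracket the window in which the field
changed. Adding more checkpoints between those two labels and re-running
shrinks the window further, which only works because the corruption is
deterministic from boot to boot.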