> From: Stephen Warren [mailto:swarren@xxxxxxxxxxxxx]
> Sent: Saturday, February 01, 2014 7:44 PM
>
> On 02/01/2014 03:00 AM, Andre Heider wrote:
> > On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
> >> On 01/31/2014 11:12 AM, Andre Heider wrote:
> >>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
> >>>> The DWC2 driver should now be in good enough shape to move out of
> >>>> staging. I have stress tested it overnight on RPI running mass
> >>>> storage and Ethernet transfers in parallel, and for several days
> >>>> on our proprietary PCI-based platform.
> >> ...
> >>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
> >>> With today's Linus' master the dwc2 controller seems to initialize fine,
> >>> but I get this upon boot:
> >>>
> >>> [ 1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
> >>> [ 1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
> ...
> >> This is due to the following code:
> ...
> >> What ends up happening, simply due to memory allocation order, is that
> >> the memory writes inside usb_settoggle() end up setting the SDHCI struct
> >> platform_device's num_resources to 0, so that its call to
> >> platform_get_resource() fails.
> >>
> >> With the DWC2 move patch reverted, some other random piece of memory is
> >> being corrupted, which just happens not to cause any visible problem.
> >> Likely it's some other struct platform_device that's already had its
> >> resources read by the time DWC2 probes and corrupts them.
> >>
> >> (Yes, this was hard to find!)
> >
> > Nice work, but how did you pinpoint this? Am I missing some option/tool
> > or did I just not stare for long enough?
>
> Well, there was a clear place where an issue was present; the resource
> lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
> into that function to dump out the data platform_get_resource() used.
> This clearly pointed at num_resources==0 being the problem.
> Next, I dumped the same data from the code in drivers/of that sets it
> up, and it was OK there, so I knew it was getting over-written
> somewhere. I then basically added hundreds of calls to the same data
> dumping function throughout kernel functions like really_probe() to
> track down the location of the problem. Luckily, the behaviour was
> stable, so I wasn't chasing a race/timing condition. Eventually I
> narrowed the window to the few lines of code I mentioned in
> _dwc2_hcd_endpoint_reset(). It would have been much harder if it was
> e.g. the USB HW DMAing to memory that caused the corruption, so I was
> lucky:-)

Nice work Stephen, thanks. I will try to come up with a patch to fix this
ASAP, along the lines of what Alan suggested.

--
Paul
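
For anyone who wants to try the narrowing technique Stephen describes, here
is a minimal user-space sketch of the idea, not the kernel code itself. It
dumps one watched value after every step and reports the first step after
which the value is wrong. Everything in it is a hypothetical stand-in: the
mem[] array, the step_*() functions, checkpoint() and find_first_bad_step()
are made up for illustration, and the stray write is kept inside a single
array so the program stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: words 0-1 belong to the "driver", word 2 stands in
 * for the neighbouring num_resources field that gets clobbered. */
enum { TOGGLE_IN, TOGGLE_OUT, NUM_RESOURCES, MEM_WORDS };

static unsigned int mem[MEM_WORDS];

/* Dump the watched word; return 1 if it still holds the expected value. */
static int checkpoint(const char *where, unsigned int expected)
{
	unsigned int seen = mem[NUM_RESOURCES];

	printf("%-16s num_resources=%u%s\n", where, seen,
	       seen == expected ? "" : "  <-- changed");
	return seen == expected;
}

/* Three illustrative steps; only the middle one misbehaves. */
static void step_setup(void)
{
	mem[TOGGLE_IN] = 0;
}

static void step_reset_ep(void)
{
	size_t idx = 2;			/* bug: should be 0 or 1 */

	assert(idx < MEM_WORDS);	/* keep the stray write inside mem[] */
	mem[idx] = 0;			/* lands on the neighbour's word */
}

static void step_finish(void)
{
	mem[TOGGLE_OUT] = 1;
}

struct step {
	const char *name;
	void (*fn)(void);
};

/* Run each step with a checkpoint after it. Returns the index of the first
 * step that leaves the watched word wrong, or -1 if none does. */
int find_first_bad_step(void)
{
	static const struct step steps[] = {
		{ "step_setup", step_setup },
		{ "step_reset_ep", step_reset_ep },
		{ "step_finish", step_finish },
	};
	const unsigned int expected = 2;
	size_t i;

	mem[NUM_RESOURCES] = expected;	/* value as originally set up */
	checkpoint("initial", expected);

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		steps[i].fn();
		if (!checkpoint(steps[i].name, expected))
			return (int)i;
	}
	return -1;
}
```

Calling find_first_bad_step() from a small main() returns the index of
step_reset_ep. In a real hunt one would then add checkpoints inside that
step and repeat. As noted above, this only works when the corruption is
reproducible and comes from CPU writes rather than from DMA.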