> From: Paul Zimmerman
> Sent: Monday, February 03, 2014 9:36 AM
>
>> From: Stephen Warren [mailto:swarren@xxxxxxxxxxxxx]
>> Sent: Saturday, February 01, 2014 7:44 PM
>>
>> On 02/01/2014 03:00 AM, Andre Heider wrote:
>>> On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
>>>> On 01/31/2014 11:12 AM, Andre Heider wrote:
>>>>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
>>>>>> The DWC2 driver should now be in good enough shape to move out of
>>>>>> staging. I have stress tested it overnight on RPI running mass
>>>>>> storage and Ethernet transfers in parallel, and for several days
>>>>>> on our proprietary PCI-based platform.
>>>> ...
>>>>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
>>>>> With today's Linus' master the dwc2 controller seems to initialize fine,
>>>>> but I get this upon boot:
>>>>>
>>>>> [ 1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
>>>>> [ 1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
>> ...
>>>> This is due to the following code:
>> ...
>>>> What ends up happening, simply due to memory allocation order, is that
>>>> the memory writes inside usb_settoggle() end up setting the SDHCI struct
>>>> platform_device's num_resources to 0, so that it's call to
>>>> platform_get_resource() fails.
>>>>
>>>> With the DWC2 move patch reverted, some other random piece of memory is
>>>> being corrupted, which just happens not to cause any visible problem.
>>>> Likely it's some other struct platform_device that's already had its
>>>> resources read by the time DWC2 probes and corrupts them.
>>>>
>>>> (Yes, this was hard to find!)
>>>
>>> Nice work, but how did you pinpoint this? Am I missing some option/tool
>>> or did I just not stare for long enough?
>>
>> Well, there was a clear place where an issue was present; the resource
>> lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
>> into that function to dump out the data platform_get_resource() used.
>> This clearly pointed at num_resources==0 being the problem. Next, I
>> dumped the same data from the code in drivers/of that sets it up, and it
>> was OK there, so I knew it was getting over-written somewhere. I then
>> basically added hundreds of calls to the same data dumping function
>> throughout kernel functions like really_probe() to track down the
>> location of the problem. Luckily, the behaviour was stable, so I wasn't
>> chasing a race/timing condition. Eventually I narrowed the window to the
>> few lines of code I mentioned in _dwc2_hcd_endpoint_reset(). It would
>> have been much harder if it was e.g. the USB HW DMAing to memory that
>> caused the corruption, so I was lucky:-)
>
> Nice work Stephen, thanks. I will try to come up with a patch to fix this
> ASAP, along the lines of what Alan suggested.

Stephen, Andre,

Can you test the attached patch, please? It works for me on the Synopsys
PCIe-based FPGA board. Unfortunately my RPI board is currently broken, so
I am unable to test it there to verify that it actually fixes the problem
you are seeing.

The dwc2 driver doesn't use the usb_device toggle bits anywhere else, so
the quickest fix is to just remove the problematic code from
_dwc2_hcd_endpoint_reset().

If you give me your tested-bys, I will submit this as a proper patch to
Greg.

--
Paul
Attachment: dwc2-toggle.patch