On Mon, Feb 03, 2014 at 08:51:48PM +0000, Paul Zimmerman wrote:
> > From: Paul Zimmerman
> > Sent: Monday, February 03, 2014 9:36 AM
> >
> >> From: Stephen Warren [mailto:swarren@xxxxxxxxxxxxx]
> >> Sent: Saturday, February 01, 2014 7:44 PM
> >>
> >> On 02/01/2014 03:00 AM, Andre Heider wrote:
> >>> On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
> >>>> On 01/31/2014 11:12 AM, Andre Heider wrote:
> >>>>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
> >>>>>> The DWC2 driver should now be in good enough shape to move out of
> >>>>>> staging. I have stress tested it overnight on RPI running mass
> >>>>>> storage and Ethernet transfers in parallel, and for several days
> >>>>>> on our proprietary PCI-based platform.
> >>>> ...
> >>>>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
> >>>>> With today's Linus' master the dwc2 controller seems to initialize fine,
> >>>>> but I get this upon boot:
> >>>>>
> >>>>> [ 1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
> >>>>> [ 1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
> >> ...
> >>>> This is due to the following code:
> >> ...
> >>>> What ends up happening, simply due to memory allocation order, is that
> >>>> the memory writes inside usb_settoggle() end up setting the SDHCI struct
> >>>> platform_device's num_resources to 0, so that its call to
> >>>> platform_get_resource() fails.
> >>>>
> >>>> With the DWC2 move patch reverted, some other random piece of memory is
> >>>> being corrupted, which just happens not to cause any visible problem.
> >>>> Likely it's some other struct platform_device that's already had its
> >>>> resources read by the time DWC2 probes and corrupts them.
> >>>>
> >>>> (Yes, this was hard to find!)
> >>>
> >>> Nice work, but how did you pinpoint this? Am I missing some option/tool
> >>> or did I just not stare for long enough?
> >>
> >> Well, there was a clear place where an issue was present; the resource
> >> lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
> >> into that function to dump out the data platform_get_resource() used.
> >> This clearly pointed at num_resources==0 being the problem. Next, I
> >> dumped the same data from the code in drivers/of that sets it up, and it
> >> was OK there, so I knew it was getting over-written somewhere. I then
> >> basically added hundreds of calls to the same data dumping function
> >> throughout kernel functions like really_probe() to track down the
> >> location of the problem. Luckily, the behaviour was stable, so I wasn't
> >> chasing a race/timing condition. Eventually I narrowed the window to the
> >> few lines of code I mentioned in _dwc2_hcd_endpoint_reset(). It would
> >> have been much harder if it was e.g. the USB HW DMAing to memory that
> >> caused the corruption, so I was lucky:-)
> >
> > Nice work Stephen, thanks. I will try to come up with a patch to fix this
> > ASAP, along the lines of what Alan suggested.
>
> Stephen, Andre,
>
> Can you test the attached patch, please? It works for me on the Synopsys
> PCIe-based FPGA board. Unfortunately my RPI board is currently broken,
> so I am unable to test it there to verify it actually fixes the problem
> you are seeing.
>
> The dwc2 driver doesn't use the usb_device toggle bits anywhere else,
> so the quickest fix is to just remove the problematic code from
> _dwc2_hcd_endpoint_reset().
>
> If you give me your tested-bys, I will submit this as a proper patch
> to Greg.

I'll give it a spin this evening, thanks.

Is that really just redundant code or could this removal have side effects?
Should I look out for anything specific?

Oh, and I'm not sure if I poked the right spot with the "nousb" fix, but
I'll send that out as well.
Regards,
Andre