Hi Alan,

On Sat, Dec 08, 2012 at 10:54:00PM -0500, Alan Stern wrote:
[...]
> > Is this the reason why the "ehci", after the problem
> > is triggered, it does not work anymore?
>
> Yes. Actually, it might have started working again if you unplugged
> all your high-speed devices and then plugged them back in.

Maybe. Last time I tried, it did not work.
The ehci was dead and stayed that way.

> > One more question, why the heavy traffic triggers it?
> > Could it be the controller is too busy and it does not
> > answer (within 20ms) to the request?
>
> It's hard to say. At the moment the controller was told to turn off
> the schedule, it was not under heavy load. This is because the
> schedule gets turned off when there has been no traffic at all (no QH's
> in the schedule) for at least 15 ms.
>
> On the other hand, we know that new traffic did get started before the
> schedule actually turned off. So maybe the new load caused the
> controller to be too busy -- we don't know how much time passed between
> the "turn off" command and the start of the new traffic. I could write
> a patch to find out...

It might be useful.

I tried quickly to put 5 instead of 20, but the issue was still
difficult to reproduce (3.7.0-rc4+).
It happened (again) only after I increased the CPU (or I/O) load
with "make -j5" in "/usr/src/linux". And even then, not immediately.

Of course, it was just a quick test, so I could have
made some mistakes. Or not.

> Here's an idea. This just occurred to me. Maybe when the driver is
> waiting for the async schedule to turn off, new QH's should not be
> added to the schedule. The driver could wait and add them after the
> schedule was off. I didn't do it that way because it would slow things
> down and add complexity, but maybe that's what the nVidia hardware
> needs.

How difficult is it?
Would it be possible to have it as a patch I could try?

> > More of that, is it "sane" to just increase a timeout
> > in order to workaround the issue?
>
> I don't know. If a small increase in the timeout fixes the problem
> then maybe it is. The problem is that I don't understand exactly what
> causes the bug, so I can't tell the right way to work around it.

Maybe more verbosity is needed somewhere?

> > Would it be better, after the timeout, to re-try to turn
> > on the async schedule for a couple of times? With some
> > wait inbetween, of course.
>
> I doubt that would work. It would be better to make the timeout
> longer.

What I mean is that instead of waiting for 500ms at once,
it could be better to wait 50 times 10ms. As soon as the
switch occurs, the system can just continue.
In other words, to have a flexible timeout instead of
a fixed one.
Assuming it is reliably possible to verify the status of
the EHCI hardware.

bye,

-- 

piergiorgio
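
To illustrate the flexible timeout described above (50 waits of 10ms
instead of one 500ms wait), here is a minimal sketch. It is a userspace
mock, not driver code: struct fake_ehci, FAKE_STS_ASYNC_ON,
fake_read_status(), fake_sleep_ms() and wait_async_off() are all
hypothetical names made up for this example, the bit position is
arbitrary, and the "hardware" is simulated by a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bit: "async schedule is still running".
 * The bit position is arbitrary in this mock. */
#define FAKE_STS_ASYNC_ON (1u << 0)

/* Simulated controller: the bit clears on the Nth status read.
 * reads_until_off == 0 means the bit never clears (stuck hardware). */
struct fake_ehci {
    uint32_t status;
    unsigned reads_until_off;
};

/* Total simulated time spent sleeping, in milliseconds. */
unsigned fake_elapsed_ms;

uint32_t fake_read_status(struct fake_ehci *hc)
{
    if (hc->reads_until_off > 0 && --hc->reads_until_off == 0)
        hc->status &= ~FAKE_STS_ASYNC_ON;
    return hc->status;
}

/* Stand-in for a real sleep; it only accounts for the time. */
void fake_sleep_ms(unsigned ms)
{
    fake_elapsed_ms += ms;
}

/* Check the status every step_ms, for at most budget_ms in total.
 * Returns the time waited (ms) as soon as the bit is clear,
 * or -1 if the bit is still set once the budget is used up. */
int wait_async_off(struct fake_ehci *hc, unsigned step_ms, unsigned budget_ms)
{
    unsigned waited;

    assert(step_ms > 0);
    for (waited = 0; ; waited += step_ms) {
        if (!(fake_read_status(hc) & FAKE_STS_ASYNC_ON))
            return (int)waited;     /* switch occurred: continue at once */
        if (waited >= budget_ms)
            return -1;              /* budget exhausted: give up */
        fake_sleep_ms(step_ms);
    }
}
```

With step_ms = 10 and budget_ms = 500, a controller that switches
quickly costs only a few 10ms steps, while a stuck one still gives up
after the same 500ms as the fixed wait. Whether the real status bit can
be trusted this way is exactly the open question above.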