Re: USB issue with kernel 3.6

On Sun, 9 Dec 2012, Piergiorgio Sartor wrote:

> > Basically, this is a bug in nVidia's EHCI controller hardware.  The
> > driver told the controller to turn off its async schedule, and 20 ms
> > later the schedule still was running.  Although the EHCI specification
> > doesn't put any time limit on how long turning off the schedule may
> > take, in practice it shouldn't be any more than a couple of ms (and
> > normally much less).
> 
> The question is, why was it working before kernel 3.6.0?

It probably was not working.  You just never encountered the right 
conditions to trigger the bug.
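
To make the quoted explanation concrete: the driver changes a command bit and then polls, up to a deadline, for the matching status bit to follow. Below is a toy C simulation under stated assumptions, not the ehci-hcd code. Only the bit names (ASE in USBCMD, ASS in USBSTS) come from the EHCI specification, and the 20 ms limit comes from the text. The struct, the function names, the 1 ms polling step and the per-controller `lag_ms` delay are all hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of one EHCI controller, for illustration
 * only.  `ase` stands for the Async Schedule Enable command bit (USBCMD),
 * `ass` for the Async Schedule Status bit (USBSTS).  `lag_ms` is an
 * assumed delay before the status bit follows the command bit.
 */
struct model_hc {
	bool ase;        /* what the driver asked for */
	bool ass;        /* what the hardware reports */
	int lag_ms;      /* assumed hardware latency, in ms */
	int pending_ms;  /* ms left until ass catches up with ase */
};

/* Driver writes the command bit; the status bit follows after lag_ms. */
static void model_set_ase(struct model_hc *hc, bool on)
{
	hc->ase = on;
	hc->pending_ms = hc->lag_ms;
	if (hc->pending_ms == 0)
		hc->ass = on;
}

/* Advance simulated time by one millisecond. */
static void model_tick(struct model_hc *hc)
{
	if (hc->ass == hc->ase)
		return;
	if (hc->pending_ms > 0)
		hc->pending_ms--;
	if (hc->pending_ms == 0)
		hc->ass = hc->ase;
}

/*
 * Poll the status bit once per simulated millisecond until it equals
 * `want`.  Returns the number of ms waited, or -1 if `limit_ms` passes
 * first (the "20 ms later the schedule still was running" case).
 */
static int model_wait_ass(struct model_hc *hc, bool want, int limit_ms)
{
	for (int ms = 0; ms <= limit_ms; ms++) {
		if (hc->ass == want)
			return ms;
		model_tick(hc);
	}
	return -1;
}
```

With `lag_ms` below the limit the wait succeeds; with a lag longer than 20 ms the wait times out while the schedule is still running, which is the situation the quoted text describes.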

> Or, at least, why did it not show up?
> We know (it seems) which patch triggers it; what was
> done before instead?

Well, there was one significant difference.  The async schedule gets
stopped after all the async QH's have been removed from the schedule
(they get removed after they have been idle for at least 6 ms).  The
old code removed them one at a time, whereas the new code can remove a
bunch of them at once.

Maybe the one-at-a-time removal slowed things down enough so that the
QH's never were all removed.  Since you're testing with ten disk
drives, you've got 20 QH's (one IN and one OUT for each drive).  The
old code could take up to 20 times longer to remove all of them than
the new code does.  During that extra time, some of the QH's might
become active again, which would prevent the driver from stopping the 
schedule.

Of course, this is just a guess.
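
The guess above is mostly arithmetic, so here is a rough sketch of it. Everything in it is hypothetical: `round_ms` stands for whatever one removal step costs on real hardware, and `next_io_ms` for the moment the next transfer would make a QH active again. With ten drives (20 QH's) the one-at-a-time path needs 20 rounds where the batched path needs one.

```c
#include <stdbool.h>

/*
 * Toy arithmetic for the guess above; names and the per-round cost are
 * hypothetical.  Each removal "round" is assumed to cost round_ms.  The
 * old code retired one QH per round, the new code every idle QH at once.
 */
static int qhs_for_drives(int drives)
{
	return 2 * drives;   /* one IN and one OUT QH per drive */
}

static int ms_to_empty(int n_qh, bool batched, int round_ms)
{
	if (n_qh <= 0)
		return 0;
	return (batched ? 1 : n_qh) * round_ms;
}

/*
 * The schedule can only be stopped if it empties before the next
 * transfer makes some QH active again (next_io_ms from now).
 */
static bool empties_before_next_io(int n_qh, bool batched, int round_ms,
				   int next_io_ms)
{
	return ms_to_empty(n_qh, batched, round_ms) < next_io_ms;
}
```

Under these assumptions the batched path can reach "empty" inside a gap between transfers that the one-at-a-time path would never get through.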

> > But then the driver needed to turn the schedule back on.  The command
> > to do so was ignored by the controller, since it was still trying to
> > carry out the earlier command to turn off the schedule.  Eventually --
> > no way to know exactly when -- the schedule _did_ turn off.  And then
> > it never turned back on!
> 
> Is this the reason why "ehci" no longer works
> after the problem is triggered?

Yes.  Actually, it might have started working again if you unplugged 
all your high-speed devices and then plugged them back in.  Maybe.
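
A small model of the failure sequence in the quoted text, assuming (as the text suggests, though it is not confirmed) that the controller drops a "turn on" request while an earlier "turn off" is still in progress. The struct, the function names and the 30 ms shutdown time are invented for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the behaviour described in the quoted text: while
 * an earlier "turn off" request is still in progress, a "turn on" request
 * is silently dropped.  Illustration only, not confirmed hardware internals.
 */
struct quirky_hc {
	bool running;         /* what the status bit reports */
	int off_pending_ms;   /* > 0 while a "turn off" is still in progress */
};

static void hc_request_off(struct quirky_hc *hc, int takes_ms)
{
	hc->off_pending_ms = takes_ms > 0 ? takes_ms : 0;
	if (hc->off_pending_ms == 0)
		hc->running = false;  /* instantaneous shutdown */
}

static void hc_request_on(struct quirky_hc *hc)
{
	if (hc->off_pending_ms > 0)
		return;               /* dropped: still busy turning off */
	hc->running = true;
}

/* Advance simulated time by one millisecond. */
static void hc_tick(struct quirky_hc *hc)
{
	if (hc->off_pending_ms > 0 && --hc->off_pending_ms == 0)
		hc->running = false;  /* the stale "off" finally takes effect */
}
```

Nothing in this model reissues the "turn on" request after the stale "turn off" completes, so the schedule stays off, which is the state the quoted text describes.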

> One more question: why does heavy traffic trigger it?
> Could it be that the controller is too busy and does not
> respond to the request (within 20 ms)?

It's hard to say.  At the moment the controller was told to turn off
the schedule, it was not under heavy load.  This is because the
schedule gets turned off when there has been no traffic at all (no QH's
in the schedule) for at least 15 ms.
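
The idle rule can be sketched as a simple timer. The 15 ms threshold is the figure given above; the struct and function names are hypothetical, and the sketch does not reproduce how ehci-hcd actually schedules this check. It shows why the controller should be quiet when the "turn off" command is issued: by then the schedule has been empty for the whole threshold.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the rule above: the async schedule is turned off
 * only after it has held no QH's for at least ASYNC_IDLE_OFF_MS.  The
 * 15 ms value is taken from the text; the names are invented.
 */
#define ASYNC_IDLE_OFF_MS 15

struct idle_tracker {
	int linked_qhs;       /* QH's currently in the async schedule */
	int empty_since_ms;   /* time the schedule became empty, or -1 */
};

/* Record the number of linked QH's at time now_ms. */
static void tracker_update(struct idle_tracker *t, int now_ms, int linked_qhs)
{
	t->linked_qhs = linked_qhs;
	if (linked_qhs > 0)
		t->empty_since_ms = -1;
	else if (t->empty_since_ms < 0)
		t->empty_since_ms = now_ms;
}

/* True once the schedule has been empty for the full idle threshold. */
static bool should_turn_off(const struct idle_tracker *t, int now_ms)
{
	return t->linked_qhs == 0 && t->empty_since_ms >= 0 &&
	       now_ms - t->empty_since_ms >= ASYNC_IDLE_OFF_MS;
}
```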

On the other hand, we know that new traffic did get started before the
schedule actually turned off.  So maybe the new load caused the
controller to be too busy -- we don't know how much time passed between
the "turn off" command and the start of the new traffic.  I could write
a patch to find out...
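
A patch to find out would only need a few timestamps. The record below is a hypothetical sketch of what such instrumentation might collect and how it could be read; the field and function names are invented, and in the kernel the nanosecond values could be derived from ktime_get().

```c
#include <stdbool.h>

/*
 * Hypothetical record a debugging patch might fill in.  Timestamps are
 * nanoseconds from a monotonic clock.  All names are invented.
 */
struct async_off_trace {
	long long off_cmd_ns;     /* driver told the controller to stop */
	long long new_qh_ns;      /* first new QH linked after that */
	long long status_off_ns;  /* status bit finally showed "off", or -1 */
};

/* Gap between the "turn off" command and the start of new traffic, in us. */
static long long gap_before_new_traffic_us(const struct async_off_trace *t)
{
	return (t->new_qh_ns - t->off_cmd_ns) / 1000;
}

/* True if new traffic arrived before the schedule had actually stopped. */
static bool traffic_raced_shutdown(const struct async_off_trace *t)
{
	return t->status_off_ns < 0 || t->new_qh_ns < t->status_off_ns;
}
```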

Here's an idea.  This just occurred to me.  Maybe when the driver is
waiting for the async schedule to turn off, new QH's should not be
added to the schedule.  The driver could wait and add them after the
schedule was off.  I didn't do it that way because it would slow things
down and add complexity, but maybe that's what the nVidia hardware
needs.
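
The idea above amounts to a small state machine with a deferred list. Below is a minimal sketch, assuming a fixed-size array model and invented names (`sched_add_qh()`, `sched_stopped()` and the rest are not ehci-hcd functions). It only shows the ordering: QH's that arrive while the schedule is turning off wait until the hardware reports that it has stopped.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QH 32

enum async_state { ASYNC_OFF, ASYNC_ON, ASYNC_TURNING_OFF };

struct async_sched {
	enum async_state state;
	int linked[MAX_QH];      /* QH's currently in the schedule */
	size_t n_linked;
	int deferred[MAX_QH];    /* QH's waiting for the schedule to stop */
	size_t n_deferred;
};

/* Start turning the schedule off, but only if it is running and empty. */
static void sched_begin_off(struct async_sched *s)
{
	if (s->state == ASYNC_ON && s->n_linked == 0)
		s->state = ASYNC_TURNING_OFF;
}

/* Add a QH; returns false only if the fixed-size model is full. */
static bool sched_add_qh(struct async_sched *s, int qh_id)
{
	if (s->state == ASYNC_TURNING_OFF) {
		/* The proposed change: leave the schedule alone for now. */
		if (s->n_deferred == MAX_QH)
			return false;
		s->deferred[s->n_deferred++] = qh_id;
		return true;
	}
	if (s->n_linked == MAX_QH)
		return false;
	s->linked[s->n_linked++] = qh_id;
	s->state = ASYNC_ON;     /* linking a QH (re)enables the schedule */
	return true;
}

/*
 * Call once the status bit shows the schedule has really stopped.  The
 * schedule was empty when shutdown began, so the deferred QH's (at most
 * MAX_QH) always fit into the linked array here.
 */
static void sched_stopped(struct async_sched *s)
{
	s->state = ASYNC_OFF;
	for (size_t i = 0; i < s->n_deferred; i++)
		sched_add_qh(s, s->deferred[i]);
	s->n_deferred = 0;
}
```

The extra list and the wait for `sched_stopped()` correspond to the added complexity and delay mentioned above.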

> > The question now is what to do about this.  I suppose the waiting time 
> > could be increased -- but how much?
> 
> More than that, is it "sane" to just increase a timeout
> in order to work around the issue?

I don't know.  If a small increase in the timeout fixes the problem
then maybe it is.  The problem is that I don't understand exactly what
causes the bug, so I can't tell the right way to work around it.

> Again, I think the code change that caused this
> to show up should somehow be reconsidered, shouldn't it?

At this point, I don't think so.  Other parts of the code may need to 
be changed, though.

> Would it be better, after the timeout, to retry turning
> on the async schedule a couple of times? With some
> wait in between, of course.

I doubt that would work.  It would be better to make the timeout 
longer.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

