Re: USB issue with kernel 3.6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 8 Dec 2012, Piergiorgio Sartor wrote:

> Hi Alan,
> 
> the output of lspci:
> 
> 00:0b.1 USB Controller: nVidia Corporation MCP51 USB Controller (rev a3) (prog-if 20 [EHCI])
> 	Subsystem: ASUSTeK Computer Inc. Device 81c0
> 	Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22
> 	Memory at fe02e000 (32-bit, non-prefetchable) [size=256]
> 	Capabilities: [44] Debug port: BAR=1 offset=0098
> 	Capabilities: [80] Power Management version 2
> 	Kernel driver in use: ehci_hcd

nVidia, eh?  Their stuff does have some weird bugs...

> The output of "dmesg" with your patch:
> 
> [  773.387144] ehci_hcd 0000:00:0b.1: Waited too long for the async schedule status (0/8000), giving up

Believe it or not, this one line explains everything -- and the rest of
the debugging log confirms it.  This didn't show up in your earlier
kernel log because this message wasn't added until the 3.6 release, and
your earlier test used 3.5.

Basically, this is a bug in nVidia's EHCI controller hardware.  The
driver told the controller to turn off its async schedule, and 20 ms
later the schedule still was running.  Although the EHCI specification
doesn't put any time limit on how long turning off the schedule may
take, in practice it shouldn't be any more than a couple of ms (and
normally much less).

But then the driver needed to turn the schedule back on.  The command
to do so was ignored by the controller, since it was still trying to
carry out the earlier command to turn off the schedule.  Eventually --
no way to know exactly when -- the schedule _did_ turn off.  And then
it never turned back on!

That's why none of the transfers from that point on were successful.

> [  803.804848] ehci_hcd 0000:00:0b.1: alan start  cur time 4295471040 last scan 4295440395
> [  803.804862] ehci_hcd 0000:00:0b.1: command 10035 cmd reg 10035 io watchdog 1 async count 10

The "Async Schedule Enable" bit is bit 0x20 in the command register;  
the debugging output shows that it is turned on.  Nevertheless the 
async schedule isn't running, as shown by the rest of the debugging 
output.

> One comment, I was using, apparently, kernel 3.7.0-rc4+

It's a good thing you did!

> and it was quite difficult to reproduce the error.
> It happened only after I increased the CPU (and I/O) load
> doing a "make clean" in /usr/src/linux

3.7 waits longer for the schedule to turn off before giving up than 3.5
does; that probably explains why the bug was harder to trigger.

The question now is what to do about this.  I suppose the waiting time 
could be increased -- but how much?

You can experiment with different values easily enough.  In the source
file drivers/usb/host/ehci-timer.c, in the ehci_poll_ASS() function
near line 117, you'll see this:

		/* Poll again later, but give up after about 20 ms */
		if (ehci->ASS_poll_count++ < 20) {
			ehci_enable_event(ehci, EHCI_HRTIMER_POLL_ASS, true);
			return;
		}
		ehci_dbg(ehci, "Waited too long for the async schedule status (%x/%x), giving up\n",
				want, actual);

Change the 20's to something bigger, like 50 or 100, and see if that
makes the problem go away.  (For thoroughness, also try changing them
to something smaller, like 5, and see if that makes the problem occur
more readily.)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux