On Fri, Mar 08, 2013 at 11:14:38AM -0500, Alan Stern wrote: > On Thu, 7 Mar 2013, Sarah Sharp wrote: > > > "One Ring to rule them all, One Ring to find them, > > One Ring to bring them all and in the darkness bind them" > > - J. R. R. Tolkien > > > > There is only one command ring for each xHCI host, and all commands flow > > through that ring. However, when a tricksy little USB device fails to > > respond to the Set Address command, the entire ring hangs. > > > > If a Stop Endpoint command had been queued at the same time, the > > watchdog timer will fire shortly after we start aborting the Set Address > > command. The watchdog timer will assume the host is dying and kill it, > > disconnecting all USB devices under the host. It's basically a USB DoS > > attack. > > > > The fix is for all commands to check whether they are the currently > > executing command on the command ring when they timeout. If not, they > > will wait on the command completion again, and the Stop Endpoint > > watchdog timer will re-queue itself. If they are the currently > > executing command, they will cancel themselves, or the Stop Endpoint > > watchdog timer will assume the host is dying. > > You might want to go farther than this. I haven't read the driver code > enough to be sure how you have this organized, but it seems logical > that you shouldn't set a command's expiration time until that command > begins executing. > > Suppose the ring contains commands A and B and you set B's expiration > time while A is still running. Then A could continue running until B's > time is almost up. When B does start, the watchdog timer will expire > before B has a chance to do anything. Yes, it's true. With this patch, the watchdog timer will re-queue itself, of course. I do agree that it's inefficient to have multiple timers running when we could only have one timer running. > By not setting a command's expiration time until that command begins > running, you remove the need for the check added by this patch. It > will not be possible for the command's watchdog timer to expire before > the command becomes the one currently executing. In general, all the commands wait (interruptibly) on a completion, with a 5 second timeout. However, the Stop Endpoint command doesn't have a completion, since it's part of the asynchronous call to cancel an URB. The Stop Endpoint command is the only one with a watchdog timer. I started out with an idea to yours, of having a global command list. Commands would be queued to the FIFO, submitted to the command ring, and then their submitters would call wait_for_completion_interruptible (without a timeout), or simply return, in the case of the Stop Endpoint command. The watchdog timer routine would be gutted to be a global command queue manager that timed the currently executing command, and would cancel it if necessary. In the end, I decided it would be better for the stable kernels to keep to the simpler (if less efficient) solution. But I would like to do the "right thing" later on. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html