Re: [RFC 03/10] xhci: Handle command stalls.

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Fri, 8 Mar 2013 21:15:07 -0500 (EST)

On Fri, 8 Mar 2013, Sarah Sharp wrote:

> On Fri, Mar 08, 2013 at 11:14:38AM -0500, Alan Stern wrote:
> > On Thu, 7 Mar 2013, Sarah Sharp wrote:
> > 
> > > "One Ring to rule them all, One Ring to find them,
> > > One Ring to bring them all and in the darkness bind them"
> > >  -  J. R. R. Tolkien
> > > 
> > > There is only one command ring for each xHCI host, and all commands flow
> > > through that ring.  However, when a tricksy little USB device fails to
> > > respond to the Set Address command, the entire ring hangs.
> > > 
> > > If a Stop Endpoint command had been queued at the same time, the
> > > watchdog timer will fire shortly after we start aborting the Set Address
> > > command.  The watchdog timer will assume the host is dying and kill it,
> > > disconnecting all USB devices under the host.  It's basically a USB DoS
> > > attack.
> > > 
> > > The fix is for all commands to check whether they are the currently
> > > executing command on the command ring when they timeout.  If not, they
> > > will wait on the command completion again, and the Stop Endpoint
> > > watchdog timer will re-queue itself.  If they are the currently
> > > executing command, they will cancel themselves, or the Stop Endpoint
> > > watchdog timer will assume the host is dying.
> > 
> > You might want to go farther than this.  I haven't read the driver code 
> > enough to be sure how you have this organized, but it seems logical 
> > that you shouldn't set a command's expiration time until that command 
> > begins executing.
> > 
> > Suppose the ring contains commands A and B and you set B's expiration 
> > time while A is still running.  Then A could continue running until B's 
> > time is almost up.  When B does start, the watchdog timer will expire 
> > before B has a chance to do anything.
> 
> Yes, it's true.  With this patch, the watchdog timer will re-queue
> itself, of course.

No; it will see that B is the currently executing command and either
cancel B or conclude that the host is dead.  Just like the behavior
without this patch, except that the window of opportunity is smaller.

>  I do agree that it's inefficient to have multiple
> timers running when we could only have one timer running.
> 
> > By not setting a command's expiration time until that command begins
> > running, you remove the need for the check added by this patch.  It
> > will not be possible for the command's watchdog timer to expire before
> > the command becomes the one currently executing.
> 
> In general, all the commands wait (interruptibly) on a completion, with
> a 5 second timeout.  However, the Stop Endpoint command doesn't have a
> completion, since it's part of the asynchronous call to cancel an URB.
> The Stop Endpoint command is the only one with a watchdog timer.
> 
> I started out with an idea to yours, of having a global command list.
> Commands would be queued to the FIFO, submitted to the command ring, and
> then their submitters would call wait_for_completion_interruptible
> (without a timeout), or simply return, in the case of the Stop Endpoint
> command.  The watchdog timer routine would be gutted to be a global
> command queue manager that timed the currently executing command, and
> would cancel it if necessary.
> 
> In the end, I decided it would be better for the stable kernels to keep
> to the simpler (if less efficient) solution.  But I would like to do the
> "right thing" later on.

Would it be simpler to keep the existing implementation, but start the
timer for a Stop Endpoint command in the completion handler for the
previous command?  (But then what happens if there are two Stop
Endpoint commands in a row?)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html