Re: [PATCH 2/3 v2] xhci: Handle canceled URBs when HC dies.

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Tue, 29 Sep 2009 13:55:45 -0700

On Tue, Sep 29, 2009 at 12:45:03PM -0400, Alan Stern wrote:
> On Tue, 29 Sep 2009, David Vrabel wrote:
> 
> > Sarah Sharp wrote:
> > > When the host controller dies (e.g. it is removed from a PCI card slot),
> > > the xHCI driver cannot expect commands to complete.  The buggy code this
> > > patch fixes would mark an URB as canceled and then expect the URB to be
> > > completed when the stop endpoint command completed.  That would never
> > > happen if the host controller was dead, so the USB core would just hang in
> > > the disconnect code.
> > > 
> > > If the host controller died, and the driver asks to cancel an URB, free
> > > any structures associated with that URB and immediately give it back.
> > > 
> > [...]
> > > +	temp = xhci_readl(xhci, &xhci->op_regs->status);
> > > +	if (temp == 0xffffffff) {
> > > +		xhci_dbg(xhci, "HW died, freeing TD.\n");
> > 
> > Is this test sufficient?  What if the hardware is non-responsive but 
> > still present on the bus?
> >
> > Does the cancel request have a cancel response/ack from the hardware and 
> > if so can you add a timer while waiting for this?

If the hardware is still present, the cancellation handler queues a stop
endpoint command on the hardware's command ring and returns.  In the
ideal case, the host controller processes the command, places a command
completion event on the event ring, and sends an interrupt.  At the time
I queue the command, there's no good way to tell whether the hardware is
responsive.  If the hardware never interrupts to complete the command,
the URB will never be given back.
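
To make the two paths concrete, here is a rough userspace model of the
cancellation flow described above.  It is only a sketch: every name in
it (model_hc, model_urb_cancel, and so on) is made up for illustration
and is not a symbol from the xHCI driver, and the error values are just
stand-ins.  It shows that the URB is given back immediately only when
the status register reads all ones; otherwise it stays in limbo until
the completion handler runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only. None of these names are real xHCI driver symbols. */
enum model_urb_state {
	MODEL_URB_PENDING,	/* queued on a transfer ring */
	MODEL_URB_CANCELING,	/* waiting for the stop endpoint command */
	MODEL_URB_GIVEN_BACK	/* returned to the USB core */
};

struct model_urb {
	enum model_urb_state state;
	int status;
};

struct model_hc {
	uint32_t status_reg;		/* stands in for op_regs->status */
	unsigned int stop_cmds_queued;	/* commands placed on the command ring */
};

/* Assumed errno-style values, used only inside this model. */
#define MODEL_ECONNRESET 104
#define MODEL_ESHUTDOWN  108

/* Reads from a removed PCI device come back as all ones. */
static bool model_hc_removed(const struct model_hc *hc)
{
	return hc->status_reg == 0xffffffffu;
}

/* Cancellation handler: give back now, or queue a command and return. */
static void model_urb_cancel(struct model_hc *hc, struct model_urb *urb)
{
	assert(urb->state == MODEL_URB_PENDING);
	if (model_hc_removed(hc)) {
		urb->status = -MODEL_ESHUTDOWN;
		urb->state = MODEL_URB_GIVEN_BACK;
		return;
	}
	hc->stop_cmds_queued++;
	urb->state = MODEL_URB_CANCELING;
}

/* Runs only if the hardware interrupts with a command completion event. */
static void model_stop_ep_completed(struct model_urb *urb)
{
	assert(urb->state == MODEL_URB_CANCELING);
	urb->status = -MODEL_ECONNRESET;
	urb->state = MODEL_URB_GIVEN_BACK;
}
```

If the controller is present but hung, nothing ever calls
model_stop_ep_completed(), and the URB stays in MODEL_URB_CANCELING
forever.  That is the gap David is pointing at.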

So yes, this is a problem.  I'll have to think more about how to
integrate a watchdog timer in the code for the case where the hardware
is not responding to a stop endpoint command.
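
As a starting point for discussion, a watchdog for the stop endpoint
command might look something like the userspace model below.  This is
a sketch under assumptions, not driver code: the names, the tick-based
clock, and the timeout of 5 ticks are all hypothetical, and a real
version would use the kernel timer API and proper locking.  The idea is
to arm the watchdog when the command is queued, disarm it when the
completion event arrives, and mark the controller halted if the
deadline passes first.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. Names and the timeout value are hypothetical. */
#define MODEL_STOP_EP_TIMEOUT_TICKS 5

struct model_watchdog {
	bool armed;
	unsigned long deadline;	/* tick at which the command is declared lost */
};

struct model_hc {
	bool halted;		/* private "controller is dead" flag */
	struct model_watchdog stop_ep_wd;
};

/* Arm the watchdog at the moment the stop endpoint command is queued. */
static void model_queue_stop_ep(struct model_hc *hc, unsigned long now)
{
	hc->stop_ep_wd.armed = true;
	hc->stop_ep_wd.deadline = now + MODEL_STOP_EP_TIMEOUT_TICKS;
}

/* The command completion event arrived in time: disarm the watchdog. */
static void model_stop_ep_event(struct model_hc *hc)
{
	hc->stop_ep_wd.armed = false;
}

/*
 * Periodic tick. Returns true if the watchdog expired on this call.
 * Tick counter wraparound is ignored to keep the model short.
 */
static bool model_watchdog_tick(struct model_hc *hc, unsigned long now)
{
	if (!hc->stop_ep_wd.armed || now < hc->stop_ep_wd.deadline)
		return false;
	hc->stop_ep_wd.armed = false;
	hc->halted = true;
	return true;
}
```

What the expiry path should do beyond setting the flag (one endpoint
versus the whole controller) is still an open question.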

> > If the timer expires the hcd would run through all queues and
> > complete (with an error) all urbs.

Should it cancel all the URBs?  The hardware could be completing URBs
for other endpoints in the system, since each endpoint has its own
transfer ring.  If the hardware isn't able to stop one endpoint in the
system but other endpoints are working, should we assume the HC is just
broken, halt and reset it, and complete all URBs with an error?

> David is right.  This isn't the way dead host controllers are handled 
> in the other HCDs.
> 
> The drivers test at some appropriate point for whether the controller
> is still connected and alive (generally from within the IRQ handler,
> the resume routine, and/or a watchdog timer routine).  If it isn't, the
> driver sets hcd->state to HC_STATE_HALT (to tell usbcore that the
> controller is dead), possibly calls usb_hc_died(), resets the
> controller hardware, and sets a private flag.

The documentation for usb_hc_died() says it's only for host controllers
attached to non-PCI buses, so I don't think that applies to xHCI.  I'll
rework this patch to set hcd->state to HC_STATE_HALT.

I wish there were some documentation on when a host controller driver
is supposed to set hcd->state.  I mostly don't touch it at all, so are
there any other conditions where I should change it?

> Routines involved in unlinking and giving back URBs check the private 
> flag.  If it is set then they bypass the hardware and carry out the 
> software part of their job immediately.

The xHCI routines to give back URBs are only called when the hardware
interrupts with an event on the event ring.  xHCI doesn't periodically
scan the frame list like EHCI.  If a watchdog timer fires, the timer
routine is going to have to give back the URBs itself.
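
Roughly, I imagine the timer path walking every endpoint ring and
completing whatever is still queued, along the lines of the userspace
model below.  This is only a sketch: the structures, names, fixed array
sizes, and error value are all invented for illustration, and it
assumes the "complete everything with an error" policy, which may not
be the one we end up with.  It also shows the private flag being
checked on the enqueue path so that no new work reaches the dead
hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. All names and sizes here are hypothetical. */
#define MODEL_MAX_EPS   4
#define MODEL_MAX_URBS  8
#define MODEL_ESHUTDOWN 108	/* assumed errno-style value for the model */

struct model_urb {
	bool given_back;
	int status;
};

struct model_ep_ring {
	struct model_urb *urbs[MODEL_MAX_URBS];	/* URBs still queued */
	size_t count;
};

struct model_hc {
	bool dead;	/* private flag checked by the software paths */
	struct model_ep_ring eps[MODEL_MAX_EPS];
};

static void model_giveback(struct model_urb *urb, int status)
{
	urb->status = status;
	urb->given_back = true;
}

/*
 * Timer path: no interrupt is coming, so walk every endpoint ring and
 * give back each queued URB with an error. Returns how many were completed.
 */
static size_t model_hc_died(struct model_hc *hc)
{
	size_t completed = 0;

	hc->dead = true;
	for (size_t e = 0; e < MODEL_MAX_EPS; e++) {
		struct model_ep_ring *ring = &hc->eps[e];

		for (size_t i = 0; i < ring->count; i++) {
			model_giveback(ring->urbs[i], -MODEL_ESHUTDOWN);
			completed++;
		}
		ring->count = 0;
	}
	return completed;
}

/* Enqueue path: bypass the hardware entirely once the flag is set. */
static int model_enqueue(struct model_hc *hc, size_t ep, struct model_urb *urb)
{
	if (hc->dead)
		return -MODEL_ESHUTDOWN;
	if (ep >= MODEL_MAX_EPS || hc->eps[ep].count == MODEL_MAX_URBS)
		return -1;
	hc->eps[ep].urbs[hc->eps[ep].count++] = urb;
	return 0;
}
```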

Sarah