On 05/12/2016 07:57 AM, Mathias Nyman wrote: > If commands timeout we mark them for abortion, then stop the command > ring, and turn the commands to no-ops and finally restart the command > ring. > > If the host is working properly the no-op commands will finish and > pending completions are called. > If we notice the host is failing driver clears the command ring and > completes, deletes and frees all pending commands. > > There are two separate cases reported where host is believed to work > properly but is not. In the first case we successfully stop the ring > but no abort or stop commnand ring event is ever sent and host locks up. s/commnand/command/ > > The second case is if a host is removed, command times out and driver > believes the ring is stopped, and assumes it be restarted, but actually > ends up timing out on the same command forever. > If one of the pending commands has the xhci->mutex held it will block > xhci_stop() in the remove codepath which otherwise would cleanup pending > commands. > > Add a check that clears all pending commands in case host is removed, > or we are stuck timeouting on the same command. Also restart the s/timeouting/timing out/ > command timeout timer when stopping the command ring to ensure we > recive an ring stop/abort event. s/recive/receive/ > > Cc: stable <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> > --- > drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 52deae4..c82570d 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) > > temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); > xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; > + > + /* > + * Writing the CMD_RING_ABORT bit should cause a cmd completion event, > + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared > + * but the completion event in never sent. Use the cmd timeout timer to > + * handle those cases. Use twice the time to cover the bit polling retry > + */ > + mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT)); > xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, > &xhci->op_regs->cmd_ring); > > @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci) > > xhci_err(xhci, "Stopped the command ring failed, " > "maybe the host is dead\n"); > + del_timer(&xhci->cmd_timer); > xhci->xhc_state |= XHCI_STATE_DYING; > xhci_quiesce(xhci); > xhci_halt(xhci); > @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data) > int ret; > unsigned long flags; > u64 hw_ring_state; > - struct xhci_command *cur_cmd = NULL; > + bool second_timeout = false; > xhci = (struct xhci_hcd *) data; > > /* mark this command to be cancelled */ > spin_lock_irqsave(&xhci->lock, flags); > if (xhci->current_cmd) { > - cur_cmd = xhci->current_cmd; > - cur_cmd->status = COMP_CMD_ABORT; > + if (xhci->current_cmd->status == COMP_CMD_ABORT) > + second_timeout = true; > + xhci->current_cmd->status = COMP_CMD_ABORT; > } > > - > /* Make sure command ring is running before aborting it */ > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) && > (hw_ring_state & CMD_RING_RUNNING)) { > - > spin_unlock_irqrestore(&xhci->lock, flags); > xhci_dbg(xhci, "Command timeout\n"); > ret = xhci_abort_cmd_ring(xhci); > @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data) > } > return; > } > + > + /* command ring failed to restart, or host removed. Bail out */ > + if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) { > + spin_unlock_irqrestore(&xhci->lock, flags); > + xhci_dbg(xhci, "command timed out twice, ring start fail?\n"); > + xhci_cleanup_command_queue(xhci); > + return; > + } > + > /* command timeout on stopped ring, ring can't be aborted */ > xhci_dbg(xhci, "Command timeout on stopped ring\n"); > xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > Hi Mathias, At first glance, this patch looks good. As I mentioned earlier, after applying "xhci_mem_cleanup after second hcd" a lot of issues we'd been seeing on Stratus HW cleared up. That said, I'll add this patch to my testing and report any problems. Thanks! -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html