> -----Original Message----- > From: Joe Lawrence [mailto:joe.lawrence@xxxxxxxxxxx] > Sent: Wednesday, May 25, 2016 9:04 PM > To: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> > Cc: linux-usb@xxxxxxxxxxxxxxx; derek.shute@xxxxxxxxxxx; Rajesh Bhagat > <rajesh.bhagat@xxxxxxx>; stable <stable@xxxxxxxxxxxxxxx> > Subject: Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird > states. > > On 05/12/2016 07:57 AM, Mathias Nyman wrote: > > If commands timeout we mark them for abortion, then stop the command > > ring, and turn the commands to no-ops and finally restart the command > > ring. > > > > If the host is working properly the no-op commands will finish and > > pending completions are called. > > If we notice the host is failing driver clears the command ring and > > completes, deletes and frees all pending commands. > > > > There are two separate cases reported where host is believed to work > > properly but is not. In the first case we successfully stop the ring > > but no abort or stop commnand ring event is ever sent and host locks up. > > > > The second case is if a host is removed, command times out and driver > > believes the ring is stopped, and assumes it be restarted, but > > actually ends up timing out on the same command forever. > > If one of the pending commands has the xhci->mutex held it will block > > xhci_stop() in the remove codepath which otherwise would cleanup > > pending commands. > > > > Add a check that clears all pending commands in case host is removed, > > or we are stuck timeouting on the same command. Also restart the > > command timeout timer when stopping the command ring to ensure we > > recive an ring stop/abort event. > > > > Cc: stable <stable@xxxxxxxxxxxxxxx> > > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> > > --- > > drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++----- > > 1 file changed, 22 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/usb/host/xhci-ring.c > > b/drivers/usb/host/xhci-ring.c index 52deae4..c82570d 100644 > > --- a/drivers/usb/host/xhci-ring.c > > +++ b/drivers/usb/host/xhci-ring.c > > @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd > > *xhci) > > > > temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); > > xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; > > + > > + /* > > + * Writing the CMD_RING_ABORT bit should cause a cmd completion event, > > + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared > > + * but the completion event in never sent. Use the cmd timeout timer to > > + * handle those cases. Use twice the time to cover the bit polling retry > > + */ > > + mod_timer(&xhci->cmd_timer, jiffies + (2 * > > +XHCI_CMD_DEFAULT_TIMEOUT)); > > xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, > > &xhci->op_regs->cmd_ring); > > > > @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd > > *xhci) > > > > xhci_err(xhci, "Stopped the command ring failed, " > > "maybe the host is dead\n"); > > + del_timer(&xhci->cmd_timer); > > xhci->xhc_state |= XHCI_STATE_DYING; > > xhci_quiesce(xhci); > > xhci_halt(xhci); > > @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long > data) > > int ret; > > unsigned long flags; > > u64 hw_ring_state; > > - struct xhci_command *cur_cmd = NULL; > > + bool second_timeout = false; > > xhci = (struct xhci_hcd *) data; > > > > /* mark this command to be cancelled */ > > spin_lock_irqsave(&xhci->lock, flags); > > if (xhci->current_cmd) { > > - cur_cmd = xhci->current_cmd; > > - cur_cmd->status = COMP_CMD_ABORT; > > + if (xhci->current_cmd->status == COMP_CMD_ABORT) > > + second_timeout = true; > > + xhci->current_cmd->status = COMP_CMD_ABORT; > > } > > > > - > > /* Make sure command ring is running before aborting it */ > > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); > > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) && > > (hw_ring_state & CMD_RING_RUNNING)) { > > - > > spin_unlock_irqrestore(&xhci->lock, flags); > > xhci_dbg(xhci, "Command timeout\n"); > > ret = xhci_abort_cmd_ring(xhci); > > @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long > data) > > } > > return; > > } > > + > > + /* command ring failed to restart, or host removed. Bail out */ > > + if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) { > > + spin_unlock_irqrestore(&xhci->lock, flags); > > + xhci_dbg(xhci, "command timed out twice, ring start fail?\n"); > > + xhci_cleanup_command_queue(xhci); > > + return; > > + } > > + > > /* command timeout on stopped ring, ring can't be aborted */ > > xhci_dbg(xhci, "Command timeout on stopped ring\n"); > > xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd); > > > Hello Mathias, I'm using kernel version 4.1.8, and this patch applies OK but fails in compilation due to missing definition of 'XHCI_STATE_REMOVING'. Can you help me list the dependent patches that can be applied to 4.1.8 to make this patch work. Best Regards, Rajesh Bhagat > So far so good, feel free to add my Tested-by tag. > > -- Joe -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html