Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/12/2016 07:57 AM, Mathias Nyman wrote:
> If commands timeout we mark them for abortion, then stop the command
> ring, and turn the commands to no-ops and finally restart the command
> ring.
> 
> If the host is working properly the no-op commands will finish and
> pending completions are called.
> If we notice the host is failing driver clears the command ring and
> completes, deletes and frees all pending commands.
> 
> There are two separate cases reported where host is believed to work
> properly but is not. In the first case we successfully stop the ring
> but no abort or stop commnand ring event is ever sent and host locks up.

s/commnand/command/

> 
> The second case is if a host is removed, command times out and driver
> believes the ring is stopped, and assumes it be restarted, but actually
> ends up timing out on the same command forever.
> If one of the pending commands has the xhci->mutex held it will block
> xhci_stop() in the remove codepath which otherwise would cleanup pending
> commands.
> 
> Add a check that clears all pending commands in case host is removed,
> or we are stuck timeouting on the same command. Also restart the

s/timeouting/timing out/

> command timeout timer when stopping the command ring to ensure we
> recive an ring stop/abort event.

s/recive/receive/

> 
> Cc: stable <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
> ---
>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 52deae4..c82570d 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>  
>  	temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>  	xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
> +
> +	/*
> +	 * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
> +	 * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
> +	 * but the completion event in never sent. Use the cmd timeout timer to
> +	 * handle those cases. Use twice the time to cover the bit polling retry
> +	 */
> +	mod_timer(&xhci->cmd_timer, jiffies + (2 * XHCI_CMD_DEFAULT_TIMEOUT));
>  	xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>  			&xhci->op_regs->cmd_ring);
>  
> @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>  
>  		xhci_err(xhci, "Stopped the command ring failed, "
>  				"maybe the host is dead\n");
> +		del_timer(&xhci->cmd_timer);
>  		xhci->xhc_state |= XHCI_STATE_DYING;
>  		xhci_quiesce(xhci);
>  		xhci_halt(xhci);
> @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long data)
>  	int ret;
>  	unsigned long flags;
>  	u64 hw_ring_state;
> -	struct xhci_command *cur_cmd = NULL;
> +	bool second_timeout = false;
>  	xhci = (struct xhci_hcd *) data;
>  
>  	/* mark this command to be cancelled */
>  	spin_lock_irqsave(&xhci->lock, flags);
>  	if (xhci->current_cmd) {
> -		cur_cmd = xhci->current_cmd;
> -		cur_cmd->status = COMP_CMD_ABORT;
> +		if (xhci->current_cmd->status == COMP_CMD_ABORT)
> +			second_timeout = true;
> +		xhci->current_cmd->status = COMP_CMD_ABORT;
>  	}
>  
> -
>  	/* Make sure command ring is running before aborting it */
>  	hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>  	if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
>  	    (hw_ring_state & CMD_RING_RUNNING))  {
> -
>  		spin_unlock_irqrestore(&xhci->lock, flags);
>  		xhci_dbg(xhci, "Command timeout\n");
>  		ret = xhci_abort_cmd_ring(xhci);
> @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long data)
>  		}
>  		return;
>  	}
> +
> +	/* command ring failed to restart, or host removed. Bail out */
> +	if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
> +		spin_unlock_irqrestore(&xhci->lock, flags);
> +		xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
> +		xhci_cleanup_command_queue(xhci);
> +		return;
> +	}
> +
>  	/* command timeout on stopped ring, ring can't be aborted */
>  	xhci_dbg(xhci, "Command timeout on stopped ring\n");
>  	xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> 

Hi Mathias,

At first glance, this patch looks good.  As I mentioned earlier, after
applying "xhci_mem_cleanup after second hcd" a lot of issues we'd been
seeing on Stratus HW cleared up.  That said, I'll add this patch to my
testing and report any problems.

Thanks!

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux