RE: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird states.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Joe Lawrence [mailto:joe.lawrence@xxxxxxxxxxx]
> Sent: Wednesday, May 25, 2016 9:04 PM
> To: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
> Cc: linux-usb@xxxxxxxxxxxxxxx; derek.shute@xxxxxxxxxxx; Rajesh Bhagat
> <rajesh.bhagat@xxxxxxx>; stable <stable@xxxxxxxxxxxxxxx>
> Subject: Re: [RFT PATCH] xhci: Fix handling timeouted commands on hosts in weird
> states.
> 
> On 05/12/2016 07:57 AM, Mathias Nyman wrote:
> > If commands timeout we mark them for abortion, then stop the command
> > ring, and turn the commands to no-ops and finally restart the command
> > ring.
> >
> > If the host is working properly the no-op commands will finish and
> > pending completions are called.
> > If we notice the host is failing driver clears the command ring and
> > completes, deletes and frees all pending commands.
> >
> > There are two separate cases reported where host is believed to work
> > properly but is not. In the first case we successfully stop the ring
> > but no abort or stop commnand ring event is ever sent and host locks up.
> >
> > The second case is if a host is removed, command times out and driver
> > believes the ring is stopped, and assumes it be restarted, but
> > actually ends up timing out on the same command forever.
> > If one of the pending commands has the xhci->mutex held it will block
> > xhci_stop() in the remove codepath which otherwise would cleanup
> > pending commands.
> >
> > Add a check that clears all pending commands in case host is removed,
> > or we are stuck timeouting on the same command. Also restart the
> > command timeout timer when stopping the command ring to ensure we
> > recive an ring stop/abort event.
> >
> > Cc: stable <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++-----
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci-ring.c
> > b/drivers/usb/host/xhci-ring.c index 52deae4..c82570d 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -290,6 +290,14 @@ static int xhci_abort_cmd_ring(struct xhci_hcd
> > *xhci)
> >
> >  	temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> >  	xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
> > +
> > +	/*
> > +	 * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
> > +	 * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
> > +	 * but the completion event in never sent. Use the cmd timeout timer to
> > +	 * handle those cases. Use twice the time to cover the bit polling retry
> > +	 */
> > +	mod_timer(&xhci->cmd_timer, jiffies + (2 *
> > +XHCI_CMD_DEFAULT_TIMEOUT));
> >  	xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
> >  			&xhci->op_regs->cmd_ring);
> >
> > @@ -314,6 +322,7 @@ static int xhci_abort_cmd_ring(struct xhci_hcd
> > *xhci)
> >
> >  		xhci_err(xhci, "Stopped the command ring failed, "
> >  				"maybe the host is dead\n");
> > +		del_timer(&xhci->cmd_timer);
> >  		xhci->xhc_state |= XHCI_STATE_DYING;
> >  		xhci_quiesce(xhci);
> >  		xhci_halt(xhci);
> > @@ -1246,22 +1255,21 @@ void xhci_handle_command_timeout(unsigned long
> data)
> >  	int ret;
> >  	unsigned long flags;
> >  	u64 hw_ring_state;
> > -	struct xhci_command *cur_cmd = NULL;
> > +	bool second_timeout = false;
> >  	xhci = (struct xhci_hcd *) data;
> >
> >  	/* mark this command to be cancelled */
> >  	spin_lock_irqsave(&xhci->lock, flags);
> >  	if (xhci->current_cmd) {
> > -		cur_cmd = xhci->current_cmd;
> > -		cur_cmd->status = COMP_CMD_ABORT;
> > +		if (xhci->current_cmd->status == COMP_CMD_ABORT)
> > +			second_timeout = true;
> > +		xhci->current_cmd->status = COMP_CMD_ABORT;
> >  	}
> >
> > -
> >  	/* Make sure command ring is running before aborting it */
> >  	hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> >  	if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> >  	    (hw_ring_state & CMD_RING_RUNNING))  {
> > -
> >  		spin_unlock_irqrestore(&xhci->lock, flags);
> >  		xhci_dbg(xhci, "Command timeout\n");
> >  		ret = xhci_abort_cmd_ring(xhci);
> > @@ -1273,6 +1281,15 @@ void xhci_handle_command_timeout(unsigned long
> data)
> >  		}
> >  		return;
> >  	}
> > +
> > +	/* command ring failed to restart, or host removed. Bail out */
> > +	if (second_timeout || xhci->xhc_state & XHCI_STATE_REMOVING) {
> > +		spin_unlock_irqrestore(&xhci->lock, flags);
> > +		xhci_dbg(xhci, "command timed out twice, ring start fail?\n");
> > +		xhci_cleanup_command_queue(xhci);
> > +		return;
> > +	}
> > +
> >  	/* command timeout on stopped ring, ring can't be aborted */
> >  	xhci_dbg(xhci, "Command timeout on stopped ring\n");
> >  	xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
> >
> 

Hello Mathias, 

I'm using kernel version 4.1.8, and this patch applies OK but fails in compilation 
due to missing definition of 'XHCI_STATE_REMOVING'. 

Can you help me list the dependent patches that can be applied to 4.1.8 to 
make this patch work. 

Best Regards,
Rajesh Bhagat 

> So far so good, feel free to add my Tested-by tag.
> 
> -- Joe
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]