Re: [PATCH 2/3 v2] xhci: Fix race related to abort operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes:

> The retry was a workaround for a case triggered on v4.1.4 by
> Vincent Pelletier (added to CC)
>
>    http://marc.info/?l=linux-usb&m=144031185222108&w=2
>
> The race could very well explain that issue.

>From logs of that thread, it would has possibility about this race.

>> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh
>>   		udelay(1000);
>>   		ret = xhci_handshake(&xhci->op_regs->cmd_ring,
>>   				     CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
>> -		if (ret == 0)
>> -			return 0;
>> +		if (ret < 0) {
>> +			xhci_err(xhci, "Stopped the command ring failed, "
>> +				 "maybe the host is dead\n");
>> +			xhci->xhc_state |= XHCI_STATE_DYING;
>> +			xhci_halt(xhci);
>> +			return -ESHUTDOWN;
>> +		}
>> +	}
>
> So if we can verify that the race fix solves the old issue for Vincent
> Pelletier then we could get rid of the abort retry above as well.

Right. However, I'm not sure (and maybe you neither).  So I didn't
remove it in this patchset.

(After this patchset, I guess the retry will hit only when actual chip
internal confusion. I.e. retry will fail, then will chip halted.)

Well, anyway, if you decide to try to remove the retry, I also think it
would worth to try. However, it would be [PATCH 4/4] or as another
patchset.

>> +	/*
>> +	 * Writing the CMD_RING_ABORT bit should cause a cmd completion event,
>> +	 * however on some host hw the CMD_RING_RUNNING bit is correctly cleared
>> +	 * but the completion event in never sent. Wait 2 secs (arbitrary
>> +	 * number) to handle those cases after negation of CMD_RING_RUNNING.
>> +	 */
>
> I'm speculating a bit here, but as we now can sleep, and if we could
> get rid of the abort retry couldn't we swap the order of the
> xhci_handshake(CMD_RING_RUNNING) busywait and
> wait_for_completion_timeout(). We could also use a standard command
> timeout time while waiting for the completion
>
> something like:
>
> hci_write_64(CMD_RING_ABORT)
> timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT)
> xhci_handshake(CMD_RING_RUNNING)
> if (handshake fail) {
>     xhci_halt(), etc..
>    return -ESHUTDOWN
> }	
> if (timed out) {
>    xhci_cleanup_command_queue(xhci);
>    return
> }
>
> It would reduce the time we spend in the xhci_handshake() busyloop

BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we
can simply use straight forward coding without busyloop.

>> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2 drivers/usb/host/xhci.h
>> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race2	2016-11-16 18:38:52.552146764 +0900
>> +++ xhci-hirofumi/drivers/usb/host/xhci.h	2016-11-16 18:38:52.554146762 +0900
>> @@ -1574,6 +1574,7 @@ struct xhci_hcd {
>>   	struct list_head        cmd_list;
>>   	unsigned int		cmd_ring_reserved_trbs;
>>   	struct delayed_work	cmd_timer;
>> +	struct completion	stop_completion;
>
> Tiny thing, but maybe change stop_completion to
> cmd_ring_stop_completion.  to avoid mixing it with stopping host
> controller, stop endpoint, stop device etc

OK.
-- 
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux