Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> writes: > The retry was a workaround for a case triggered on v4.1.4 by > Vincent Pelletier (added to CC) > > http://marc.info/?l=linux-usb&m=144031185222108&w=2 > > The race could very well explain that issue. >From logs of that thread, it would has possibility about this race. >> @@ -320,15 +367,29 @@ static int xhci_abort_cmd_ring(struct xh >> udelay(1000); >> ret = xhci_handshake(&xhci->op_regs->cmd_ring, >> CMD_RING_RUNNING, 0, 3 * 1000 * 1000); >> - if (ret == 0) >> - return 0; >> + if (ret < 0) { >> + xhci_err(xhci, "Stopped the command ring failed, " >> + "maybe the host is dead\n"); >> + xhci->xhc_state |= XHCI_STATE_DYING; >> + xhci_halt(xhci); >> + return -ESHUTDOWN; >> + } >> + } > > So if we can verify that the race fix solves the old issue for Vincent > Pelletier then we could get rid of the abort retry above as well. Right. However, I'm not sure (and maybe you neither). So I didn't remove it in this patchset. (After this patchset, I guess the retry will hit only when actual chip internal confusion. I.e. retry will fail, then will chip halted.) Well, anyway, if you decide to try to remove the retry, I also think it would worth to try. However, it would be [PATCH 4/4] or as another patchset. >> + /* >> + * Writing the CMD_RING_ABORT bit should cause a cmd completion event, >> + * however on some host hw the CMD_RING_RUNNING bit is correctly cleared >> + * but the completion event in never sent. Wait 2 secs (arbitrary >> + * number) to handle those cases after negation of CMD_RING_RUNNING. >> + */ > > I'm speculating a bit here, but as we now can sleep, and if we could > get rid of the abort retry couldn't we swap the order of the > xhci_handshake(CMD_RING_RUNNING) busywait and > wait_for_completion_timeout(). We could also use a standard command > timeout time while waiting for the completion > > something like: > > hci_write_64(CMD_RING_ABORT) > timed_out = wait_for_completion_timeout(XHCI_CMD_DEFAULT_TIMEOUT) > xhci_handshake(CMD_RING_RUNNING) > if (handshake fail) { > xhci_halt(), etc.. > return -ESHUTDOWN > } > if (timed out) { > xhci_cleanup_command_queue(xhci); > return > } > > It would reduce the time we spend in the xhci_handshake() busyloop BTW, busyloop removal was done by "[PATCH 3/3] ...". By [PATCH 3/3], we can simply use straight forward coding without busyloop. >> diff -puN drivers/usb/host/xhci.h~xhci-fix-abort-race2 drivers/usb/host/xhci.h >> --- xhci/drivers/usb/host/xhci.h~xhci-fix-abort-race2 2016-11-16 18:38:52.552146764 +0900 >> +++ xhci-hirofumi/drivers/usb/host/xhci.h 2016-11-16 18:38:52.554146762 +0900 >> @@ -1574,6 +1574,7 @@ struct xhci_hcd { >> struct list_head cmd_list; >> unsigned int cmd_ring_reserved_trbs; >> struct delayed_work cmd_timer; >> + struct completion stop_completion; > > Tiny thing, but maybe change stop_completion to > cmd_ring_stop_completion. to avoid mixing it with stopping host > controller, stop endpoint, stop device etc OK. -- OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html