Hi,

On 12/21/2016 08:57 PM, Mathias Nyman wrote:
> On 21.12.2016 08:57, Lu Baolu wrote:
>> Hi Mathias,
>>
>> I have some comments for the implementation of
>> xhci_handle_command_timeout() as well.
>>
>> On 12/20/2016 11:13 PM, Mathias Nyman wrote:
>>> On 20.12.2016 09:30, Baolin Wang wrote:
>>> ...
>>>
>>> Alright, I gathered all current work related to xhci races and timeouts
>>> and put them into a branch:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes
>>>
>>> It's based on 4.9
>>> It includes a few other patches just to avoid conflicts and make my life easier
>>>
>>> Interesting patches are:
>>>
>>> ee4eb91 xhci: remove unnecessary check for pending timer
>>> 0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
>>> 4f2535f xhci: Handle command completion and timeout race
>>> b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
>>> 529a5a0 usb: xhci: fix possible wild pointer
>>> 4766555 xhci: Fix race related to abort operation
>>> de834a3 xhci: Use delayed_work instead of timer for command timeout
>>> 69973b8 Linux 4.9
>>>
>>> The fixes for command queue races will go to usb-linus and stable, the
>>> reworks for stop ep watchdog timer will go to usb-next.
>>>
>>> Still completely untested, (well it compiles)
>>>
>>> Felipe gave instructions how to modify dwc3 driver to timeout on address
>>> device commands to test these, I'll try to set that up.
>>>
>>> All additional testing is welcome, especially if you can trigger timeouts
>>> and races
>>>
>>> -Mathias
>>>
>>>
>>
>> I post the code below and add my comments in line.
>>
>> 1276 void xhci_handle_command_timeout(struct work_struct *work)
>> 1277 {
>> 1278         struct xhci_hcd *xhci;
>> 1279         int ret;
>> 1280         unsigned long flags;
>> 1281         u64 hw_ring_state;
>> 1282
>> 1283         xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
>> 1284
>> 1285         spin_lock_irqsave(&xhci->lock, flags);
>> 1286
>> 1287         /*
>> 1288          * If timeout work is pending, or current_cmd is NULL, it means we
>> 1289          * raced with command completion. Command is handled so just return.
>> 1290          */
>> 1291         if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) {
>> 1292                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1293                 return;
>> 1294         }
>> 1295         /* mark this command to be cancelled */
>> 1296         xhci->current_cmd->status = COMP_CMD_ABORT;
>> 1297
>> 1298         /* Make sure command ring is running before aborting it */
>> 1299         hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>> 1300         if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
>> 1301             (hw_ring_state & CMD_RING_RUNNING)) {
>> 1302                 /* Prevent new doorbell, and start command abort */
>> 1303                 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>> 1304                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1305                 xhci_dbg(xhci, "Command timeout\n");
>> 1306                 ret = xhci_abort_cmd_ring(xhci);
>> 1307                 if (unlikely(ret == -ESHUTDOWN)) {
>> 1308                         xhci_err(xhci, "Abort command ring failed\n");
>> 1309                         xhci_cleanup_command_queue(xhci);
>> 1310                         usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
>> 1311                         xhci_dbg(xhci, "xHCI host controller is dead.\n");
>> 1312                 }
>> 1313                 return;
>> 1314         }
>> 1315
>> 1316         /* host removed. Bail out */
>> 1317         if (xhci->xhc_state & XHCI_STATE_REMOVING) {
>> 1318                 spin_unlock_irqrestore(&xhci->lock, flags);
>> 1319                 xhci_dbg(xhci, "host removed, ring start fail?\n");
>> 1320                 xhci_cleanup_command_queue(xhci);
>> 1321                 return;
>> 1322         }
>>
>> I think this part of code should be moved up to line 1295.
>
> The XHCI_STATE_REMOVING and XHCI_STATE_DYING needs a rework,
> I'm working on that.
>
> Basically we want XHCI_STATE_REMOVING to mean that all devices are going
> away and driver will be removed. Don't bother with re-calculating available
> bandwidths after every device removal, but do use xhci hardware to disable
> devices cleanly etc.
>
> XHCI_STATE_DYING should mean hardware is not working/responding. Don't
> bother writing any registers or queuing anything. Just return all
> pending and cancelled URBs, notify core we died, and free all allocated memory.

Okay, thanks for the information.

>
>>
>> 1323
>> 1324         /* command timeout on stopped ring, ring can't be aborted */
>> 1325         xhci_dbg(xhci, "Command timeout on stopped ring\n");
>> 1326         xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
>> 1327         spin_unlock_irqrestore(&xhci->lock, flags);
>>
>> This part of code is tricky. I have no idea about in which case should this
>> code be executed? Anyway, we shouldn't call xhci_handle_stopped_cmd_ring()
>> here, right?
>>
>
> This isn't changed in these patches.
>
> It will remove the aborted commands and restart the ring. It's useful if we
> want to abort a command but command ring was not running. (if for some
> unknown reason it was stopped, or forgot to restart.)

Makes sense. So how about putting a warning here, instead of a debug message
which will normally be ignored?

Best regards,
Lu Baolu
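
P.S. To make the ordering suggestion concrete (check XHCI_STATE_REMOVING
before marking the command aborted or touching the ring), here is a minimal
user-space sketch of the decision order only. It is not driver code: the
struct, the enum and decide_timeout_action() are hypothetical stand-ins that
I made up for illustration, and locking, register access and the actual
abort are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; the real handler calls driver functions. */
enum timeout_action {
        ACT_NONE,           /* raced with command completion, nothing to do */
        ACT_CLEANUP_QUEUE,  /* host is being removed: only flush the queue */
        ACT_ABORT_RING,     /* ring is running: abort it */
        ACT_HANDLE_STOPPED  /* ring already stopped: drop aborted cmds, restart */
};

/* Hypothetical snapshot of the state the handler looks at under the lock. */
struct model_state {
        bool have_current_cmd;  /* models xhci->current_cmd != NULL */
        bool timer_pending;     /* models delayed_work_pending(&cmd_timer) */
        bool removing;          /* models xhc_state & XHCI_STATE_REMOVING */
        bool sw_ring_running;   /* models cmd_ring_state & CMD_RING_STATE_RUNNING */
        bool hw_ring_running;   /* models the CMD_RING_RUNNING register bit */
};

/* Decision order with the "host removed" check moved ahead of the abort. */
static enum timeout_action decide_timeout_action(const struct model_state *s)
{
        if (!s->have_current_cmd || s->timer_pending)
                return ACT_NONE;

        /* Assumed new position: bail out before the ring is touched. */
        if (s->removing)
                return ACT_CLEANUP_QUEUE;

        if (s->sw_ring_running && s->hw_ring_running)
                return ACT_ABORT_RING;

        return ACT_HANDLE_STOPPED;
}
```

With this order a host that is being removed never reaches the abort path,
even when the ring is still reported as running. Whether that is what we
want depends on the XHCI_STATE_REMOVING rework described above.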