Re: [PATCH 2/2] usb: host: xhci: Handle the right timeout command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21.12.2016 08:57, Lu Baolu wrote:
Hi Mathias,

I have some comments for the implementation of
xhci_handle_command_timeout() as well.

On 12/20/2016 11:13 PM, Mathias Nyman wrote:
On 20.12.2016 09:30, Baolin Wang wrote:
...

Alright, I gathered all current work related to xhci races and timeouts
and put them into a branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git timeout_race_fixes

Its based on 4.9
It includes a few other patches just to avoid conflicts and  make my life easier

Interesting patches are:

ee4eb91 xhci: remove unnecessary check for pending timer
0cba67d xhci: detect stop endpoint race using pending timer instead of counter.
4f2535f xhci: Handle command completion and timeout race
b9d00d7 usb: host: xhci: Fix possible wild pointer when handling abort command
529a5a0 usb: xhci: fix possible wild pointer
4766555 xhci: Fix race related to abort operation
de834a3 xhci: Use delayed_work instead of timer for command timeout
69973b8 Linux 4.9

The fixes for command queue races will go to usb-linus and stable, the
reworks for stop ep watchdog timer will go to usb-next.

Still completely untested, (well it compiles)

Felipe gave instructions how to modify dwc3 driver to timeout on address
devicecommands to test these, I'll try to set that up.

All additional testing is welcome, especially if you can trigger timeouts
and races

-Mathias



I post the code below and add my comments in line.

1276 void xhci_handle_command_timeout(struct work_struct *work)
1277 {
1278         struct xhci_hcd *xhci;
1279         int ret;
1280         unsigned long flags;
1281         u64 hw_ring_state;
1282
1283         xhci = container_of(to_delayed_work(work), struct xhci_hcd, cmd_timer);
1284
1285         spin_lock_irqsave(&xhci->lock, flags);
1286
1287         /*
1288          * If timeout work is pending, or current_cmd is NULL, it means we
1289          * raced with command completion. Command is handled so just return.
1290          */
1291         if (!xhci->current_cmd || delayed_work_pending(&xhci->cmd_timer)) {
1292                 spin_unlock_irqrestore(&xhci->lock, flags);
1293                 return;
1294         }
1295         /* mark this command to be cancelled */
1296         xhci->current_cmd->status = COMP_CMD_ABORT;
1297
1298         /* Make sure command ring is running before aborting it */
1299         hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
1300         if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
1301             (hw_ring_state & CMD_RING_RUNNING))  {
1302                 /* Prevent new doorbell, and start command abort */
1303                 xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
1304                 spin_unlock_irqrestore(&xhci->lock, flags);
1305                 xhci_dbg(xhci, "Command timeout\n");
1306                 ret = xhci_abort_cmd_ring(xhci);
1307                 if (unlikely(ret == -ESHUTDOWN)) {
1308                         xhci_err(xhci, "Abort command ring failed\n");
1309                         xhci_cleanup_command_queue(xhci);
1310                         usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
1311                         xhci_dbg(xhci, "xHCI host controller is dead.\n");
1312                 }
1313                 return;
1314         }
1315
1316         /* host removed. Bail out */
1317         if (xhci->xhc_state & XHCI_STATE_REMOVING) {
1318                 spin_unlock_irqrestore(&xhci->lock, flags);
1319                 xhci_dbg(xhci, "host removed, ring start fail?\n");
1320                 xhci_cleanup_command_queue(xhci);
1321                 return;
1322         }

I think this part of code should be moved up to line 1295.

The XHCI_STATE_REMOVING and XHCI_STATE_DYING needs a rework,
I'm working on that.

Basically we want XHCI_STATE_REMOVING to mean that  all devices are going,
away and driver will be removed. Don't bother with re-calculating available
bandwidths after every device removal, but do use xhci hardware to disable
devices cleanly etc.

XHCI_STATE_DYING should mean hardware is not working/responding. Don't
bother writing any registers or queuing anything. Just return all
pending and cancelled URBs, notify core we died, and free all allocated memory.


1323
1324         /* command timeout on stopped ring, ring can't be aborted */
1325         xhci_dbg(xhci, "Command timeout on stopped ring\n");
1326         xhci_handle_stopped_cmd_ring(xhci, xhci->current_cmd);
1327         spin_unlock_irqrestore(&xhci->lock, flags);

This part of code is tricky. I have no idea about in which case should this
code be executed? Anyway, we shouldn't call xhci_handle_stopped_cmd_ring()
here, right?


This isn't changed it these patches.

It will remove the aborted commands and restart the ring. It's useful if we
want to abort a command but command ring was not running. (if for some
unkown reason it was stopped, or forgot to restart.

-Mathias


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux