[RFC 05/10] xhci: Handle command ring stop between commands.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The TLDR version: we need to turn command TRBs into a no-op command,
even if the command dequeue pointer points to them on a command ring
stop completion event.

When the xHCI host controller is in the middle of a command, and
software writes the command abort bit in the CRCR register, the host
will do four things:

1. Abort the command and stop processing the command ring
2. Queue a completion event with the completion code set to Command
   Abort to the event ring
3. Move the command ring dequeue pointer to the next TRB
4. Queue a completion event with the completion code set to Command Ring
   Stop to the event ring.

However, if the host was in between processing TRBs, it will not do step
two.  The requirement in step three is ambiguous.  If the host was
between command TRB processing, it may stop on the command we wished to
cancel, or it may move past that command, depending on the internal host
controller management of the dequeue pointer.

The current code does not handle this corner case very well:

static void handle_cmd_completion(struct xhci_hcd *xhci,
                struct xhci_event_cmd *event)
{
...
        if ((GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_ABORT) ||
                (GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_STOP)) {
                /* If the return value is 0, we think the trb pointed by
                 * command ring dequeue pointer is a good trb. The good
                 * trb means we don't want to cancel the trb, but it have
                 * been stopped by host. So we should handle it normally.
                 * Otherwise, driver should invoke inc_deq() and return.
                 */
                if (handle_stopped_cmd_ring(xhci,
                                GET_COMP_CODE(le32_to_cpu(event->status)))) {
                        inc_deq(xhci, xhci->cmd_ring);
                        return;
                }
        }
...
}

static int handle_stopped_cmd_ring(struct xhci_hcd *xhci,
                int cmd_trb_comp_code)
{
        int cur_trb_is_good = 0;

        /* Searching the cmd trb pointed by the command ring dequeue
         * pointer in command descriptor list. If it is found, free it.
         */
        cur_trb_is_good = xhci_search_cmd_trb_in_cd_list(xhci,
                        xhci->cmd_ring->dequeue);

        if (cmd_trb_comp_code == COMP_CMD_ABORT)
                xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
        else if (cmd_trb_comp_code == COMP_CMD_STOP) {
		...
		if (xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue)
			xhci_ring_cmd_db(xhci);
        }
        return cur_trb_is_good;
}

In this corner case, the first completion event we get has the command
completion code set to COMP_CMD_STOP.  The command ring dequeue pointer
in the hardware points to the command we wished to cancel (that the
hardware has not processed yet).  handle_stopped_cmd_ring() will return
true, since xhci_search_cmd_trb_in_cd_list() just canceled the command
at the command ring dequeue pointer.

However, xhci_search_cmd_trb_in_cd_list() does not turn the command into
a no-op command.  handle_stopped_cmd_ring() goes on to ring the
doorbell, which will cause the host to start processing commands.  When
that function returns, we'll increment the software's dequeue pointer,
which is no longer in sync with the hardware dequeue pointer.

This could produce some pretty disastrous results, if the hardware
tries to re-fetch DMA memory that has been freed.  As is, with a few
modifications to the xHCI driver to not ring the command doorbell after
a Set Address command, the host hardware re-executes the canceled Set
Address, causing the next Set Address command to fail:

Mar  7 12:01:47 xanatos kernel: [  126.442538] xhci_hcd 0000:00:14.0: TEST, TEST! Stalling Set Address for 60 seconds.
Mar  7 12:01:47 xanatos kernel: [  126.498291] xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
Mar  7 12:01:49 xanatos kernel: [  128.263795] Initializing USB Mass Storage driver...
Mar  7 12:01:49 xanatos kernel: [  128.263874] usb-storage 2-1:1.0: usb_probe_interface
Mar  7 12:01:49 xanatos kernel: [  128.263881] usb-storage 2-1:1.0: usb_probe_interface - got id
Mar  7 12:01:49 xanatos kernel: [  128.263987] xhci_hcd 0000:00:14.0: Set up evaluate context for LPM MEL change.
...
Mar  7 12:01:54 xanatos kernel: [  133.253622] xhci_hcd 0000:00:14.0: Timeout while waiting for evaluate context command, retry
...
Mar  7 12:02:21 xanatos kernel: [  160.027546] xhci_hcd 0000:00:14.0: Command ring:
Mar  7 12:02:21 xanatos kernel: [  160.027547] xhci_hcd 0000:00:14.0: @00000000ab5ac000 00000000 00000000 00000000 00002401
Mar  7 12:02:21 xanatos kernel: [  160.027549] xhci_hcd 0000:00:14.0: @00000000ab5ac010 b7796000 00000000 00000000 01002c01
Mar  7 12:02:21 xanatos kernel: [  160.027551] xhci_hcd 0000:00:14.0: @00000000ab5ac020 b7796000 00000000 00000000 01003001
Mar  7 12:02:21 xanatos kernel: [  160.027553] xhci_hcd 0000:00:14.0: @00000000ab5ac030 ab5ab000 00000000 00000000 01003401
Mar  7 12:02:21 xanatos kernel: [  160.027554] xhci_hcd 0000:00:14.0: @00000000ab5ac040 ab5ab000 00000000 00000000 01003401
Mar  7 12:02:21 xanatos kernel: [  160.027556] xhci_hcd 0000:00:14.0: @00000000ab5ac050 00000000 00000000 00000000 00002401
Mar  7 12:02:21 xanatos kernel: [  160.027558] xhci_hcd 0000:00:14.0: @00000000ab5ac060 b7636000 00000000 00000000 02002c01 <--- set address command
Mar  7 12:02:21 xanatos kernel: [  160.027559] xhci_hcd 0000:00:14.0: @00000000ab5ac070 ab5ab000 00000000 00000000 01003401 <--- eval context command
...
Mar  7 12:02:47 xanatos kernel: [  186.329990] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command
Mar  7 12:02:47 xanatos kernel: [  186.329996] xhci_hcd 0000:00:14.0: Abort command ring
Mar  7 12:02:47 xanatos kernel: [  186.330013] xhci_hcd 0000:00:14.0: Handle stopped command ring, dequeue = 00000000ab5ac060.
Mar  7 12:02:47 xanatos kernel: [  186.330016] xhci_hcd 0000:00:14.0: xhci_search_cmd_trb_in_cd_list canceling command 00000000ab5ac060
Mar  7 12:02:47 xanatos kernel: [  186.330018] xhci_hcd 0000:00:14.0: No more commands to cancel
Mar  7 12:02:47 xanatos kernel: [  186.330019] xhci_hcd 0000:00:14.0: // Ding dong!
Mar  7 12:02:47 xanatos kernel: [  186.330086] xhci_hcd 0000:00:14.0: Bad command completion for 00000000ab5ac060, dequeue at 00000000ab5ac070.

The last line is where we got a completion event for the command we were
trying to cancel.  Next, the host starts failing subsequent Set Address
commands because its internal state shows the device is already
addressed:

Mar  7 12:02:47 xanatos kernel: [  186.533706] xhci_hcd 0000:00:14.0: // Ding dong!
Mar  7 12:02:47 xanatos kernel: [  186.533719] xhci_hcd 0000:00:14.0: Set Address command completion for 00000000ab5ac0a0, comp code 19.
Mar  7 12:02:47 xanatos kernel: [  186.533734] xhci_hcd 0000:00:14.0: Setup ERROR: address device command for slot 2.

Fix this by turning commands into no-ops when we cancel them in
xhci_search_cmd_trb_in_cd_list().  It's okay if software moves its
internal dequeue pointer past the command, but the hardware dequeue
pointer is on that no-op'ed command, because we'll just ignore the no-op
command completion.

This patch should be backported to kernels as old as 3.0, that contain
the commit b63f4053cc8aa22a98e3f9a97845afe6c15d0a0d "xHCI: handle
command after aborting the command ring"

Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
 drivers/usb/host/xhci-ring.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 79ec0be..d59eef6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1313,6 +1313,7 @@ static int xhci_search_cmd_trb_in_cd_list(struct xhci_hcd *xhci,
 	list_for_each_entry_safe(cur_cd, next_cd,
 			&xhci->cancel_cmd_list, cancel_cmd_list) {
 		if (cur_cd->cmd_trb == cmd_trb) {
+			xhci_cmd_to_noop(xhci, cur_cd);
 			if (cur_cd->command)
 				xhci_complete_cmd_in_cmd_wait_list(xhci,
 					cur_cd->command, COMP_CMD_STOP);
-- 
1.7.9

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux