Re: xhci_handle_command_timeout and wait_for_completion

Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx> · Wed, 11 May 2016 14:39:32 +0300

On 10.05.2016 17:04, Joe Lawrence wrote:
On 05/09/2016 06:18 AM, Mathias Nyman wrote:
On 06.05.2016 23:32, Joe Lawrence wrote:
...snip...

Given that the default command timeout is 5 seconds, it seems strange to
hit a 120 second hung task warning in this instance.  I can only think
that maybe something goofy is going on with xhci_handle_command_timeout
and an unfortunately timed host controller removal.

Idea is that when xhci is removed xhci_mem_cleanup() will take care of
the pending completions.

xhci_mem_cleanup()
   xhci_cleanup_command_queue()
      list_for_each_entry_safe(cur_cmd, tmp_cmd, &xhci->cmd_list, cmd_list)
         xhci_complete_del_and_free_cmd(cur_cmd, COMP_CMD_ABORT);
            if (cmd->completion) {
               complete(cmd->completion);
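
As a rough illustration of the call chain above, here is a small userspace sketch in C. It is not the driver code: every model_* name and MODEL_COMP_CMD_ABORT are hypothetical stand-ins, the list is a plain singly linked list rather than the kernel's list_head, and storing the status on the command is an assumption made for the sketch. It only shows the idea that cleanup walks every pending command and wakes any waiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
#define MODEL_COMP_CMD_ABORT 1  /* placeholder value, not the real COMP_CMD_ABORT */

struct model_completion {
    bool done;                           /* set once the waiter has been woken */
};

struct model_cmd {
    struct model_cmd *next;              /* stand-in for the cmd_list linkage */
    struct model_completion *completion; /* NULL if nobody is waiting */
    int status;
};

/* Models complete(): mark the waiter as woken. */
static void model_complete(struct model_completion *c)
{
    c->done = true;
}

/*
 * Models xhci_cleanup_command_queue(): unlink every pending command and,
 * if it has a waiter, record the abort status and wake that waiter.
 * Returns the number of waiters that were woken.
 */
static int model_cleanup_command_queue(struct model_cmd **head)
{
    int woken = 0;
    struct model_cmd *cur;

    assert(head != NULL);
    cur = *head;
    while (cur) {
        /* Save the successor before unlinking, like the "_safe" iterator. */
        struct model_cmd *next = cur->next;

        cur->next = NULL;
        if (cur->completion) {
            cur->status = MODEL_COMP_CMD_ABORT;
            model_complete(cur->completion);
            woken++;
        }
        cur = next;
    }
    *head = NULL;
    return woken;
}
```

Calling model_cleanup_command_queue() on a list where some commands have a completion and some do not wakes only the ones with a waiter and leaves the list empty.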

xhci_stop will want the xhci->mutex when it calls xhci_mem_cleanup.
What happens if someone like xhci_alloc_dev has that mutex while it's
waiting for TRB_ENABLE_SLOT to complete?

Ah, now I got your concern.
I focused too much on the crash instead of the explanation.

Yes, it's possible that case could happen. If xhci_alloc_dev() is
waiting for completion with the mutex held while the host is hotplug
removed, then the command will time out.

xhci_handle_command_timeout() might think the command ring is not running
and just try to turn the command into a no-op and restart the command ring.
The host is removed, so the command ring will never start and the
completion is never called.
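
To make the sequence concrete, here is a toy single-threaded state model in C. It is only a sketch under assumptions: the model_* names are made up, the mutex is a plain flag rather than a real lock, and the timeout path is reduced to "the restarted ring eventually completes the command unless the host is gone". It is not how the driver is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_host {
    bool mutex_held;     /* stands in for xhci->mutex */
    bool cmd_pending;    /* a command is queued and someone waits on it */
    bool cmd_completed;  /* complete() was called for that waiter */
    bool removed;        /* host controller has been hot-removed */
};

/* Models xhci_alloc_dev() up to the wait: take the mutex, queue the command. */
static void model_alloc_dev_begin(struct model_host *h)
{
    assert(!h->mutex_held);
    h->mutex_held = true;
    h->cmd_pending = true;
    /* The real caller would now sleep waiting for completion, mutex still held. */
}

/* Models the timeout path: no-op the command and restart the ring. */
static void model_command_timeout(struct model_host *h)
{
    if (h->removed)
        return;  /* ring never restarts, so no completion is ever signalled */
    /* Assumption: on a live host the restarted ring ends up completing it. */
    h->cmd_pending = false;
    h->cmd_completed = true;
}

/* Models xhci_stop(): cleanup can only run once the mutex is available. */
static bool model_stop(struct model_host *h)
{
    if (h->mutex_held)
        return false;  /* the real code would block on the mutex here */
    if (h->cmd_pending) {
        h->cmd_pending = false;
        h->cmd_completed = true;
    }
    return true;
}

/* True when the waiter holds the mutex and nothing is left to wake it. */
static bool model_is_stuck(const struct model_host *h)
{
    return h->mutex_held && h->cmd_pending && !h->cmd_completed;
}
```

With removed set after model_alloc_dev_begin(), model_command_timeout() changes nothing and model_stop() cannot make progress, which matches the hung task: the waiter keeps the mutex, and the only path that would complete the command needs that mutex.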

There is another related issue where aborting the command ring never
generates a command completion event, and we never call the completion in
that case either, but that was related to the host dying completely.

I'll start writing a patch for both of these cases.

-Mathias
