> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@xxxxxxxxx]
> Sent: Monday, March 21, 2016 2:46 PM
> To: Rajesh Bhagat <rajesh.bhagat@xxxxxxx>; Mathias Nyman
> <mathias.nyman@xxxxxxxxxxxxxxx>; linux-usb@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; Sriram Dash <sriram.dash@xxxxxxx>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
>
> On 21.03.2016 06:18, Rajesh Bhagat wrote:
> >
> >>
> >> Hi
> >>
> >> I think clearing the whole command ring is a bit too much in this case.
> >> It may cause issues for all attached devices when one command times out.
> >>
> >
> > Hi Mathias,
> >
> > I understand your point, But I want to understand how would completion
> > handler be called if a command is timed out and xhci_abort_cmd_ring is
> > successful. In this case all the code would be waiting on completion handler forever.
> >
> >
> > 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
> > xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
> >
> > In our case command is timed out, Hence we hit the case #2 but
> > xhci_abort_cmd_ring is success which does not calls complete.
>
> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register.
> This will generate a command completion event with status "command aborted" for
> the pending command.
> This event is then followed by a "command ring stopped" command completion event.
>
> See xHCI specs 5.4.5 and 4.6.1.2
>
> handle_cmd_completion() will check if cmd_comp_code == COMP_CMD_ABORT, goto
> event_handled, and call xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for
> the aborted command.
>
> If xHCI already processed the aborted command, we might only get a command ring
> stopped event, in this case handle_cmd_completion() will call
> xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the commands that were
> tagged for "abort" that still remain on the command ring to NO-OP commands.
>
> The completion callback will be called for these NO-OP command later when we get a
> command completion event for them.
>

Thanks, Mathias, for the detailed explanation.

Now I understand how the completion handler is supposed to be called in this
scenario.

But in our case, we are somehow not getting any event, and
handle_cmd_completion() is not called even after xhci_abort_cmd_ring()
succeeds following the command timeout.

My point is that before the patch "xhci: rework command timeout and
cancellation", the code would have returned from xhci_alloc_dev() itself
when the command timed out:

-       /* XXX: how much time for xHC slot assignment? */
-       timeleft = wait_for_completion_interruptible_timeout(
-                       command->completion,
-                       XHCI_CMD_DEFAULT_TIMEOUT);
-       if (timeleft <= 0) {
-               xhci_warn(xhci, "%s while waiting for a slot\n",
-                               timeleft == 0 ? "Timeout" : "Signal");
-               /* cancel the enable slot request */
-               ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
-               return ret;
-       }
+       wait_for_completion(command->completion);

After this patch, we wait for a hardware event that is somehow never
generated, which causes a hang.

IMO, the assumption that "xhci_abort_cmd_ring would always generate an event
and handle_cmd_completion would be called" will not always be true if the HW
is in a bad state.

Please share your opinion.

> >> What kernel version, and what xhci vendor was this triggered on?
> >>
> >
> > We are using 4.1.8 kernel
> >
>
> Are you able to try a more recent version?
>

Using a newer kernel version would be a bit difficult, but I will certainly
try it.
> -Mathias