RE: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI commmand timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Mathias Nyman [mailto:mathias.nyman@xxxxxxxxxxxxxxx]
> Sent: Tuesday, March 22, 2016 5:36 PM
> To: Rajesh Bhagat <rajesh.bhagat@xxxxxxx>
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Sriram Dash <sriram.dash@xxxxxxx>
> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI
> commmand timeout
> 
> On 22.03.2016 07:19, Rajesh Bhagat wrote:
> >
> >
> >> -----Original Message-----
> >> From: Mathias Nyman [mailto:mathias.nyman@xxxxxxxxx]
> >> Sent: Monday, March 21, 2016 2:46 PM
> >> To: Rajesh Bhagat <rajesh.bhagat@xxxxxxx>; Mathias Nyman
> >> <mathias.nyman@xxxxxxxxxxxxxxx>; linux-usb@xxxxxxxxxxxxxxx; linux-
> >> kernel@xxxxxxxxxxxxxxx
> >> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; Sriram Dash <sriram.dash@xxxxxxx>
> >> Subject: Re: [PATCH] usb: xhci: Fix incomplete PM resume operation
> >> due to XHCI commmand timeout
> >>
> >> On 21.03.2016 06:18, Rajesh Bhagat wrote:
> >>>
> >>>
> >>>>
> >>>> Hi
> >>>>
> >>>> I think clearing the whole command ring is a bit too much in this case.
> >>>> It may cause issues for all attached devices when one command times out.
> >>>>
> >>>
> >>> Hi Mathias,
> >>>
> >>> I understand your point, But I want to understand how would
> >>> completion handler be called if a command is timed out and
> >>> xhci_abort_cmd_ring is successful. In this case all the code would be waiting on
> completion handler forever.
> >>>
> >>>
> >>> 2. xhci_handle_command_timeout -> xhci_abort_cmd_ring(failure) ->
> >>> xhci_cleanup_command_queue -> xhci_complete_del_and_free_cmd
> >>>
> >>> In our case command is timed out, Hence we hit the case #2 but
> >>> xhci_abort_cmd_ring is success which does not calls complete.
> >>
> >> xhci_abort_cmd_ring() will write CA bit (CMD_RING_ABORT) to CRCR register.
> >> This will generate a command completion event with status "command
> >> aborted" for the pending command.
> >> This event is then followed by a "command ring stopped" command completion
> event.
> >>
> >> See xHCI specs 5.4.5 and 4.6.1.2
> >>
> >> handle_cmd_completion() will check if cmd_comp_code ==
> >> COMP_CMD_ABORT, goto event_handled, and call
> >> xhci_complete_del_and_free_cmd(cmd, cmd_comp_code) for the aborted
> command.
> >>
> >> If xHCI already processed the aborted command, we might only get a
> >> command ring stopped event, in this case handle_cmd_completion() will
> >> call xhci_handle_stopped_cmd_ring(xhci, cmd), which will turn the
> >> commands that were tagged for "abort" that still remain on the command ring to
> NO-OP commands.
> >>
> >> The completion callback will be called for these NO-OP command later
> >> when we get a command completion event for them.
> >>
> >
> > Thanks Mathias for detailed explanation. Now I understand how
> > completion handler is supposed to be called in this scenario.
> >
> > But in our case, somehow we are not getting any event and
> > handle_cmd_completion function is not getting called even after successful
> xhci_abort_cmd_ring when command timed out.
> >
> > Now, my point here is code prior to this patch xhci: rework command
> > timeout and cancellation, Code would have returned in case command timed out in
> xhci_alloc_dev itself.
> >
> > -       /* XXX: how much time for xHC slot assignment? */
> > -       timeleft = wait_for_completion_interruptible_timeout(
> > -                       command->completion,
> > -                       XHCI_CMD_DEFAULT_TIMEOUT);
> > -       if (timeleft <= 0) {
> > -               xhci_warn(xhci, "%s while waiting for a slot\n",
> > -                               timeleft == 0 ? "Timeout" : "Signal");
> > -               /* cancel the enable slot request */
> > -               ret = xhci_cancel_cmd(xhci, NULL, command->command_trb);
> > -               return ret;
> > -       }
> > +       wait_for_completion(command->completion);
> >
> > But after this patch, we are waiting for hardware event, which is
> > somehow not generated and causing a hang scenario.
> >
> > IMO, The assumption that "xhci_abort_cmd_ring would always generate an
> > event and handle_cmd_completion would be called" will not be always be true if HW
> is in bad state.
> >
> > Please share your opinion.
> >
> 
> writing the CA (command abort) bit in CRCR (command ring control register)  will stop
> the command ring, and CRR (command ring running) will be set to 0 by xHC.
> xhci_abort_cmd_ring() polls this bit up to 5 seconds.
> If it's not 0 then the driver considers the command abort as failed.
> 
> The scenario you're thinking of is that xHC would still react to CA bit set, it would stop
> the command ring and set CRR 0, but not send a command completion event.
> 
> Have you tried adding some debug to handle_cmd_completion() and see if you receive
> any event after command abortion?
> 

Yes. We have added debug prints at first line of handle_cmd_completion, and we are not getting 
those prints. The last print messages that we get are as below from xhci_alloc_dev while resume 
operation: 

xhci-hcd xhci-hcd.0.auto: Command timeout
xhci-hcd xhci-hcd.0.auto: Abort command ring

May be somehow, USB controller is in bad state and not responding to the commands. 

Please suggest how XHCI driver can handle such situations. 

> The patches I mentioned earlier are also worth trying out, they might fix the cause
> why command aborts in the first place.
> 

We tried both the patches you mentioned. But they dint come to rescue here. 

> -Mathias
> 
> 
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux