Re: [RFT] xHCI: cancel command queued in the command ring

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/27/2012 04:53 PM, Elric Fu wrote:
> 2012/3/27 Andiry Xu <andiry.xu@xxxxxxx>:
>> On 03/26/2012 08:08 PM, Elric Fu wrote:
>>> Software have to abort command ring and cancel command
>>> when a command is failed or hang. Otherwise, the command
>>> ring will hang and can't handle another commands. An
>>> example of a command that may hang is the Address Device
>>> Command, because waiting for a SET_ADDRESS request to be
>>> acknowledged by a USB device is outside of the xHC's
>>> ability to control.
>>>
>>> According to xHCI spec section 4.6.1.1 and section 4.6.1.2,
>>> after aborting a command on the command ring, xHC will
>>> generate a command completion event with its completion code
>>> set to Command Ring Stopped at least. If a command is
>>> currently executing at the time of aborting a command, xHC
>>> also generate a command completion event with its completion
>>> code set to Command Abort. When the command ring is stopped,
>>> software may remove, add, or rearrage Command Descriptors.
>>>
>>> The RFT patch use to cancel command when the above situation
>>> occur.
>>>
>>> To cancel a command, software will initialize a command
>>> descriptor for the cancel command, and add it into a
>>> cancel_cd_list of xhci. When the command ring is stopped,
>>> software will find the command trbs described by command
>>> descriptors in cancel_cd_list and modify it to No Op
>>> command. If software can't find the matched trbs, we can
>>> think it had been finished.
>>>
>>> Signed-off-by: Elric Fu <elricfu1@xxxxxxxxx>
>>> ---
>>>  drivers/usb/host/xhci-mem.c  |    1 +
>>>  drivers/usb/host/xhci-ring.c |  238 ++++++++++++++++++++++++++++++++++++++++++
>>>  drivers/usb/host/xhci.c      |   12 ++-
>>>  drivers/usb/host/xhci.h      |   18 +++
>>>  4 files changed, 266 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>>> index 383fc85..84e7245 100644
>>> --- a/drivers/usb/host/xhci-mem.c
>>> +++ b/drivers/usb/host/xhci-mem.c
>>> @@ -2238,6 +2238,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
>>>       xhci->cmd_ring = xhci_ring_alloc(xhci, 1, true, false, flags);
>>>       if (!xhci->cmd_ring)
>>>               goto fail;
>>> +     INIT_LIST_HEAD(&xhci->cancel_cd_list);
>>
>> What does cd mean? command descriptor? I think cancel_cmd_list is better.
> 
> Yes. Well, the "cancel_cmd_list" is easier to understand.
> 
>>
>>>       xhci_dbg(xhci, "Allocated command ring at %p\n", xhci->cmd_ring);
>>>       xhci_dbg(xhci, "First segment DMA is 0x%llx\n",
>>>                       (unsigned long long)xhci->cmd_ring->first_seg->dma);
>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>>> index b62037b..d4d7f8d 100644
>>> --- a/drivers/usb/host/xhci-ring.c
>>> +++ b/drivers/usb/host/xhci-ring.c
>>> @@ -292,12 +292,119 @@ static int room_on_ring(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>  /* Ring the host controller doorbell after placing a command on the ring */
>>>  void xhci_ring_cmd_db(struct xhci_hcd *xhci)
>>>  {
>>> +     if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING))
>>> +             return;
>>> +
>>>       xhci_dbg(xhci, "// Ding dong!\n");
>>>       xhci_writel(xhci, DB_VALUE_HOST, &xhci->dba->doorbell[0]);
>>>       /* Flush PCI posted writes */
>>>       xhci_readl(xhci, &xhci->dba->doorbell[0]);
>>>  }
>>>
>>> +int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>>> +{
>>> +     u64 temp_64;
>>> +     int ret;
>>> +
>>> +     xhci_dbg(xhci, "Abort command ring\n");
>>> +
>>> +     if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) {
>>> +             xhci_dbg(xhci, "The command ring isn't running, "
>>> +                             "Have the command ring been stopped?");
>>> +             return -EINVAL;
>>> +     }
>>> +
>>> +     temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>>> +     if (!(temp_64 & CMD_RING_RUNNING)) {
>>> +             xhci_dbg(xhci, "Command ring had been stopped\n");
>>> +             return -EINVAL;
>>> +     }
>>> +     xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>>> +     xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>>> +                     &xhci->op_regs->cmd_ring);
>>> +
>>> +     /* Section 4.6.1.2 of xHCI 1.0 spec says software should
>>> +      * time the completion od all xHCI commands, including
>>> +      * the Command Abort operation. If software doesn't see
>>> +      * CRR negated in a timely manner (e.g. longer than 5
>>> +      * seconds), then it should assume that the there are
>>> +      * larger problems with the xHC and assert HCRST.
>>> +      */
>>> +     ret = handshake(xhci, &xhci->op_regs->cmd_ring,
>>> +                     CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
>>> +     if (ret < 0) {
>>> +             xhci_err(xhci, "Stopped the command ring failed, "
>>> +                             "maybe the host is dead\n");
>>> +             xhci->xhc_state |= XHCI_STATE_DYING;
>>> +             xhci_quiesce(xhci);
>>> +             xhci_halt(xhci);
>>> +             return -ESHUTDOWN;
>>> +     }
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static int queue_cd(struct xhci_hcd *xhci, int slot_id,
>>> +             u32 cmd_type, int ep_index)
>>> +{
>>> +     struct xhci_cd *cd;
>>> +
>>> +     while (1) {
>>> +             if (!(xhci->cmd_ring_state & CMD_RING_STATE_MODIFYING))
>>> +                     break;
>>> +     }
>>
>> Who will modify the cmd_ring state and break the loop? You are holding
>> the lock now.
> 
> It seems like a issue. I don't want to insert the cd into the list when
> the list is being modified. You remind me that maybe the loop
> can't keep the cancel_cd_list safe. The handle_cmd_completion()
> doesn't hold any lock. It is possible that the command ring state
> isn't CMD_RING_STATE_MODIFYING so that cd can be inserted.
> But when cd is being inserted into list command ring state turn into
> CMD_RING_STATE_MODIFYING and cancel_cd() is called. Maybe
> we need a lock for protecting the list. What do you think?
> 

Well, since you are manipulating the list, you need a lock to protect it.
And handle_cmd_completion() is called by xhci_irq(), which is holding
the lock.

>>
>>> +
>>> +     cd = kzalloc(sizeof(struct xhci_cd), GFP_ATOMIC);
>>> +     if (!cd)
>>> +             return -ENOMEM;
>>> +     INIT_LIST_HEAD(&cd->cancel_cd_list);
>>> +
>>> +     cd->slot_id = slot_id;
>>> +     cd->cmd_type = cmd_type;
>>> +     cd->ep_index = ep_index;
>>> +     list_add_tail(&cd->cancel_cd_list, &xhci->cancel_cd_list);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +int xhci_cancel_cmd(struct xhci_hcd *xhci, int slot_id,
>>> +             u32 cmd_type, int ep_index)
>>> +{
>>> +     int retval;
>>> +     unsigned long flags;
>>> +
>>> +     spin_lock_irqsave(&xhci->lock, flags);
>>> +
>>> +     if (xhci->xhc_state & XHCI_STATE_DYING) {
>>> +             xhci_warn(xhci, "Abort the command ring,"
>>> +                             " but the xHCI is dead.\n");
>>> +             spin_unlock_irqrestore(&xhci->lock, flags);
>>> +             return -ESHUTDOWN;
>>> +     }
>>> +
>>> +     /* queue the cmd desriptor to cancelled_cd_list */
>>> +     retval = queue_cd(xhci, slot_id, cmd_type, ep_index);
>>> +     if (retval) {
>>> +             spin_unlock_irqrestore(&xhci->lock, flags);
>>> +             return retval;
>>> +     }
>>> +
>>> +     /* abort command ring */
>>> +     retval = xhci_abort_cmd_ring(xhci);
>>> +     if (retval) {
>>> +             xhci_err(xhci, "Abort command ring failed\n");
>>
>> What happen if queue_cd() succeeds but xhci_abort_cmd_ring() fails?
>> Do you want to remove the cd from the cancel_cd_list? Will there be any
>> mismatch?
> 
> I don't want to remove cd from cancel_cd_list. Because I think we
> can insert cd into the list when the command ring is stopped even
> if the xhci_abort_cmd_ring() is failed. But you remind me that the
> xhci_abort_cmd_ring is failed due to the CRR bit is always 1. In
> this case I have to free the cd, or else the memory leak will occur.
> 
> What do you think about freeing all of command descriptors in
> cancel_cd_list when the xhci_mem_cleanup() is called?
> 

Yes, when xhci_mem_cleanup() is called, you should free all the cds.

>>
>>> +             spin_unlock_irqrestore(&xhci->lock, flags);
>>> +             if (retval == -ESHUTDOWN) {
>>> +                     usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
>>> +                     xhci_dbg(xhci, "xHCI host controller is dead.\n");
>>> +             }
>>> +             return retval;
>>> +     }
>>> +
>>> +     spin_unlock_irqrestore(&xhci->lock, flags);
>>> +     return 0;
>>> +}
>>> +
>>>  void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
>>>               unsigned int slot_id,
>>>               unsigned int ep_index,
>>> @@ -1050,6 +1157,130 @@ static int handle_cmd_in_cmd_wait_list(struct xhci_hcd *xhci,
>>>       return 1;
>>>  }
>>>
>>> +/*
>>> + * Find the command trb need to be cancelled and modify it to NO OP
>>> + * command.
>>> + *
>>> + * If we can't find the command trb, we think it had already been
>>> + * executed.
>>> + */
>>> +static void cd_to_noop(struct xhci_hcd *xhci, struct xhci_cd *cur_cd)
>>> +{
>>> +     struct xhci_segment *cur_seg;
>>> +     union xhci_trb *cmd_trb;
>>> +     int slot_id;
>>> +     u32 cmd_type;
>>> +     u32 cycle_state;
>>> +     int ep_index;
>>> +
>>> +     if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue)
>>> +             return;
>>> +
>>> +     /* find the current segmene of command ring */
>>> +     cur_seg = find_trb_seg(xhci->cmd_ring->first_seg,
>>> +                     xhci->cmd_ring->dequeue, &cycle_state);
>>> +
>>> +     /* find the command trb matched by CD from command ring */
>>> +     for (cmd_trb = xhci->cmd_ring->dequeue;
>>> +                     cmd_trb != xhci->cmd_ring->enqueue;
>>> +                     next_trb(xhci, xhci->cmd_ring, &cur_seg, &cmd_trb)) {
>>> +             if (TRB_TYPE_LINK_LE32(cmd_trb->generic.field[3]))
>>> +                     continue;
>>> +
>>> +             slot_id = TRB_TO_SLOT_ID(
>>> +                             le32_to_cpu(cmd_trb->generic.field[3]));
>>> +             cmd_type = TRB_FIELD_TO_TYPE(
>>> +                             le32_to_cpu(cmd_trb->generic.field[3]));
>>> +             if ((slot_id == cur_cd->slot_id) &&
>>> +                             (cmd_type == cur_cd->cmd_type)) {
>>> +
>>> +                     /* The ep_index in the command desriptors of Stop
>>> +                      * Endpoint command, Set TR Dequeue Pointer command
>>> +                      * and Reset Endpoint command is vaild. so it should
>>> +                      * be match the ep_index of cd.
>>> +                      */
>>> +                     if ((cmd_type == TRB_STOP_RING) ||
>>> +                                     (cmd_type == TRB_SET_DEQ) ||
>>> +                                     (cmd_type == TRB_RESET_EP)) {
>>> +                             ep_index = TRB_TO_EP_INDEX(le32_to_cpu(
>>> +                                             cmd_trb->generic.field[3]));
>>> +                             if (ep_index != cur_cd->ep_index)
>>> +                                     continue;
>>> +                     }
>>> +
>>> +                     /* get cycle state from the origin command trb */
>>> +                     cycle_state = le32_to_cpu(cmd_trb->generic.field[3])
>>> +                             & TRB_CYCLE;
>>> +
>>> +                     /* modify the command trb to NO OP command */
>>> +                     cmd_trb->generic.field[0] = 0;
>>> +                     cmd_trb->generic.field[1] = 0;
>>> +                     cmd_trb->generic.field[2] = 0;
>>> +                     cmd_trb->generic.field[3] = cpu_to_le32(
>>> +                                     TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
>>> +                     break;
>>> +             }
>>> +     }
>>> +}
>>> +
>>> +static void cancel_cd(struct xhci_hcd *xhci)
>>> +{
>>> +     struct xhci_cd *cur_cd;
>>> +     struct list_head *entry;
>>> +     struct list_head *next;
>>> +
>>> +     if (list_empty(&xhci->cancel_cd_list))
>>> +             return;
>>> +
>>> +     list_for_each_safe(entry, next, &xhci->cancel_cd_list) {
>>> +             cur_cd = list_entry(entry, struct xhci_cd, cancel_cd_list);
>>
>> list_for_each_entry_safe()
> 
> You're right.
> 
>>
>>> +             xhci_dbg(xhci, "Cancel command, slot %d, cmd type %d, "
>>> +                             "ep_index = %d\n", cur_cd->slot_id,
>>> +                             cur_cd->cmd_type, cur_cd->ep_index);
>>> +             cd_to_noop(xhci, cur_cd);
>>> +
>>> +             /* Whatever we find the matched command trb, we need to
>>> +              * release the command descriptor.
>>> +              */
>>> +             list_del(&cur_cd->cancel_cd_list);
>>> +             kfree(cur_cd);
>>> +     }
>>> +}
>>> +
>>> +static void handle_stopped_cmd_ring(struct xhci_hcd *xhci,
>>> +             struct xhci_event_cmd *event)
>>> +{
>>> +     struct xhci_virt_device *virt_dev;
>>> +     int slot_id = TRB_TO_SLOT_ID(
>>> +                     le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3]));
>>> +     int cmd_trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>>> +
>>> +     virt_dev = xhci->devs[slot_id];
>>> +     if (virt_dev) {
>>> +             /* if the command is in cmd_list, handle it. */
>>> +             handle_cmd_in_cmd_wait_list(xhci, virt_dev, event);
>>> +     }
>>> +
>>> +     /* advance the dequeue pointer to next cmd trb */
>>> +     inc_deq(xhci, xhci->cmd_ring, false);
>>> +
>>> +     /* set the cmd ring state */
>>> +     if (cmd_trb_comp_code != COMP_CMD_STOP) {
>>> +             xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
>>
>> if cmd_trb_comp_code == COMP_CMD_ABORT, you just change the
>> cmd_ring_state and return. Are you waiting for the COMP_CMD_STOP event
>> and process them all in one time?
> 
> Yes. The host doesn't definitely generated the completion event with
> the completion code set to COMP_CMD_ABORT. So I just wait for the
> COMP_CMD_STOP event.
> 

Does the spec mention this? COMP_CMD_ABORT is not always generated but
COMP_CMD_STOP always is? I need to read the spec to dig out more
information.

Thanks,
Andiry

>>
>>> +     } else {
>>> +             xhci->cmd_ring_state = CMD_RING_STATE_MODIFYING;
>>> +             cancel_cd(xhci);
>>> +             xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
>>> +
>>> +             /*
>>> +              * ring command ring doorbell again to handle the waiting
>>> +              * command trbs due to aborting command ring
>>> +              */
>>> +             if (xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue)
>>> +                     xhci_ring_cmd_db(xhci);
>>> +     }
>>> +}
>>> +
>>>  static void handle_cmd_completion(struct xhci_hcd *xhci,
>>>               struct xhci_event_cmd *event)
>>>  {
>>> @@ -1075,6 +1306,13 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
>>>               xhci->error_bitmask |= 1 << 5;
>>>               return;
>>>       }
>>> +
>>> +     if ((GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_ABORT) ||
>>> +             (GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_STOP)) {
>>> +             handle_stopped_cmd_ring(xhci, event);
>>> +             return;
>>> +     }
>>> +
>>>       switch (le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3])
>>>               & TRB_TYPE_BITMASK) {
>>>       case TRB_TYPE(TRB_ENABLE_SLOT):
>>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
>>> index c939f5f..9434771 100644
>>> --- a/drivers/usb/host/xhci.c
>>> +++ b/drivers/usb/host/xhci.c
>>> @@ -51,7 +51,7 @@ MODULE_PARM_DESC(link_quirk, "Don't clear the chain bit on a link TRB");
>>>   * handshake done).  There are two failure modes:  "usec" have passed (major
>>>   * hardware flakeout), or the register reads as all-ones (hardware removed).
>>>   */
>>> -static int handshake(struct xhci_hcd *xhci, void __iomem *ptr,
>>> +int handshake(struct xhci_hcd *xhci, void __iomem *ptr,
>>>                     u32 mask, u32 done, int usec)
>>>  {
>>>       u32     result;
>>> @@ -104,8 +104,10 @@ int xhci_halt(struct xhci_hcd *xhci)
>>>
>>>       ret = handshake(xhci, &xhci->op_regs->status,
>>>                       STS_HALT, STS_HALT, XHCI_MAX_HALT_USEC);
>>> -     if (!ret)
>>> +     if (!ret) {
>>>               xhci->xhc_state |= XHCI_STATE_HALTED;
>>> +             xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
>>> +     }
>>>       return ret;
>>>  }
>>>
>>> @@ -473,6 +475,7 @@ static int xhci_run_finished(struct xhci_hcd *xhci)
>>>               return -ENODEV;
>>>       }
>>>       xhci->shared_hcd->state = HC_STATE_RUNNING;
>>> +     xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
>>>
>>>       if (xhci->quirks & XHCI_NEC_HOST)
>>>               xhci_ring_cmd_db(xhci);
>>> @@ -3542,7 +3545,10 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev)
>>>       if (timeleft <= 0) {
>>>               xhci_warn(xhci, "%s while waiting for address device command\n",
>>>                               timeleft == 0 ? "Timeout" : "Signal");
>>> -             /* FIXME cancel the address device command */
>>> +             /* cancel the address device command */
>>> +             ret = xhci_cancel_cmd(xhci, udev->slot_id, TRB_ADDR_DEV, 0);
>>> +             if (ret < 0)
>>> +                     return ret;
>>>               return -ETIME;
>>>       }
>>>
>>> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
>>> index 5b29d3c..8e59c42 100644
>>> --- a/drivers/usb/host/xhci.h
>>> +++ b/drivers/usb/host/xhci.h
>>> @@ -1247,6 +1247,14 @@ struct xhci_td {
>>>       union xhci_trb          *last_trb;
>>>  };
>>>
>>> +/* command descriptor */
>>> +struct xhci_cd {
>>> +     struct list_head        cancel_cd_list;
>>> +     int                     slot_id;
>>> +     u32                     cmd_type;
>>> +     int                     ep_index;
>>> +};
>>> +
>>>  struct xhci_dequeue_state {
>>>       struct xhci_segment *new_deq_seg;
>>>       union xhci_trb *new_deq_ptr;
>>> @@ -1394,6 +1402,12 @@ struct xhci_hcd {
>>>       /* data structures */
>>>       struct xhci_device_context_array *dcbaa;
>>>       struct xhci_ring        *cmd_ring;
>>> +     unsigned int            cmd_ring_state;
>>> +#define CMD_RING_STATE_RUNNING               (1 << 0)
>>> +#define CMD_RING_STATE_ABORTED               (1 << 1)
>>> +#define CMD_RING_STATE_MODIFYING     (1 << 2)
>>> +#define CMD_RING_STATE_STOPPED               (1 << 3)
>>> +     struct list_head        cancel_cd_list;
>>>       unsigned int            cmd_ring_reserved_trbs;
>>>       struct xhci_ring        *event_ring;
>>>       struct xhci_erst        erst;
>>> @@ -1651,6 +1665,8 @@ static inline void xhci_unregister_pci(void) {}
>>>
>>>  /* xHCI host controller glue */
>>>  typedef void (*xhci_get_quirks_t)(struct device *, struct xhci_hcd *);
>>> +int handshake(struct xhci_hcd *xhci, void __iomem *ptr,
>>> +                   u32 mask, u32 done, int usec);
>>>  void xhci_quiesce(struct xhci_hcd *xhci);
>>>  int xhci_halt(struct xhci_hcd *xhci);
>>>  int xhci_reset(struct xhci_hcd *xhci);
>>> @@ -1741,6 +1757,8 @@ void xhci_queue_config_ep_quirk(struct xhci_hcd *xhci,
>>>               unsigned int slot_id, unsigned int ep_index,
>>>               struct xhci_dequeue_state *deq_state);
>>>  void xhci_stop_endpoint_command_watchdog(unsigned long arg);
>>> +int xhci_cancel_cmd(struct xhci_hcd *xhci, int slot_id,
>>> +             u32 cmd_type, int ep_index);
>>>  void xhci_ring_ep_doorbell(struct xhci_hcd *xhci, unsigned int slot_id,
>>>               unsigned int ep_index, unsigned int stream_id);
>>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux