2012/3/28 Andiry Xu <andiry.xu@xxxxxxx>: > On 03/27/2012 04:53 PM, Elric Fu wrote: >> 2012/3/27 Andiry Xu <andiry.xu@xxxxxxx>: >>> On 03/26/2012 08:08 PM, Elric Fu wrote: >>>> Software have to abort command ring and cancel command >>>> when a command is failed or hang. Otherwise, the command >>>> ring will hang and can't handle another commands. An >>>> example of a command that may hang is the Address Device >>>> Command, because waiting for a SET_ADDRESS request to be >>>> acknowledged by a USB device is outside of the xHC's >>>> ability to control. >>>> >>>> According to xHCI spec section 4.6.1.1 and section 4.6.1.2, >>>> after aborting a command on the command ring, xHC will >>>> generate a command completion event with its completion code >>>> set to Command Ring Stopped at least. If a command is >>>> currently executing at the time of aborting a command, xHC >>>> also generate a command completion event with its completion >>>> code set to Command Abort. When the command ring is stopped, >>>> software may remove, add, or rearrage Command Descriptors. >>>> >>>> The RFT patch use to cancel command when the above situation >>>> occur. >>>> >>>> To cancel a command, software will initialize a command >>>> descriptor for the cancel command, and add it into a >>>> cancel_cd_list of xhci. When the command ring is stopped, >>>> software will find the command trbs described by command >>>> descriptors in cancel_cd_list and modify it to No Op >>>> command. If software can't find the matched trbs, we can >>>> think it had been finished. >>>> >>>> Signed-off-by: Elric Fu <elricfu1@xxxxxxxxx> >>>> --- >>>> drivers/usb/host/xhci-mem.c | 1 + >>>> drivers/usb/host/xhci-ring.c | 238 ++++++++++++++++++++++++++++++++++++++++++ >>>> drivers/usb/host/xhci.c | 12 ++- >>>> drivers/usb/host/xhci.h | 18 +++ >>>> 4 files changed, 266 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c >>>> index 383fc85..84e7245 100644 >>>> --- a/drivers/usb/host/xhci-mem.c >>>> +++ b/drivers/usb/host/xhci-mem.c >>>> @@ -2238,6 +2238,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags) >>>> xhci->cmd_ring = xhci_ring_alloc(xhci, 1, true, false, flags); >>>> if (!xhci->cmd_ring) >>>> goto fail; >>>> + INIT_LIST_HEAD(&xhci->cancel_cd_list); >>> >>> What does cd mean? command descriptor? I think cancel_cmd_list is better. >> >> Yes. Well, the "cancel_cmd_list" is easier to understand. >> >>> >>>> xhci_dbg(xhci, "Allocated command ring at %p\n", xhci->cmd_ring); >>>> xhci_dbg(xhci, "First segment DMA is 0x%llx\n", >>>> (unsigned long long)xhci->cmd_ring->first_seg->dma); >>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >>>> index b62037b..d4d7f8d 100644 >>>> --- a/drivers/usb/host/xhci-ring.c >>>> +++ b/drivers/usb/host/xhci-ring.c >>>> @@ -292,12 +292,119 @@ static int room_on_ring(struct xhci_hcd *xhci, struct xhci_ring *ring, >>>> /* Ring the host controller doorbell after placing a command on the ring */ >>>> void xhci_ring_cmd_db(struct xhci_hcd *xhci) >>>> { >>>> + if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) >>>> + return; >>>> + >>>> xhci_dbg(xhci, "// Ding dong!\n"); >>>> xhci_writel(xhci, DB_VALUE_HOST, &xhci->dba->doorbell[0]); >>>> /* Flush PCI posted writes */ >>>> xhci_readl(xhci, &xhci->dba->doorbell[0]); >>>> } >>>> >>>> +int xhci_abort_cmd_ring(struct xhci_hcd *xhci) >>>> +{ >>>> + u64 temp_64; >>>> + int ret; >>>> + >>>> + xhci_dbg(xhci, "Abort command ring\n"); >>>> + >>>> + if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) { >>>> + xhci_dbg(xhci, "The command ring isn't running, " >>>> + "Have the command ring been stopped?"); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring); >>>> + if (!(temp_64 & CMD_RING_RUNNING)) { >>>> + xhci_dbg(xhci, "Command ring had been stopped\n"); >>>> + return -EINVAL; >>>> + } >>>> + xhci->cmd_ring_state = CMD_RING_STATE_ABORTED; >>>> + xhci_write_64(xhci, temp_64 | CMD_RING_ABORT, >>>> + &xhci->op_regs->cmd_ring); >>>> + >>>> + /* Section 4.6.1.2 of xHCI 1.0 spec says software should >>>> + * time the completion od all xHCI commands, including >>>> + * the Command Abort operation. If software doesn't see >>>> + * CRR negated in a timely manner (e.g. longer than 5 >>>> + * seconds), then it should assume that the there are >>>> + * larger problems with the xHC and assert HCRST. >>>> + */ >>>> + ret = handshake(xhci, &xhci->op_regs->cmd_ring, >>>> + CMD_RING_RUNNING, 0, 5 * 1000 * 1000); >>>> + if (ret < 0) { >>>> + xhci_err(xhci, "Stopped the command ring failed, " >>>> + "maybe the host is dead\n"); >>>> + xhci->xhc_state |= XHCI_STATE_DYING; >>>> + xhci_quiesce(xhci); >>>> + xhci_halt(xhci); >>>> + return -ESHUTDOWN; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int queue_cd(struct xhci_hcd *xhci, int slot_id, >>>> + u32 cmd_type, int ep_index) >>>> +{ >>>> + struct xhci_cd *cd; >>>> + >>>> + while (1) { >>>> + if (!(xhci->cmd_ring_state & CMD_RING_STATE_MODIFYING)) >>>> + break; >>>> + } >>> >>> Who will modify the cmd_ring state and break the loop? You are holding >>> the lock now. >> >> It seems like a issue. I don't want to insert the cd into the list when >> the list is being modified. You remind me that maybe the loop >> can't keep the cancel_cd_list safe. The handle_cmd_completion() >> doesn't hold any lock. It is possible that the command ring state >> isn't CMD_RING_STATE_MODIFYING so that cd can be inserted. >> But when cd is being inserted into list command ring state turn into >> CMD_RING_STATE_MODIFYING and cancel_cd() is called. Maybe >> we need a lock for protecting the list. What do you think? >> > > Well, since you are manipulating the list, you need a lock to protect it. > And handle_cmd_completion() is called by xhci_irq(), which is holding > the lock. Sorry. I made a mistake. The handle_cmd_completion() is holding the xhci->lock. Well, this part of codes is redundant. > >>> >>>> + >>>> + cd = kzalloc(sizeof(struct xhci_cd), GFP_ATOMIC); >>>> + if (!cd) >>>> + return -ENOMEM; >>>> + INIT_LIST_HEAD(&cd->cancel_cd_list); >>>> + >>>> + cd->slot_id = slot_id; >>>> + cd->cmd_type = cmd_type; >>>> + cd->ep_index = ep_index; >>>> + list_add_tail(&cd->cancel_cd_list, &xhci->cancel_cd_list); >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +int xhci_cancel_cmd(struct xhci_hcd *xhci, int slot_id, >>>> + u32 cmd_type, int ep_index) >>>> +{ >>>> + int retval; >>>> + unsigned long flags; >>>> + >>>> + spin_lock_irqsave(&xhci->lock, flags); >>>> + >>>> + if (xhci->xhc_state & XHCI_STATE_DYING) { >>>> + xhci_warn(xhci, "Abort the command ring," >>>> + " but the xHCI is dead.\n"); >>>> + spin_unlock_irqrestore(&xhci->lock, flags); >>>> + return -ESHUTDOWN; >>>> + } >>>> + >>>> + /* queue the cmd desriptor to cancelled_cd_list */ >>>> + retval = queue_cd(xhci, slot_id, cmd_type, ep_index); >>>> + if (retval) { >>>> + spin_unlock_irqrestore(&xhci->lock, flags); >>>> + return retval; >>>> + } >>>> + >>>> + /* abort command ring */ >>>> + retval = xhci_abort_cmd_ring(xhci); >>>> + if (retval) { >>>> + xhci_err(xhci, "Abort command ring failed\n"); >>> >>> What happen if queue_cd() succeeds but xhci_abort_cmd_ring() fails? >>> Do you want to remove the cd from the cancel_cd_list? Will there be any >>> mismatch? >> >> I don't want to remove cd from cancel_cd_list. Because I think we >> can insert cd into the list when the command ring is stopped even >> if the xhci_abort_cmd_ring() is failed. But you remind me that the >> xhci_abort_cmd_ring is failed due to the CRR bit is always 1. In >> this case I have to free the cd, or else the memory leak will occur. >> >> What do you think about freeing all of command descriptors in >> cancel_cd_list when the xhci_mem_cleanup() is called? >> > > Yes, when xhci_mem_cleanup() is called, you should free all the cds. > >>> >>>> + spin_unlock_irqrestore(&xhci->lock, flags); >>>> + if (retval == -ESHUTDOWN) { >>>> + usb_hc_died(xhci_to_hcd(xhci)->primary_hcd); >>>> + xhci_dbg(xhci, "xHCI host controller is dead.\n"); >>>> + } >>>> + return retval; >>>> + } >>>> + >>>> + spin_unlock_irqrestore(&xhci->lock, flags); >>>> + return 0; >>>> +} >>>> + >>>> void xhci_ring_ep_doorbell(struct xhci_hcd *xhci, >>>> unsigned int slot_id, >>>> unsigned int ep_index, >>>> @@ -1050,6 +1157,130 @@ static int handle_cmd_in_cmd_wait_list(struct xhci_hcd *xhci, >>>> return 1; >>>> } >>>> >>>> +/* >>>> + * Find the command trb need to be cancelled and modify it to NO OP >>>> + * command. >>>> + * >>>> + * If we can't find the command trb, we think it had already been >>>> + * executed. >>>> + */ >>>> +static void cd_to_noop(struct xhci_hcd *xhci, struct xhci_cd *cur_cd) >>>> +{ >>>> + struct xhci_segment *cur_seg; >>>> + union xhci_trb *cmd_trb; >>>> + int slot_id; >>>> + u32 cmd_type; >>>> + u32 cycle_state; >>>> + int ep_index; >>>> + >>>> + if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue) >>>> + return; >>>> + >>>> + /* find the current segmene of command ring */ >>>> + cur_seg = find_trb_seg(xhci->cmd_ring->first_seg, >>>> + xhci->cmd_ring->dequeue, &cycle_state); >>>> + >>>> + /* find the command trb matched by CD from command ring */ >>>> + for (cmd_trb = xhci->cmd_ring->dequeue; >>>> + cmd_trb != xhci->cmd_ring->enqueue; >>>> + next_trb(xhci, xhci->cmd_ring, &cur_seg, &cmd_trb)) { >>>> + if (TRB_TYPE_LINK_LE32(cmd_trb->generic.field[3])) >>>> + continue; >>>> + >>>> + slot_id = TRB_TO_SLOT_ID( >>>> + le32_to_cpu(cmd_trb->generic.field[3])); >>>> + cmd_type = TRB_FIELD_TO_TYPE( >>>> + le32_to_cpu(cmd_trb->generic.field[3])); >>>> + if ((slot_id == cur_cd->slot_id) && >>>> + (cmd_type == cur_cd->cmd_type)) { >>>> + >>>> + /* The ep_index in the command desriptors of Stop >>>> + * Endpoint command, Set TR Dequeue Pointer command >>>> + * and Reset Endpoint command is vaild. so it should >>>> + * be match the ep_index of cd. >>>> + */ >>>> + if ((cmd_type == TRB_STOP_RING) || >>>> + (cmd_type == TRB_SET_DEQ) || >>>> + (cmd_type == TRB_RESET_EP)) { >>>> + ep_index = TRB_TO_EP_INDEX(le32_to_cpu( >>>> + cmd_trb->generic.field[3])); >>>> + if (ep_index != cur_cd->ep_index) >>>> + continue; >>>> + } >>>> + >>>> + /* get cycle state from the origin command trb */ >>>> + cycle_state = le32_to_cpu(cmd_trb->generic.field[3]) >>>> + & TRB_CYCLE; >>>> + >>>> + /* modify the command trb to NO OP command */ >>>> + cmd_trb->generic.field[0] = 0; >>>> + cmd_trb->generic.field[1] = 0; >>>> + cmd_trb->generic.field[2] = 0; >>>> + cmd_trb->generic.field[3] = cpu_to_le32( >>>> + TRB_TYPE(TRB_CMD_NOOP) | cycle_state); >>>> + break; >>>> + } >>>> + } >>>> +} >>>> + >>>> +static void cancel_cd(struct xhci_hcd *xhci) >>>> +{ >>>> + struct xhci_cd *cur_cd; >>>> + struct list_head *entry; >>>> + struct list_head *next; >>>> + >>>> + if (list_empty(&xhci->cancel_cd_list)) >>>> + return; >>>> + >>>> + list_for_each_safe(entry, next, &xhci->cancel_cd_list) { >>>> + cur_cd = list_entry(entry, struct xhci_cd, cancel_cd_list); >>> >>> list_for_each_entry_safe() >> >> You're right. >> >>> >>>> + xhci_dbg(xhci, "Cancel command, slot %d, cmd type %d, " >>>> + "ep_index = %d\n", cur_cd->slot_id, >>>> + cur_cd->cmd_type, cur_cd->ep_index); >>>> + cd_to_noop(xhci, cur_cd); >>>> + >>>> + /* Whatever we find the matched command trb, we need to >>>> + * release the command descriptor. >>>> + */ >>>> + list_del(&cur_cd->cancel_cd_list); >>>> + kfree(cur_cd); >>>> + } >>>> +} >>>> + >>>> +static void handle_stopped_cmd_ring(struct xhci_hcd *xhci, >>>> + struct xhci_event_cmd *event) >>>> +{ >>>> + struct xhci_virt_device *virt_dev; >>>> + int slot_id = TRB_TO_SLOT_ID( >>>> + le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3])); >>>> + int cmd_trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->status)); >>>> + >>>> + virt_dev = xhci->devs[slot_id]; >>>> + if (virt_dev) { >>>> + /* if the command is in cmd_list, handle it. */ >>>> + handle_cmd_in_cmd_wait_list(xhci, virt_dev, event); >>>> + } >>>> + >>>> + /* advance the dequeue pointer to next cmd trb */ >>>> + inc_deq(xhci, xhci->cmd_ring, false); >>>> + >>>> + /* set the cmd ring state */ >>>> + if (cmd_trb_comp_code != COMP_CMD_STOP) { >>>> + xhci->cmd_ring_state = CMD_RING_STATE_STOPPED; >>> >>> if cmd_trb_comp_code == COMP_CMD_ABORT, you just change the >>> cmd_ring_state and return. Are you waiting for the COMP_CMD_STOP event >>> and process them all in one time? >> >> Yes. The host doesn't definitely generated the completion event with >> the completion code set to COMP_CMD_ABORT. So I just wait for the >> COMP_CMD_STOP event. >> > > Does the spec mention this? COMP_CMD_ABORT is not always generated but > COMP_CMD_STOP always is? I need to read the spec to dig out more > information. Yes, You can read section 4.6.1.1 and 4.6.1.2 in xHCI spec. > > Thanks, > Andiry > >>> >>>> + } else { >>>> + xhci->cmd_ring_state = CMD_RING_STATE_MODIFYING; >>>> + cancel_cd(xhci); >>>> + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; >>>> + >>>> + /* >>>> + * ring command ring doorbell again to handle the waiting >>>> + * command trbs due to aborting command ring >>>> + */ >>>> + if (xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) >>>> + xhci_ring_cmd_db(xhci); >>>> + } >>>> +} >>>> + >>>> static void handle_cmd_completion(struct xhci_hcd *xhci, >>>> struct xhci_event_cmd *event) >>>> { >>>> @@ -1075,6 +1306,13 @@ static void handle_cmd_completion(struct xhci_hcd *xhci, >>>> xhci->error_bitmask |= 1 << 5; >>>> return; >>>> } >>>> + >>>> + if ((GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_ABORT) || >>>> + (GET_COMP_CODE(le32_to_cpu(event->status)) == COMP_CMD_STOP)) { >>>> + handle_stopped_cmd_ring(xhci, event); >>>> + return; >>>> + } >>>> + >>>> switch (le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3]) >>>> & TRB_TYPE_BITMASK) { >>>> case TRB_TYPE(TRB_ENABLE_SLOT): >>>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c >>>> index c939f5f..9434771 100644 >>>> --- a/drivers/usb/host/xhci.c >>>> +++ b/drivers/usb/host/xhci.c >>>> @@ -51,7 +51,7 @@ MODULE_PARM_DESC(link_quirk, "Don't clear the chain bit on a link TRB"); >>>> * handshake done). There are two failure modes: "usec" have passed (major >>>> * hardware flakeout), or the register reads as all-ones (hardware removed). >>>> */ >>>> -static int handshake(struct xhci_hcd *xhci, void __iomem *ptr, >>>> +int handshake(struct xhci_hcd *xhci, void __iomem *ptr, >>>> u32 mask, u32 done, int usec) >>>> { >>>> u32 result; >>>> @@ -104,8 +104,10 @@ int xhci_halt(struct xhci_hcd *xhci) >>>> >>>> ret = handshake(xhci, &xhci->op_regs->status, >>>> STS_HALT, STS_HALT, XHCI_MAX_HALT_USEC); >>>> - if (!ret) >>>> + if (!ret) { >>>> xhci->xhc_state |= XHCI_STATE_HALTED; >>>> + xhci->cmd_ring_state = CMD_RING_STATE_STOPPED; >>>> + } >>>> return ret; >>>> } >>>> >>>> @@ -473,6 +475,7 @@ static int xhci_run_finished(struct xhci_hcd *xhci) >>>> return -ENODEV; >>>> } >>>> xhci->shared_hcd->state = HC_STATE_RUNNING; >>>> + xhci->cmd_ring_state = CMD_RING_STATE_RUNNING; >>>> >>>> if (xhci->quirks & XHCI_NEC_HOST) >>>> xhci_ring_cmd_db(xhci); >>>> @@ -3542,7 +3545,10 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev) >>>> if (timeleft <= 0) { >>>> xhci_warn(xhci, "%s while waiting for address device command\n", >>>> timeleft == 0 ? "Timeout" : "Signal"); >>>> - /* FIXME cancel the address device command */ >>>> + /* cancel the address device command */ >>>> + ret = xhci_cancel_cmd(xhci, udev->slot_id, TRB_ADDR_DEV, 0); >>>> + if (ret < 0) >>>> + return ret; >>>> return -ETIME; >>>> } >>>> >>>> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h >>>> index 5b29d3c..8e59c42 100644 >>>> --- a/drivers/usb/host/xhci.h >>>> +++ b/drivers/usb/host/xhci.h >>>> @@ -1247,6 +1247,14 @@ struct xhci_td { >>>> union xhci_trb *last_trb; >>>> }; >>>> >>>> +/* command descriptor */ >>>> +struct xhci_cd { >>>> + struct list_head cancel_cd_list; >>>> + int slot_id; >>>> + u32 cmd_type; >>>> + int ep_index; >>>> +}; >>>> + >>>> struct xhci_dequeue_state { >>>> struct xhci_segment *new_deq_seg; >>>> union xhci_trb *new_deq_ptr; >>>> @@ -1394,6 +1402,12 @@ struct xhci_hcd { >>>> /* data structures */ >>>> struct xhci_device_context_array *dcbaa; >>>> struct xhci_ring *cmd_ring; >>>> + unsigned int cmd_ring_state; >>>> +#define CMD_RING_STATE_RUNNING (1 << 0) >>>> +#define CMD_RING_STATE_ABORTED (1 << 1) >>>> +#define CMD_RING_STATE_MODIFYING (1 << 2) >>>> +#define CMD_RING_STATE_STOPPED (1 << 3) >>>> + struct list_head cancel_cd_list; >>>> unsigned int cmd_ring_reserved_trbs; >>>> struct xhci_ring *event_ring; >>>> struct xhci_erst erst; >>>> @@ -1651,6 +1665,8 @@ static inline void xhci_unregister_pci(void) {} >>>> >>>> /* xHCI host controller glue */ >>>> typedef void (*xhci_get_quirks_t)(struct device *, struct xhci_hcd *); >>>> +int handshake(struct xhci_hcd *xhci, void __iomem *ptr, >>>> + u32 mask, u32 done, int usec); >>>> void xhci_quiesce(struct xhci_hcd *xhci); >>>> int xhci_halt(struct xhci_hcd *xhci); >>>> int xhci_reset(struct xhci_hcd *xhci); >>>> @@ -1741,6 +1757,8 @@ void xhci_queue_config_ep_quirk(struct xhci_hcd *xhci, >>>> unsigned int slot_id, unsigned int ep_index, >>>> struct xhci_dequeue_state *deq_state); >>>> void xhci_stop_endpoint_command_watchdog(unsigned long arg); >>>> +int xhci_cancel_cmd(struct xhci_hcd *xhci, int slot_id, >>>> + u32 cmd_type, int ep_index); >>>> void xhci_ring_ep_doorbell(struct xhci_hcd *xhci, unsigned int slot_id, >>>> unsigned int ep_index, unsigned int stream_id); >>>> >>> >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html