Re: [RFT v2] xHCI: cancel command queued in the command ring

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2012/6/12 Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>:
> Hi Elric,
>
> Are you going to have time to rework this patch?  I haven't heard back
> from you for a week.
>
> If I don't hear from you in a couple days, I'll assume you're not
> checking your email.  Alex, would you consider making the changes
> yourself if that happens?

I am so sorry. I was assigned a emergency job by my boss since April, and
just came back on last weekend. I will check and response these emails
carefully.

Best Regards,
Elric

>
> Comments below.
>
> On Fri, Jun 01, 2012 at 02:39:05PM +0300, Alexander Shishkin wrote:
>> Elric Fu <elricfu1@xxxxxxxxx> writes:
>> > Hi Sarah,
>> >
>> > please test it in your old VIA USB3.0 hub. I can't duplicate
>> > the case of the address device command timeout in my VIA
>> > USB3.0 hub. I heard that the GENESYS GL3310 sata bridge would
>> > respond the address device command after a long time when it
>> > was plugged into VIA xHCI host. And the issue will occur after
>> > several quick disconnect/connect cycles. Yesterday I finally
>> > got a GL3310 device. But it is ok and responds the address
>> > device command very quickly. I can't duplicate it.
>> >
>> > In additional, any comment about the patch?
>>
>> Hi,
>>
>> I've tested this patch (it requires some rediffing) with a UDC that I
>> hacked to ignore SET_ADDRESS requests and it does fix the problem. Some
>> comments below.
>
> First, in general, I would suggest you only manipulate the command ring
> state while holding the xhci->lock.  Since we can have multiple
> outstanding commands that need to be aborted, you need to hold the lock
> in places like this:
>
>        temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>        if (!(temp_64 & CMD_RING_RUNNING)) {
>                xhci_dbg(xhci, "Command ring had been stopped\n");
>                return -EINVAL;
>        }
>        xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>        xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>                        &xhci->op_regs->cmd_ring);
>
> And what happens if you have multiple commands that need to be aborted,
> and the abort handlers race with each other?
>
> You don't have that problem now, because this patch only adds support
> for aborting the Set Address command.  However, I also want to abort
> other commands, like Configure Endpoint, etc.
>
>
> This patch needs to be modified to cover several corner cases:
>
> 1. Each device can have commands outstanding independent of whether
> other devices have commands queued.
>
> 2. Each device can have multiple commands queued for it at the same time.
>
> 3. The same type of command could be queued for the same device.  It may
> not make sense for most of the commands, but I want to future proof this
> code in case that ever becomes true.
>
> 4. Commands could have different timeout values, or be aborted early
> because of a signal from userspace.  Therefore, the command we need to
> abort may not be the command that is pointed to by the command ring
> dequeue pointer.
>
> 5. Aborted commands need to have all memory for those commands freed by
> the stopped command ring handler.  Otherwise we race between the
> function that issued the command ring abort and the handler.  This means
> the stopped command ring handler needs to free the xhci_command for any
> aborted commands, and remove the command from the xhci_virt_device
> command wait list.
>
>
>> [snip]
>> > +int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
>>
>> Shouldn't this be static?
>>
>> > +{
>> > +   u64 temp_64;
>> > +   int ret;
>> > +
>> > +   xhci_dbg(xhci, "Abort command ring\n");
>> > +
>> > +   if (!(xhci->cmd_ring_state & CMD_RING_STATE_RUNNING)) {
>> > +           xhci_dbg(xhci, "The command ring isn't running, "
>> > +                           "Have the command ring been stopped?");
>> > +           return -EINVAL;
>> > +   }
>> > +
>> > +   temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
>> > +   if (!(temp_64 & CMD_RING_RUNNING)) {
>> > +           xhci_dbg(xhci, "Command ring had been stopped\n");
>> > +           return -EINVAL;
>> > +   }
>
> Don't return an error code here.  It's possible that two different
> command aborts will race with each other, and you should be able to
> handle that case.  If you return -EINVAL here, xhci_cancel_cmd() will
> tell the USB core that the host controller died, which is not what you
> want.
>
> Instead, have xhci_cancel_cmd() add the command to the command list
> under the xHCI lock, and read the command ring status while you're
> holding that lock.  Drop the lock, and take a look at the command ring
> status.
>
> If it was stopped, you know handle_stopped_cmd_ring() hadn't run yet
> when you queued the command to the cancel list (possibly because the
> event ring handler hadn't got to the command stopped event TRB yet).
> But you do know it *will* run and handle your command.  So if the
> command ring status was stopped, don't call xhci_abort_cmd_ring(), and
> just return.
>
>> > +   xhci->cmd_ring_state = CMD_RING_STATE_ABORTED;
>> > +   xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
>> > +                   &xhci->op_regs->cmd_ring);
>> > +
>> > +   /* Section 4.6.1.2 of xHCI 1.0 spec says software should
>> > +    * time the completion od all xHCI commands, including
>> > +    * the Command Abort operation. If software doesn't see
>> > +    * CRR negated in a timely manner (e.g. longer than 5
>> > +    * seconds), then it should assume that the there are
>> > +    * larger problems with the xHC and assert HCRST.
>> > +    */
>> > +   ret = handshake(xhci, &xhci->op_regs->cmd_ring,
>> > +                   CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
>> > +   if (ret < 0) {
>> > +           xhci_err(xhci, "Stopped the command ring failed, "
>> > +                           "maybe the host is dead\n");
>> > +           xhci->xhc_state |= XHCI_STATE_DYING;
>> > +           xhci_quiesce(xhci);
>> > +           xhci_halt(xhci);
>> > +           return -ESHUTDOWN;
>> > +   }
>> > +
>> > +   return 0;
>> > +}
>> > +
>> > +static int queue_cd(struct xhci_hcd *xhci, int slot_id,
>> > +           u32 cmd_type, int ep_index)
>> > +{
>> > +   struct xhci_cd *cd;
>> > +
>> > +   cd = kzalloc(sizeof(struct xhci_cd), GFP_ATOMIC);
>> > +   if (!cd)
>> > +           return -ENOMEM;
>> > +   INIT_LIST_HEAD(&cd->cancel_cmd_list);
>> > +
>> > +   cd->slot_id = slot_id;
>> > +   cd->cmd_type = cmd_type;
>> > +   cd->ep_index = ep_index;
>> > +   list_add_tail(&cd->cancel_cmd_list, &xhci->cancel_cmd_list);
>> > +
>> > +   return 0;
>> > +}
>> > +
>> > +int xhci_cancel_cmd(struct xhci_hcd *xhci, int slot_id,
>> > +           u32 cmd_type, int ep_index)
>> > +{
>> > +   int retval;
>> > +   unsigned long flags;
>> > +
>> > +   spin_lock_irqsave(&xhci->lock, flags);
>>
>> It looks like xhci_abort_cmd_ring() can race with xhci_ring_cmd_db(),
>> since there doesn't seem to be a requirement for the latter to be called
>> under &xhci->lock. I found one case of this in xhci_run_finished() for
>> controllers with NEC quirk enabled.
>>
>> > +
>> > +   if (xhci->xhc_state & XHCI_STATE_DYING) {
>> > +           xhci_warn(xhci, "Abort the command ring,"
>> > +                           " but the xHCI is dead.\n");
>> > +           spin_unlock_irqrestore(&xhci->lock, flags);
>> > +           return -ESHUTDOWN;
>> > +   }
>> > +
>> > +   /* queue the cmd desriptor to cancelled_cd_list */
>> > +   retval = queue_cd(xhci, slot_id, cmd_type, ep_index);
>> > +   if (retval) {
>> > +           spin_unlock_irqrestore(&xhci->lock, flags);
>> > +           return retval;
>> > +   }
>> > +
>> > +   /* abort command ring */
>> > +   retval = xhci_abort_cmd_ring(xhci);
>> > +   if (retval) {
>> > +           xhci_err(xhci, "Abort command ring failed\n");
>> > +           spin_unlock_irqrestore(&xhci->lock, flags);
>> > +           if (retval == -ESHUTDOWN) {
>> > +                   usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
>> > +                   xhci_dbg(xhci, "xHCI host controller is dead.\n");
>> > +           }
>> > +           return retval;
>> > +   }
>> > +
>> > +   spin_unlock_irqrestore(&xhci->lock, flags);
>> > +   return 0;
>> > +}
>> > +
>> >  void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
>> >             unsigned int slot_id,
>> >             unsigned int ep_index,
>> > @@ -1050,6 +1152,127 @@ static int handle_cmd_in_cmd_wait_list(struct xhci_hcd *xhci,
>> >     return 1;
>> >  }
>> >
>> > +/*
>> > + * Find the command trb need to be cancelled and modify it to NO OP
>> > + * command.
>> > + *
>> > + * If we can't find the command trb, we think it had already been
>> > + * executed.
>> > + */
>> > +static void cd_to_noop(struct xhci_hcd *xhci, struct xhci_cd *cur_cd)
>> > +{
>> > +   struct xhci_segment *cur_seg;
>> > +   union xhci_trb *cmd_trb;
>> > +   int slot_id;
>> > +   u32 cmd_type;
>> > +   u32 cycle_state;
>> > +   int ep_index;
>> > +
>> > +   if (xhci->cmd_ring->dequeue == xhci->cmd_ring->enqueue)
>> > +           return;
>> > +
>> > +   /* find the current segmene of command ring */
>> > +   cur_seg = find_trb_seg(xhci->cmd_ring->first_seg,
>> > +                   xhci->cmd_ring->dequeue, &cycle_state);
>> > +
>> > +   /* find the command trb matched by CD from command ring */
>> > +   for (cmd_trb = xhci->cmd_ring->dequeue;
>> > +                   cmd_trb != xhci->cmd_ring->enqueue;
>> > +                   next_trb(xhci, xhci->cmd_ring, &cur_seg, &cmd_trb)) {
>> > +           if (TRB_TYPE_LINK_LE32(cmd_trb->generic.field[3]))
>> > +                   continue;
>> > +
>> > +           slot_id = TRB_TO_SLOT_ID(
>> > +                           le32_to_cpu(cmd_trb->generic.field[3]));
>> > +           cmd_type = TRB_FIELD_TO_TYPE(
>> > +                           le32_to_cpu(cmd_trb->generic.field[3]));
>> > +           if ((slot_id == cur_cd->slot_id) &&
>> > +                           (cmd_type == cur_cd->cmd_type)) {
>> > +
>> > +                   /* The ep_index in the command desriptors of Stop
>> > +                    * Endpoint command, Set TR Dequeue Pointer command
>> > +                    * and Reset Endpoint command is vaild. so it should
>> > +                    * be match the ep_index of cd.
>> > +                    */
>> > +                   if ((cmd_type == TRB_STOP_RING) ||
>> > +                                   (cmd_type == TRB_SET_DEQ) ||
>> > +                                   (cmd_type == TRB_RESET_EP)) {
>> > +                           ep_index = TRB_TO_EP_INDEX(le32_to_cpu(
>> > +                                           cmd_trb->generic.field[3]));
>> > +                           if (ep_index != cur_cd->ep_index)
>> > +                                   continue;
>> > +                   }
>> > +
>> > +                   /* get cycle state from the origin command trb */
>> > +                   cycle_state = le32_to_cpu(cmd_trb->generic.field[3])
>> > +                           & TRB_CYCLE;
>> > +
>> > +                   /* modify the command trb to NO OP command */
>> > +                   cmd_trb->generic.field[0] = 0;
>> > +                   cmd_trb->generic.field[1] = 0;
>> > +                   cmd_trb->generic.field[2] = 0;
>> > +                   cmd_trb->generic.field[3] = cpu_to_le32(
>> > +                                   TRB_TYPE(TRB_CMD_NOOP) | cycle_state);
>> > +                   break;
>> > +           }
>> > +   }
>
> This function is just disturbing.  What if there are multiple commands
> of the same type for the device?  You may end up setting the one you
> don't want to a no-op.
>
> Why not change all the callers of xhci_cancel_cmd() to pass in the
> pointer to the xhci_trb for that command?  They can save the xhci
> command ring enqueue pointer after they lock xhci->lock, and before they
> call into any command queueing function.  You need to set it to the
> first TRB in the next segment if the command ring dequeue pointer is
> pointing to a link TRB.
>
> Then you *know* which trb you're canceling, and you don't need to store
> the TRB type.  That will simplify this function quite a bit, because you
> won't need to walk the entire command ring.
>
>> > +}
>> > +
>> > +static void cancel_cd(struct xhci_hcd *xhci)
>
> Please rename this to xhci_cancel_cd.  We should have all functions
> (even if they're static) start with xhci.  I know I'm not completely
> consistent with this, but that is my preference.
>
>> > +{
>> > +   struct xhci_cd *cur_cd, *next_cd;
>> > +
>> > +   if (list_empty(&xhci->cancel_cmd_list))
>> > +           return;
>> > +
>> > +   list_for_each_entry_safe(cur_cd, next_cd,
>> > +                   &xhci->cancel_cmd_list, cancel_cmd_list) {
>> > +           xhci_dbg(xhci, "Cancel command, slot %d, cmd type %d, "
>> > +                           "ep_index = %d\n", cur_cd->slot_id,
>> > +                           cur_cd->cmd_type, cur_cd->ep_index);
>> > +           cd_to_noop(xhci, cur_cd);
>> > +
>> > +           /* Whatever we find the matched command trb, we need to
>> > +            * release the command descriptor.
>> > +            */
>> > +           list_del(&cur_cd->cancel_cmd_list);
>> > +           kfree(cur_cd);
>> > +   }
>> > +}
>
> What happens if there are multiple commands that need to be aborted, and
> commands other than the first one is in the waitlist?  You need to
> handle the fact that multiple aborted commands can be in the waitlist.
>
> Take this scenario, for instance:
>
> 1. khubd could be waiting on a Set Address command that has been queued
> to the command ring.  That command is NOT added to a command wait list.
>
> 2. As soon as the Set Address command is queued and the xhci->lock is
> dropped, the driver for a different device asks for an alternate
> interface setting change.  This causes a Configure Endpoint command to
> be queued to the command ring, and the command is added to the
> xhci_virt_device command wait list.
>
> 3. The Set Address times out, and we start the abort process.  The Set
> Address command is queued to the canceled commands list, and we write to
> the stop command register.
>
> 4. However, the host takes a long time to stop the command ring.  This
> allows the Configure Endpoint command to timeout, and be aborted.  The
> command ring is not stopped, so we queue the Configure Endpoint command
> to the canceled command list, write to the stop command register, and
> wait on the status register to report the command ring is stopped.
>
> 5. The command ring finally stops, and an event is generated with the
> Command Aborted status.  That brings us to handle_stopped_cmd_ring().
>
> 6. We handle the aborted Set Address command.  That command isn't
> associated with a command wait list, as I've stated.  *If* the virtual
> device command wait list is empty, handle_cmd_in_cmd_wait_list() will
> return safely without causing an NULL pointer dereferences when it tries
> to remove the command ring.  We'll move the dequeue pointer past the
> command, and move on to process the other commands in the list.
>
> 7. cancel_cd processes the aborted Configure Endpoint command.  It
> doesn't call handle_cmd_in_waitlist(), so the command will remain on the
> head of the xhci_virt_device wait list, and the command will never be
> freed.
>
> Once the abort finishes, many of the callers will poison the waitlist
> list_head and free the command memory.  So you should audit those and
> make sure they don't free the command before it is finished being
> aborted.
>
> Also, it's assumed that when a command is completed, the function that's
> waiting on that completion will free the xhci_command.  However, if that
> function has already returned, because the command ring has been
> stopped, it won't be able to free the command.  You'll need a new
> function that unconditionally frees the aborted commands' xhci_command
> structures.
>
>
> Ok, here's what I suggest you do.  First, modify xhci_cancel_cmd() to
> take an xhci_command argument.  Commands like Set Address that don't
> allocate an xhci_command will just pass a NULL pointer.  Store that
> pointer in the xhci_cd.  Add a new function, similar to
> handle_cmd_in_waitlist.  Perhaps call it handle_aborted_cmd_in_waitlist.
> Make it take a pointer to an xhci_cd.  That function should:
>
> 1. If the xhci_command pointer in xhci_cd is not NULL, remove the
> command from the wait list, and free that xhci_command.  The function
> that allocated the command has stopped waiting on the completion (that's
> why it cancelled the command in the first place), so there's no need to
> complete the competion in the xhci_command.
>
> 2. Unconditionally turn the command into a no-op TD.
>
> 3. Remove the command from the canceled command list, and free the
> list_head.
>
>
> On an unrelated note, do you have any code that removes the xhci_cd from
> the canceled command list if it completes before the command ring can be
> stopped?
>
> handle_cmd_completion() should probably look through the
> xhci->cancel_cmd_list for a matching command before the big switch
> statement.  If it finds a match, it should call
> handle_aborted_cmd_in_waitlist() and goto the inc_deq() call for the
> command ring, which will immediately return.  The code to look for a
> match is really simple if you store the TRB pointer in the xhci_cd like
> I suggested.
>
>> > +static void handle_stopped_cmd_ring(struct xhci_hcd *xhci,
>> > +           struct xhci_event_cmd *event)
>> > +{
>> > +   struct xhci_virt_device *virt_dev;
>> > +   int slot_id = TRB_TO_SLOT_ID(
>> > +                   le32_to_cpu(xhci->cmd_ring->dequeue->generic.field[3]));
>> > +   int cmd_trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>> > +
>> > +   virt_dev = xhci->devs[slot_id];
>> > +   if (virt_dev) {
>> > +           /* if the command is in cmd_list, handle it. */
>> > +           handle_cmd_in_cmd_wait_list(xhci, virt_dev, event);
>> > +   }
>> > +
>> > +   /* advance the dequeue pointer to next cmd trb */
>> > +   inc_deq(xhci, xhci->cmd_ring, false);
>
> Ok, I guess we might end up aborting a perfectly good command here.  I
> say that because there might be a command on the ring before the command
> that we want to abort.  We could somehow use the command stop bit
> instead of the command abort bit, but I don't think it's feasible.
>
> There's a race condition between when we decide whether to use the
> command stop bit or the command abort bit.  We could try to use the
> command stop bit if the command we need to abort is not the command that
> the command ring dequeue pointer is on.  But it's possible that the
> command before the one we want to abort has actually completed, but the
> event ring handler hasn't got to that event yet.
>
> Basically, we need to always use the command abort flag instead of the
> command stop flag.  That makes me sad, but such is life.
>
>> > +
>> > +   /* set the cmd ring state */
>> > +   if (cmd_trb_comp_code != COMP_CMD_STOP) {
>> > +           xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
>> > +   } else {
>> > +           cancel_cd(xhci);
>> > +           xhci->cmd_ring_state = CMD_RING_STATE_RUNNING;
>> > +
>> > +           /*
>> > +            * ring command ring doorbell again to handle the waiting
>> > +            * command trbs due to aborting command ring
>> > +            */
>> > +           if (xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue)
>> > +                   xhci_ring_cmd_db(xhci);
>> > +   }
>> > +}
>> > +
>
> [snip]
>
>> > @@ -1394,6 +1402,11 @@ struct xhci_hcd {
>> >     /* data structures */
>> >     struct xhci_device_context_array *dcbaa;
>> >     struct xhci_ring        *cmd_ring;
>> > +   unsigned int            cmd_ring_state;
>> > +#define CMD_RING_STATE_RUNNING             (1 << 0)
>> > +#define CMD_RING_STATE_ABORTED             (1 << 1)
>> > +#define CMD_RING_STATE_STOPPED             (1 << 2)
>>
>> I don't think you want a bit mask here, since you can't reside in more
>> than one of these states at any given point in time. Can it be a enum
>> instead?
>
> An enum would be fine, but we might need to change it in the future if
> it turns out we need concurrent states.
>
> Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux