RE: [RFC 2/3] xhci: Fix failed enqueue in the middle of isoch TD.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx]
> Sent: Friday, August 05, 2011 2:45 AM
> To: Xu, Andiry
> Cc: linux-usb@xxxxxxxxxxxxxxx
> Subject: [RFC 2/3] xhci: Fix failed enqueue in the middle of isoch TD.
> 
> When an isochronous transfer is enqueued, xhci_queue_isoc_tx_prepare()
> will ensure that there is enough room on the transfer rings for all of
> the
> isochronous TDs for that URB.  However, when xhci_queue_isoc_tx() is
> enqueueing individual isoc TDs, the prepare_transfer() function can
> fail
> if the endpoint state has changed to disabled, error, or some other
> unknown state.
> 
> With the current code, if Nth TD (not the first TD) fails, the ring is
> left in a sorry state.  The partially enqueued TDs are left on the
ring,
> and the first TRB of the TD is not given back to the hardware.  The
> enqueue pointer is left on the TRB after the last successfully
enqueued
> TD.  This means the ring is basically useless.  Any new transfers will
> be
> enqueued after the failed TDs, which the hardware will never read
> because
> the cycle bit indicates it does not own them.  The ring will fill up
> with
> untransferred TDs, and the endpoint will be basically unusable.
> 
> The untransferred TDs will also remain on the TD list.  Since the
> td_list
> is a FIFO, this basically means the ring handler will be waiting on
TDs
> that will never be completed (or worse, dereference memory that
doesn't
> exist any more).
> 
> Change the code to clean up the isochronous ring after a failed
> transfer.
> If the first TD failed, simply return and allow the xhci_urb_enqueue
> function to free the urb_priv.  If the Nth TD failed, first remove the
> TDs
> from the td_list.  Then convert the TRBs that were enqueued into No-op
> TRBs.  Make sure to flip the cycle bit on all enqueued TRBs (including
> any
> link TRBs in the middle or between TDs), but leave the cycle bit of
the
> first TRB (which will show software-owned) intact.  Then move the ring
> enqueue pointer back to the first TRB and make sure to change the
> xhci_ring's cycle state to what is appropriate for that ring segment.
> 
> This ensures that the No-op TRBs will be overwritten by subsequent
TDs,
> and the hardware will not start executing random TRBs because the
cycle
> bit was left as hardware-owned.
> 
> This bug is unlikely to be hit, but it was something I noticed while
> tracking down the watchdog timer issue.  I verified that the fix works
> by
> injecting some errors on the 250th isochronous URB queued, although I
> could not verify that the ring is in the correct state because
uvcvideo
> refused to talk to the device after the first usb_submit_urb() failed.
> Ring debugging shows that the ring looks correct, however.
> 
> This patch should be backported to kernels as old as 2.6.36.
> 
> Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
> Cc: Andiry Xu <andiry.xu@xxxxxxx>
> ---
>  drivers/usb/host/xhci-ring.c |   49
> ++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-
> ring.c
> index 9d3f9dd..5b55ba3 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -514,8 +514,11 @@ void xhci_find_new_dequeue_state(struct xhci_hcd
> *xhci,
>  			(unsigned long long) addr);
>  }
> 
> +/* flip_cycle means flip the cycle bit of all but the first and last
> TRB.
> + * This is used to remove partially enqueued isoc TDs from a ring.
> + */
>  static void td_to_noop(struct xhci_hcd *xhci, struct xhci_ring
> *ep_ring,
> -		struct xhci_td *cur_td)
> +		struct xhci_td *cur_td, bool flip_cycle)
>  {
>  	struct xhci_segment *cur_seg;
>  	union xhci_trb *cur_trb;
> @@ -528,6 +531,12 @@ static void td_to_noop(struct xhci_hcd *xhci,
> struct xhci_ring *ep_ring,
>  			 * leave the pointers intact.
>  			 */
>  			cur_trb->generic.field[3] &=
cpu_to_le32(~TRB_CHAIN);
> +			/* Flip the cycle bit (link TRBs can't be the
first
> +			 * or last TRB).
> +			 */
> +			if (flip_cycle)
> +				cur_trb->generic.field[3] ^=
> +					cpu_to_le32(TRB_CYCLE);
>  			xhci_dbg(xhci, "Cancel (unchain) link TRB\n");
>  			xhci_dbg(xhci, "Address = %p (0x%llx dma); "
>  					"in seg %p (0x%llx dma)\n",
> @@ -541,6 +550,11 @@ static void td_to_noop(struct xhci_hcd *xhci,
> struct xhci_ring *ep_ring,
>  			cur_trb->generic.field[2] = 0;
>  			/* Preserve only the cycle bit of this TRB */
>  			cur_trb->generic.field[3] &=
cpu_to_le32(TRB_CYCLE);
> +			/* Flip the cycle bit except on the first or
last TRB
> */
> +			if (flip_cycle && cur_trb != cur_td->first_trb
&&
> +					cur_trb != cur_td->last_trb)
> +				cur_trb->generic.field[3] ^=
> +					cpu_to_le32(TRB_CYCLE);

In the patch description you mention "but leave the cycle bit of the
first TRB (which will show software-owned) intact", but here it leaves
both the first_trb and last_trb intact. I think that is because here the
cur_td->last_trb actually point to the trb pass the real last trb of the
td, right? From driver view, that is correct I think, but maybe this
casuses some sort of confusion because here the cur_td->last_trb does
not really belong to the td. Please correct me if I'm wrong.

>  			cur_trb->generic.field[3] |= cpu_to_le32(
>  				TRB_TYPE(TRB_TR_NOOP));
>  			xhci_dbg(xhci, "Cancel TRB %p (0x%llx dma) "
> @@ -719,7 +733,7 @@ static void handle_stopped_endpoint(struct
xhci_hcd
> *xhci,
>  					cur_td->urb->stream_id,
>  					cur_td, &deq_state);
>  		else
> -			td_to_noop(xhci, ep_ring, cur_td);
> +			td_to_noop(xhci, ep_ring, cur_td, false);
>  remove_finished_td:
>  		/*
>  		 * The event handler won't see a completion for this TD
> anymore,
> @@ -3223,6 +3237,7 @@ static int xhci_queue_isoc_tx(struct xhci_hcd
> *xhci, gfp_t mem_flags,
>  	start_trb = &ep_ring->enqueue->generic;
>  	start_cycle = ep_ring->cycle_state;
> 
> +	urb_priv = urb->hcpriv;
>  	/* Queue the first TRB, even if it's zero-length */
>  	for (i = 0; i < num_tds; i++) {
>  		unsigned int total_packet_count;
> @@ -3246,12 +3261,13 @@ static int xhci_queue_isoc_tx(struct xhci_hcd
> *xhci, gfp_t mem_flags,
> 
>  		ret = prepare_transfer(xhci, xhci->devs[slot_id],
ep_index,
>  				urb->stream_id, trbs_per_td, urb, i,
mem_flags);
> -		if (ret < 0)
> -			return ret;
> +		if (ret < 0) {
> +			if (i == 0)
> +				return ret;
> +			goto cleanup;
> +		}
> 
> -		urb_priv = urb->hcpriv;
>  		td = urb_priv->td[i];
> -
>  		for (j = 0; j < trbs_per_td; j++) {
>  			u32 remainder = 0;
>  			field = TRB_TBC(burst_count) |
TRB_TLBPC(residue);
> @@ -3341,6 +3357,27 @@ static int xhci_queue_isoc_tx(struct xhci_hcd
> *xhci, gfp_t mem_flags,
>  	giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id,
>  			start_cycle, start_trb);
>  	return 0;
> +cleanup:
> +	/* Clean up a partially enqueued isoc transfer. */
> +
> +	for (i--; i >= 0; i--)
> +		list_del(&urb_priv->td[i]->td_list);
> +
> +	/* Use the first TD as a temporary variable to turn the TDs
we've
> queued
> +	 * into No-ops with a software-owned cycle bit. That way the
> hardware
> +	 * won't accidentally start executing bogus TDs when we
partially
> +	 * overwrite them.  td->first_trb and td->start_seg are already
> set.
> +	 */
> +	urb_priv->td[0]->last_trb = ep_ring->enqueue;
> +	/* Every TRB except the first & last will have its cycle bit
> flipped. */
> +	td_to_noop(xhci, ep_ring, urb_priv->td[0], true);
> +
> +	/* Reset the ring enqueue back to the first TRB and its cycle
bit.
> */
> +	ep_ring->enqueue = urb_priv->td[0]->first_trb;
> +	ep_ring->enq_seg = urb_priv->td[0]->start_seg;
> +	ep_ring->cycle_state = start_cycle;
> +	usb_hcd_unlink_urb_from_ep(bus_to_hcd(urb->dev->bus), urb);
> +	return ret;
>  }
> 
>  /*


Thanks,
Andiry

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux