> -----Original Message----- > From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx] > Sent: Friday, August 05, 2011 2:45 AM > To: Xu, Andiry > Cc: linux-usb@xxxxxxxxxxxxxxx > Subject: [RFC 2/3] xhci: Fix failed enqueue in the middle of isoch TD. > > When an isochronous transfer is enqueued, xhci_queue_isoc_tx_prepare() > will ensure that there is enough room on the transfer rings for all of > the > isochronous TDs for that URB. However, when xhci_queue_isoc_tx() is > enqueueing individual isoc TDs, the prepare_transfer() function can > fail > if the endpoint state has changed to disabled, error, or some other > unknown state. > > With the current code, if Nth TD (not the first TD) fails, the ring is > left in a sorry state. The partially enqueued TDs are left on the ring, > and the first TRB of the TD is not given back to the hardware. The > enqueue pointer is left on the TRB after the last successfully enqueued > TD. This means the ring is basically useless. Any new transfers will > be > enqueued after the failed TDs, which the hardware will never read > because > the cycle bit indicates it does not own them. The ring will fill up > with > untransferred TDs, and the endpoint will be basically unusable. > > The untransferred TDs will also remain on the TD list. Since the > td_list > is a FIFO, this basically means the ring handler will be waiting on TDs > that will never be completed (or worse, dereference memory that doesn't > exist any more). > > Change the code to clean up the isochronous ring after a failed > transfer. > If the first TD failed, simply return and allow the xhci_urb_enqueue > function to free the urb_priv. If the Nth TD failed, first remove the > TDs > from the td_list. Then convert the TRBs that were enqueued into No-op > TRBs. Make sure to flip the cycle bit on all enqueued TRBs (including > any > link TRBs in the middle or between TDs), but leave the cycle bit of the > first TRB (which will show software-owned) intact. Then move the ring > enqueue pointer back to the first TRB and make sure to change the > xhci_ring's cycle state to what is appropriate for that ring segment. > > This ensures that the No-op TRBs will be overwritten by subsequent TDs, > and the hardware will not start executing random TRBs because the cycle > bit was left as hardware-owned. > > This bug is unlikely to be hit, but it was something I noticed while > tracking down the watchdog timer issue. I verified that the fix works > by > injecting some errors on the 250th isochronous URB queued, although I > could not verify that the ring is in the correct state because uvcvideo > refused to talk to the device after the first usb_submit_urb() failed. > Ring debugging shows that the ring looks correct, however. > > This patch should be backported to kernels as old as 2.6.36. > > Signed-off-by: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> > Cc: Andiry Xu <andiry.xu@xxxxxxx> > --- > drivers/usb/host/xhci-ring.c | 49 > ++++++++++++++++++++++++++++++++++++----- > 1 files changed, 43 insertions(+), 6 deletions(-) > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci- > ring.c > index 9d3f9dd..5b55ba3 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -514,8 +514,11 @@ void xhci_find_new_dequeue_state(struct xhci_hcd > *xhci, > (unsigned long long) addr); > } > > +/* flip_cycle means flip the cycle bit of all but the first and last > TRB. > + * This is used to remove partially enqueued isoc TDs from a ring. > + */ > static void td_to_noop(struct xhci_hcd *xhci, struct xhci_ring > *ep_ring, > - struct xhci_td *cur_td) > + struct xhci_td *cur_td, bool flip_cycle) > { > struct xhci_segment *cur_seg; > union xhci_trb *cur_trb; > @@ -528,6 +531,12 @@ static void td_to_noop(struct xhci_hcd *xhci, > struct xhci_ring *ep_ring, > * leave the pointers intact. > */ > cur_trb->generic.field[3] &= cpu_to_le32(~TRB_CHAIN); > + /* Flip the cycle bit (link TRBs can't be the first > + * or last TRB). > + */ > + if (flip_cycle) > + cur_trb->generic.field[3] ^= > + cpu_to_le32(TRB_CYCLE); > xhci_dbg(xhci, "Cancel (unchain) link TRB\n"); > xhci_dbg(xhci, "Address = %p (0x%llx dma); " > "in seg %p (0x%llx dma)\n", > @@ -541,6 +550,11 @@ static void td_to_noop(struct xhci_hcd *xhci, > struct xhci_ring *ep_ring, > cur_trb->generic.field[2] = 0; > /* Preserve only the cycle bit of this TRB */ > cur_trb->generic.field[3] &= cpu_to_le32(TRB_CYCLE); > + /* Flip the cycle bit except on the first or last TRB > */ > + if (flip_cycle && cur_trb != cur_td->first_trb && > + cur_trb != cur_td->last_trb) > + cur_trb->generic.field[3] ^= > + cpu_to_le32(TRB_CYCLE); In the patch description you mention "but leave the cycle bit of the first TRB (which will show software-owned) intact", but here it leaves both the first_trb and last_trb intact. I think that is because here the cur_td->last_trb actually point to the trb pass the real last trb of the td, right? From driver view, that is correct I think, but maybe this casuses some sort of confusion because here the cur_td->last_trb does not really belong to the td. Please correct me if I'm wrong. > cur_trb->generic.field[3] |= cpu_to_le32( > TRB_TYPE(TRB_TR_NOOP)); > xhci_dbg(xhci, "Cancel TRB %p (0x%llx dma) " > @@ -719,7 +733,7 @@ static void handle_stopped_endpoint(struct xhci_hcd > *xhci, > cur_td->urb->stream_id, > cur_td, &deq_state); > else > - td_to_noop(xhci, ep_ring, cur_td); > + td_to_noop(xhci, ep_ring, cur_td, false); > remove_finished_td: > /* > * The event handler won't see a completion for this TD > anymore, > @@ -3223,6 +3237,7 @@ static int xhci_queue_isoc_tx(struct xhci_hcd > *xhci, gfp_t mem_flags, > start_trb = &ep_ring->enqueue->generic; > start_cycle = ep_ring->cycle_state; > > + urb_priv = urb->hcpriv; > /* Queue the first TRB, even if it's zero-length */ > for (i = 0; i < num_tds; i++) { > unsigned int total_packet_count; > @@ -3246,12 +3261,13 @@ static int xhci_queue_isoc_tx(struct xhci_hcd > *xhci, gfp_t mem_flags, > > ret = prepare_transfer(xhci, xhci->devs[slot_id], ep_index, > urb->stream_id, trbs_per_td, urb, i, mem_flags); > - if (ret < 0) > - return ret; > + if (ret < 0) { > + if (i == 0) > + return ret; > + goto cleanup; > + } > > - urb_priv = urb->hcpriv; > td = urb_priv->td[i]; > - > for (j = 0; j < trbs_per_td; j++) { > u32 remainder = 0; > field = TRB_TBC(burst_count) | TRB_TLBPC(residue); > @@ -3341,6 +3357,27 @@ static int xhci_queue_isoc_tx(struct xhci_hcd > *xhci, gfp_t mem_flags, > giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id, > start_cycle, start_trb); > return 0; > +cleanup: > + /* Clean up a partially enqueued isoc transfer. */ > + > + for (i--; i >= 0; i--) > + list_del(&urb_priv->td[i]->td_list); > + > + /* Use the first TD as a temporary variable to turn the TDs we've > queued > + * into No-ops with a software-owned cycle bit. That way the > hardware > + * won't accidentally start executing bogus TDs when we partially > + * overwrite them. td->first_trb and td->start_seg are already > set. > + */ > + urb_priv->td[0]->last_trb = ep_ring->enqueue; > + /* Every TRB except the first & last will have its cycle bit > flipped. */ > + td_to_noop(xhci, ep_ring, urb_priv->td[0], true); > + > + /* Reset the ring enqueue back to the first TRB and its cycle bit. > */ > + ep_ring->enqueue = urb_priv->td[0]->first_trb; > + ep_ring->enq_seg = urb_priv->td[0]->start_seg; > + ep_ring->cycle_state = start_cycle; > + usb_hcd_unlink_urb_from_ep(bus_to_hcd(urb->dev->bus), urb); > + return ret; > } > > /* Thanks, Andiry -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html