Hi Sarah, When a missed service interval event is encountered, we just set the ep->skip flag, increase the event ring dequeue pointer, and then return. We will enter the do-while loop and process missed tds the next time we encounter a transfer event from this endpoint. If the next event after a missed service interval isn't for that endpoint, I think is OK. This endpoint does not have its skip flag set, and the do-while loop will run only once and then returns. If this endpoint's skip flag is also set, then we'll process its missed tds, since it must have a missed service interval event some time ago. If we get two missed service intervals for the same endpoint in a row, it will be two cases: the two missed service interval events are consecutive, or not. If they are consecutive, we just return and process the missed tds next time when it's not a missed service interval event; if they are not consecutive and there's a transfer event between them, we should process the missed tds for twice. If a missed service interval is followed by a Ring Underrun/overrun event for that isoc ring, there may be problems. In this case the handle_tx_event() should return. I think ep->skip flag should get cleared when encounter a Ring Underrun/overrun event. Maybe this causes Paul's issue? If Paul can print his event ring when the infinite loop case happens, that may help to find the actual case. Thanks & Best regards, Andiry > -----Original Message----- > From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx] > Sent: Wednesday, August 18, 2010 8:07 AM > To: Paul Zimmerman > Cc: linux-usb@xxxxxxxxxxxxxxx; Xu, Andiry > Subject: Re: [PATCH][RFC] xhci fixes > > Hi Andiry, > > I have a couple questions about your isoc patches. I think the do-while > loop in handle_tx_event() when a missed service interval is encountered > is trying to walk the event ring to find the next isoc TD that was > actually completed. It doesn't touch the event ring dequeue pointer > (except to move it past the missed service interval event). > > Now, what if the next event after a missed service interval isn't for > that endpoint? What if you get two missed service intervals for the > same endpoint in a row? Or a missed service interval followed by a Ring > Underrun/overrun event for that isoc ring? I think the code may have > bugs in those cases, although I'm not sure if those are the exact bugs > that Paul is running into. (Paul, I'll send you a debugging patch > shortly.) > > I think the while condition at the end of the do-while loop is > incorrect: > } while (ep->skip && trb_comp_code != COMP_MISSED_INT); > > trb_comp_code is only assigned once before the do-while loop, so it will > never change to break out of the loop once we hit a missed service > interval event. Only the clearing of ep->skip can make the loop stop. > But if the next event after the missed service interval isn't a transfer > event for that endpoint ring, then the loop will run indefinitely. > > Now, Paul's patch attempted to break out of the loop and return when > handle_tx_event() found a null event_seg when ep->skip was set for an > isochronous endpoint. It's possible the event ring and the endpoint > ring eventually synched up, and the isoc TDs that were skipped when the > handle_tx_event() function originally returned were never given back to > the driver. That might explain the null dequeue segment pointer after > the driver attempted to cancel the isochronous URBs. I'm still looking > at the code to figure out if it's possible though. > > Sarah Sharp > > On Mon, Aug 16, 2010 at 06:59:49PM -0700, Paul Zimmerman wrote: > > > From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx] > > > > > First, Andiry already sent a patch to update the dequeue pointer in > > > process_isoc_td(). That just got added to the master branch a couple > > > days ago. So you should take that bit of code out of this patch. > > > > Will do. > > > > > Those BUG() calls were there for a reason. Does the system work > without > > > them? > > > > Yes, it works without them, at least with the Synopsys controller. I > also tested > > a bit with the NEC controller and didn't see any problems. > > > > According to Linus, drivers should do WARN_ON() instead of BUG() in > interrupt > > context, unless it's absolutely impossible to continue. Sorry, I don't > have a > > reference to that email handy. > > > > > Can you break out the fixes for uninitialized variables into a > separate > > > patch? > > > > Will do. > > > > > Please break this into a separate patch. I see that bits 10:0 are > > > supposed to be the maximum packet size, which should make a bitmask of > > > 0x7ff, not 0x3ff. Thanks for catching that. > > > > Will do. > > > > > > @@ -131,7 +131,7 @@ static void next_trb(struct xhci_hcd *xhci, > > > > *seg = (*seg)->next; > > > > *trb = ((*seg)->trbs); > > > > } else { > > > > - *trb = (*trb)++; > > > > + (*trb)++; > > > > > > This part of the patch is from John's compiler fix patch, correct? > It's > > > already been sent to Greg, but hasn't made it into the master branch > yet > > > because Greg hasn't queued it. > > > > Yeah, I'll drop that too. By the way, that patch should probably go to > > -stable too, since it's undefined behavior in C, if I'm not mistaken. > > > > > > @@ -474,8 +474,11 @@ void xhci_find_new_dequeue_state(struct > xhci_hcd *xhci, > > > > state->new_deq_seg = find_trb_seg(cur_td->start_seg, > > > > dev->eps[ep_index].stopped_trb, > > > > &state->new_cycle_state); > > > > - if (!state->new_deq_seg) > > > > - BUG(); > > > > + if (!state->new_deq_seg) { > > > > + WARN_ON(1); > > > > + return; > > > > + } > > > > + > > > > > > This is not a very good solution. This function is called when the > > > driver wants to cancel a transfer, and the host is in the middle of > the > > > transfer. If you just return there, then the transfer will never get > > > taken off the hardware queue, and the hardware could potentially > access > > > freed (or reused) memory. Or the transfer will never get cancelled. > > > Really bad, either way. > > > > > > I'd like to figure out why exactly find_trb_seg() is returning NULL. > Do > > > you have any ideas? If not, I'll send you a debugging patch to get > more > > > info. > > > > Sorry, I don't have any ideas. That part of the xhci driver is black > magic > > to me. A debugging patch would be a good idea, I think. > > > > > > @@ -1551,6 +1556,10 @@ static int process_isoc_td(struct xhci_hcd > *xhci, struct xhci_td *td, > > > > /* calc actual length */ > > > > if (ep->skip) { > > > > td->urb->iso_frame_desc[idx].actual_length = 0; > > > > + /* Update ring dequeue pointer */ > > > > + while (ep_ring->dequeue != td->last_trb) > > > > + inc_deq(xhci, ep_ring, false); > > > > + inc_deq(xhci, ep_ring, false); > > > > return finish_td(xhci, td, event_trb, event, ep, status, > true); > > > > } > > > > > > As I said, this bit has been sent off to Greg. > > > > Will drop it. > > > > > > @@ -1839,7 +1850,6 @@ static int handle_tx_event(struct xhci_hcd > *xhci, > > > > xhci_dbg(xhci, "td_list is empty while skip > " > > > > "flag set. Clear skip flag.\n"); > > > > } > > > > - ret = 0; > > > > goto cleanup; > > > > } > > > > > > This bit should be in a separate patch. > > > > Will do. > > > > > > > > > @@ -1874,20 +1884,23 @@ static int handle_tx_event(struct xhci_hcd > *xhci, > > > > "Skip it\n"); > > > > goto cleanup; > > > > } > > > > - } > > > > > > > > - /* Now update the urb's actual_length and give back to > > > > - * the core > > > > - */ > > > > - if (usb_endpoint_xfer_control(&td->urb->ep->desc)) > > > > - ret = process_ctrl_td(xhci, td, event_trb, event, > ep, > > > > - &status); > > > > - else if (usb_endpoint_xfer_isoc(&td->urb->ep->desc)) > > > > - ret = process_isoc_td(xhci, td, event_trb, event, > ep, > > > > - &status); > > > > - else > > > > - ret = process_bulk_intr_td(xhci, td, event_trb, > event, > > > > - ep, &status); > > > > + /* Now update the urb's actual_length and give > back to > > > > + * the core > > > > + */ > > > > + if (usb_endpoint_xfer_control(&td->urb->ep->desc)) > > > > + ret = process_ctrl_td(xhci, td, event_trb, > event, ep, > > > > + &status); > > > > + else if (usb_endpoint_xfer_isoc(&td->urb->ep- > >desc)) > > > > + ret = process_isoc_td(xhci, td, event_trb, > event, ep, > > > > + &status); > > > > + else > > > > + ret = process_bulk_intr_td(xhci, td, > event_trb, event, > > > > + ep, &status); > > > > + } else { > > > > + xhci_dbg(xhci, "event_seg is NULL\n"); > > > > + return -ENODEV; > > > > + } > > > > > > I assume you did this part of the patch because handle_tx_event() was > > > infinite looping. I don't think it's a very good fix, and I'd rather > > > know the underlying cause for this. > > > > > > The fix looks a bit...odd. The only way the else statement would run > is > > > if this statement is false: > > > if (!event_seg && > > > (!ep->skip || > > > !usb_endpoint_xfer_isoc(&td->urb->ep->desc))) { > > > > > > or > > > if (!(!event_seg && (!ep->skip || !usb_endpoint_xfer_isoc(&td->urb- > >ep->desc)))) > > > if ((event_seg || !(!ep->skip || !usb_endpoint_xfer_isoc(&td->urb- > >ep->desc)))) > > > if ((event_seg || (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep- > >desc)))) > > > > > > We know the event_seg is NULL, so the skip flag was set and this was > an > > > isochronous endpoint. I wonder if the event_seg was NULL because of > the > > > lack of setting the dequeue pointer while processing the skipped TDs. > > > > > > Does the infinite looping go away when you apply *just* the patch to > fix > > > updating the dequeue pointer? What about the issues with triggering > the > > > BUG() calls while setting the TR dequeue pointer? > > > > No, Andiry's fix for the dequeue pointer didn't fix either of those. > > > > -- > > Paul > > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html