On Tue, May 10, 2011 at 11:36:48AM +0800, He, Alex wrote: > Hi Sarah, > > Thanks for your correction and suggestion. > > I am sorry for that I don't give a very clear description for FSE patch. > Let me to explain it again with a special scenario. > 1. There are two TDs in td_list that will be transferred. > 2. The first TD was transferred successfully reported by an > transfer event with COMP_SUCCESS. The xHCI driver called > finish_td() to remove the first TD from the td_list and increase > the ep_ring->dequeue and ep_ring->deq_seg to the next TD. But > the HW internal ep ring dequeue maintained by xHC may stay at > the last TRB of the first TD before start the next transfer for > the second TD. Ok, at first I wondered how this could ever happen, since it kind of sounds like a hardware bug, but it's documented in Table 3 of the xHCI spec, so I guess it's legitimate behavior. > 3. The stop endpoint command run just after the first TD being > transferred successfully. It means that an > FSE(CC=COMP_STOP_INVAL) queued after the transfer > event(CC=COMP_SUCCESS) for the first TD. The FSE uses the HW > internal ep ring dequeue which point to the last TRB of the > first TD. And the last TRB of the first TD is the event_trb. > > 4. Then driver called handle_tx_event() again to process the > FSE. It will try to find out the event_seg which include the > event_trb of the FSE by the line: event_seg = > trb_in_td(ep_ring->deq_seg, ep_ring->dequeue, td->last_trb, > event_dma) It will fail with NULL return value because that the > ep_ring->deq_seg and ep_ring->dequeue move to the second TD. > i.e. you can't find out the last TRB of the first TD in the > second TD. And this make the driver go to an wrong way with > "return -ESHUTDWON". > > Another complicate case will occur when there an Link TD between the two > TDs. Let's take a step back. For the cancellation code to work properly, we need to know where the hardware dequeue pointer is. From the xHCI driver's perspective, the previous TD is already done, and the xHCI driver's ring dequeue pointer has been moved past that TD (and any link TRBs that follow). All we really care about is whether the HW's dequeue pointer is somewhere in the middle of the next TD, so we can decide whether to turn the TD into no-ops or use a Set TR dequeue pointer. If the HW is actually stopped before the TD, it would be fine to turn the TD into no-ops, right? What happens if we just ignore the FSE event for the last TRB of the first TD? ep->stopped_td should be NULL after the last Stop Endpoint command was handled. When we're deciding whether to turn the canceled TD into no-ops or issue a Set TR dequeue pointer, we'll compare that TD to ep->stopped_td, which won't match, and just turn that TD into no-ops. That behavior should be fine, since the xHC is supposed to clear any cached TRBs when a Stop Endpoint command is issued, and the xHC will just go on to process the no-op TRBs. I think it might be cleaner to just ignore the FSE, by checking for !event_seg && comp_code == COMP_STOP_INVAL, rather than keeping track of the previous TD. This also avoids a bad pointer dereference with this code if the previous TD has already been freed: + if (!event_seg) { + event_seg = trb_in_td(prev_td->start_seg, + prev_td->first_trb, + td->first_trb, + event_dma); + if (event_seg && (trb_comp_code == COMP_STOP_INVAL)) + if (is_force_stopeed_event(event_seg, + prev_td->last_trb, + event_dma)) { + ret = 0; + goto cleanup; + } You dereference prev_td in the call to is_force_stopeed_event(). That's bad because it might have been freed when the successful event TD for the first TD was processed. I don't think you should try to prevent the xhci_td from being freed when an URB is given back, so I think the simpler method is to just ignore the FSE event. I'm also not certain there is a bug, but the next bit of code also checks for a null event_seg, and I wonder what the interaction between the two will be: if (!event_seg) { if (!ep->skip || !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) { /* HC is busted, give up! */ xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not " "part of current TD\n"); return -ESHUTDOWN; } ret = skip_isoc_td(xhci, td, event, ep, &status); goto cleanup; } It's probably better if you check for both conditions (!event_seg && comp_code == COMP_STOP_INVAL) before this block. > So I think we need an patch for these cases like that: > 1. Record the TD as prev_td in ep_ring before delete it from the > td_list. I don't think this is the right approach, because the xhci_td may get freed. You could prevent it from being freed until the next TD gets completed, but then you'll have to add extra checking to the bandwidth code to make sure and delete the last recorded TD. That way seems more complicated and prone to introducing bugs. > 2. Find the FSE event_trb in the prev_td or between the prev_td > and cur_td > for it maybe a link TD. > 3. Just jump over the FSE in the event ring because the prev > transfer event > have been processed. > > That's all my think about the FSE. It's too hard to figure out them. :( Thank you for the explanation, that's much clearer than the patch description. I would suggest you put something like it in the next patch description. Don't be afraid to send really long descriptions if it's a really complicated bug, even if the patch description turns out longer than the fix. :) > BTW: I will refine the other patches and commit them later. Sure, thank you. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html