On Wed, 2011-05-11 at 01:11 +0800, Sarah Sharp wrote: > On Tue, May 10, 2011 at 11:36:48AM +0800, He, Alex wrote: > > Hi Sarah, > > > > Thanks for your correction and suggestion. > > > > I am sorry for that I don't give a very clear description for FSE > patch. > > Let me to explain it again with a special scenario. > > 1. There are two TDs in td_list that will be transferred. > > > 2. The first TD was transferred successfully reported by an > > transfer event with COMP_SUCCESS. The xHCI driver called > > finish_td() to remove the first TD from the td_list and > increase > > the ep_ring->dequeue and ep_ring->deq_seg to the next TD. But > > the HW internal ep ring dequeue maintained by xHC may stay at > > the last TRB of the first TD before start the next transfer > for > > the second TD. > > Ok, at first I wondered how this could ever happen, since it kind of > sounds like a hardware bug, but it's documented in Table 3 of the xHCI > spec, so I guess it's legitimate behavior. I am also confused about that since the spec doesn't define the FSE clearly. But we need to process it for some hardware implemention. > > > 3. The stop endpoint command run just after the first TD being > > transferred successfully. It means that an > > FSE(CC=COMP_STOP_INVAL) queued after the transfer > > event(CC=COMP_SUCCESS) for the first TD. The FSE uses the HW > > internal ep ring dequeue which point to the last TRB of the > > first TD. And the last TRB of the first TD is the event_trb. > > > > 4. Then driver called handle_tx_event() again to process the > > FSE. It will try to find out the event_seg which include the > > event_trb of the FSE by the line: event_seg = > > trb_in_td(ep_ring->deq_seg, ep_ring->dequeue, td->last_trb, > > event_dma) It will fail with NULL return value because that > the > > ep_ring->deq_seg and ep_ring->dequeue move to the second TD. > > i.e. you can't find out the last TRB of the first TD in the > > second TD. And this make the driver go to an wrong way with > > "return -ESHUTDWON". > > > > Another complicate case will occur when there an Link TD between the > two > > TDs. > > Let's take a step back. For the cancellation code to work properly, > we > need to know where the hardware dequeue pointer is. From the xHCI > driver's perspective, the previous TD is already done, and the xHCI > driver's ring dequeue pointer has been moved past that TD (and any > link > TRBs that follow). > > All we really care about is whether the HW's dequeue pointer is > somewhere in the middle of the next TD, so we can decide whether to > turn > the TD into no-ops or use a Set TR dequeue pointer. If the HW is > actually stopped before the TD, it would be fine to turn the TD into > no-ops, right? Yes. > > What happens if we just ignore the FSE event for the last TRB of the > first TD? ep->stopped_td should be NULL after the last Stop Endpoint > command was handled. When we're deciding whether to turn the canceled > TD into no-ops or issue a Set TR dequeue pointer, we'll compare that > TD > to ep->stopped_td, which won't match, and just turn that TD into > no-ops. > That behavior should be fine, since the xHC is supposed to clear any > cached TRBs when a Stop Endpoint command is issued, and the xHC will > just go on to process the no-op TRBs. > > I think it might be cleaner to just ignore the FSE, by checking for > !event_seg && comp_code == COMP_STOP_INVAL, rather than keeping track > of > the previous TD. This also avoids a bad pointer dereference with this > code if the previous TD has already been freed: > > + if (!event_seg) { > + event_seg = trb_in_td(prev_td->start_seg, > + prev_td->first_trb, > + td->first_trb, > + event_dma); > + if (event_seg && (trb_comp_code == > COMP_STOP_INVAL)) > + if (is_force_stopeed_event(event_seg, > + > prev_td->last_trb, > + event_dma)) { > + ret = 0; > + goto cleanup; > + } > > You dereference prev_td in the call to is_force_stopeed_event(). > That's > bad because it might have been freed when the successful event TD for > the first TD was processed. I don't think you should try to prevent > the > xhci_td from being freed when an URB is given back, so I think the > simpler method is to just ignore the FSE event. > I'm also not certain there is a bug, but the next bit of code also > checks for a null event_seg, and I wonder what the interaction between > the two will be: > > if (!event_seg) { > if (!ep->skip || > ! > usb_endpoint_xfer_isoc(&td->urb->ep->desc)) { > /* HC is busted, give up! */ > xhci_err(xhci, > "ERROR Transfer event TRB DMA > ptr not " > "part of current TD\n"); > return -ESHUTDOWN; > } > > ret = skip_isoc_td(xhci, td, event, ep, > &status); > goto cleanup; > } > > It's probably better if you check for both conditions (!event_seg && > comp_code == COMP_STOP_INVAL) before this block. > I'd thought of this conditions too but not sure if it is enough for FSE. That is one reason for me to chose the prev_td do that which introduce the crisis as you said. By comparison, the method of your is the better one. > > So I think we need an patch for these cases like that: > > 1. Record the TD as prev_td in ep_ring before delete it from > the > > td_list. > > I don't think this is the right approach, because the xhci_td may get > freed. You could prevent it from being freed until the next TD gets > completed, but then you'll have to add extra checking to the bandwidth > code to make sure and delete the last recorded TD. That way seems > more > complicated and prone to introducing bugs. > > > 2. Find the FSE event_trb in the prev_td or between the > prev_td > > and cur_td > > for it maybe a link TD. > > 3. Just jump over the FSE in the event ring because the prev > > transfer event > > have been processed. > > > > That's all my think about the FSE. It's too hard to figure out > them. :( > > Thank you for the explanation, that's much clearer than the patch > description. I would suggest you put something like it in the next > patch description. Don't be afraid to send really long descriptions > if > it's a really complicated bug, even if the patch description turns out > longer than the fix. :) One more question. Will the second TD be added to the cancelled_td_list for the FSE? If it is not true, when the driver call handle_stopped_endpoint(), the condition (cur_td == ep->stopped_td) is false. The xhci_find_new_dequeue_state() won't be called and the deq_state.new_deq_ptr and deq_state_new_deq_seg are still zero. Then the ring_doorbell_for_active_rings() be called and the ep_ring run again. In other words the stop endpoint command can't stop the ep for the FSE case. Thanks, Alex He -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html