Alex, see my answers below. Alan, All of this is a lot of work for the xHCI driver, and it seems like other host drivers also could suffer from active URBs being outstanding at inopportune times due to buggy drivers. Would it be possible for the USB core to cancel any outstanding URBs before resetting a device, changing a configuration or alternate interface setting, or deallocating a device? It seems like that work really needs to be done in a central location. On Mon, Jun 20, 2011 at 06:33:22PM +0800, Alex He wrote: > Hi Sarah, > > I have some questions about it. You know I don't know xhci driver well. > Please correct me if I was wrong. > > > On Sat, 2011-06-18 at 01:12 +0800, Sarah Sharp wrote: > > Hi Alex, > > > > I thought at first that this patch was correct, but looking at it > > further, I think this patch will leak URBs. > > > > On Wed, Jun 08, 2011 at 06:25:50PM +0800, Alex He wrote: > > > /* Don't ring the doorbell for this endpoint if there are > > pending > > > * cancellations because we don't want to interrupt > > processing. > > > @@ -1995,6 +2005,10 @@ static int handle_tx_event(struct xhci_hcd > > *xhci, > > > case COMP_BUFF_OVER: > > > xhci_warn(xhci, "WARN: buffer overrun event on > > endpoint\n"); > > > break; > > > + case COMP_EBADEP: > > > + xhci_warn(xhci, "WARN: Doorbell rung for disabled slot > > " > > > + "or endpoint\n"); > > > + goto cleanup; > > > > What happens if we've placed a TD on a disabled endpoint ring and rung > > the doorbell? If you go directly down to cleanup, ret will still be > > zero, and the TD will remain on the TD list: > > I saw the xhci_urb_enqueue() (call prepare_ring()) has checked the > ep_state to prevent to add TD to ep_ring for DISABLED EP. Yes, but there's a race condition between the hardware and software state. The configure endpoint command can race with URB submission. Say a driver submits an URB for a bulk endpoint, and then immediately requests a device reset. The xHCI driver will see the endpoint state is enabled, ring the doorbell, and then go on to process the device reset request by issuing a Reset Device command. Since bulk endpoints are best effort, it's possible, although not very plausible that the host controller could process the command before servicing the bulk endpoint ring. In that case, the host finds the endpoint is disabled and gives us the Disabled Endpoint completion code. I would have to think about whether such a race condition could actually *happen* with the current code. We had a discussion a while back that drivers are not required to cancel URBs before requesting a device reset. Alan Stern amended the kernel documentation to say that they should, but it's possible some driver could trigger this race condition. > > The ring code will get stuck if you leave the TD on the td_list, and > > the > > driver will eventually ask you to cancel the URB, and then we get into > > further issues with the cancellation code and disabled endpoints. I > > think you need to allow the code to try to find the TD and its URB, so > > that it can be given back to the driver with an error return code. > > Possibly -EPROTO? We don't want to return -ENODEV, because the device > > might still be there, just not that particular endpoint. > > > > I think to fix this, you need to add handling for the error code to > > the > > various process_*_td functions that sets the status code (or the frame > > status code for isochronous endpoints) to -EPROTO. > > > The section 4.7 of the xHCI spec says that the Endpoint Not Enable Error > Event TRB with TRB Pointer, TRB Transfer Length, Event Data(ED) fields > set to '0'. It means that this Event is not interrelated with any TDs > directly. So we don't know the factual situation of the ep_ring, i.e. > which URB of the TD can be used to return -EPROTO? Can it be the first > TD in the ep_ring->td_list or all TDs in the ep_ring->td_list even they > hasn't be transfered? I would say if the endpoint is still present at all, we need to give back -ENOENT for all the TDs on the td_list. But after thinking about this further, I'm not sure it's appropriate to giveback those URBs in the ring handling code. I would have to think about it really hard to figure out if there are any race conditions there, since we couldn't actually submit a stop endpoint command to take the TDs off the ring. The only way an endpoint can transition to the disabled state is if the driver has submitted a Configure Endpoint, a Reset Device, or a Disable Slot command. I think any URBs left on the endpoint rings should be handled in the completions for those commands (and a big fat warning should be printed if URBs are active for those rings). When endpoints are dropped or changed by the Configure Endpoint command, the xHCI driver installs a new endpoint ring and calls xhci_free_or_cache_endpoint_ring(). You should can give back any URBs in the td_list with -ENOENT in there. You'll need to both giveback the URB, and ensure proper ref counting for any isochronous endpoints (decrementing bandwidth_isoc_reqs with xhci->lock held, since it isn't an atomic variable) like xhci_giveback_urb_in_irq() does. You can't actually use that function, since it assumes you're in an interrupt context and uses spin_lock() instead of spin_lock_irqsave(), and the bandwidth function runs in process context. But you could refactor the function a bit so that it calls a different function that has the option of using either interrupt or process type locking. That function would have the bulk of the code in xhci_giveback_urb_in_irq(), with an if statement when it comes to giveback the URB. Once a Reset Device command completes, the driver also calls xhci_free_or_cache_endpoint_ring(). We also handle the case where the endpoint had streams enabled, so xhci_free_stream_info() probably needs to be modified to giveback any URBs that are on each of the stream ring's td_lists. The Disable Slot command is a bit harder to handle, since it completes only in interrupt context, and calls xhci_free_virt_device(), which is called from both interrupt and process context. Probably you should giveback any URBs before submitting the command, since that command can't fail unless we have already disabled the slot. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html