Hi Alan and Sarah,
Sorry for my late response but i had some personal issues that held me
back for a while.
My comments follow below.
On 10/17/2013 06:11 PM, Alan Stern wrote:
On Wed, 16 Oct 2013, Sarah Sharp wrote:
I think there's some nasty race conditions here. There's several
different structures this code and other functions are manipulating:
- the endpoint ring contents
- the endpoint ring dequeue pointer
- the endpoint's cur_td and cancelled_td lists
The other functions you need to look at are xhci_urb_enqueue,
xhci_urb_dequeue, and xhci_handle_cmd_stop_ep. Arguably, we should be
doing something about drivers attempting to change alternate interface
settings at the same time they're resetting endpoints, but I think the
solution should be that drivers just shouldn't do that. However, we do
need to handle the case where the reset endpoint races with URB
cancellation and enqueue.
You can simplify part of the problem by not allowing an endpoint to be
reset if it has any pending URBs. Just fail the reset in this case.
Yes, you are right since, from what i understand, it is the
responsibility of the device driver to unlink any pending URBs before
issuing a clear halt to the endpoint. So we expect the device driver to
call usb_unlink_urb() for each pending URB, which in turn calls
xhci_urb_dequeue(). The function xhci_urb_dequeue() will add the URB to
endpoint's cancelled_td_list and will issue a Stop Endpoint command. The
completion handler of the Stop Endpoint command,
xhci_handle_cmd_stop_ep(), will eventually remove the unlinked URBs from
endpoint's both cancelled_td_list and td_list and give them back to core.
However, since xhci_urb_dequeue() does not wait on the completion of the
Stop Endpoint command, it is still possible for the URB not to have been
removed yet from the td_list when the xhci_reset_endpoint() is called,
right?
In that case, is it better to check if td_list is not empty and fail the
reset after xhci_stop_endpoint_for_config() rather than before?
Also, while you're going through the whole remove-and-add procedure for
endpoints that aren't halted, do you want to hold the bandwidth mutex?
If the procedure isn't atomic, there's a possibility that some other
device could change configs in the middle. This would also prevent
concurrent altsetting changes.
Yes, i need to lock the bandwidth_mutex. I will fix this.
Alan Stern
Also, regarding the case that there is an attempt to enqueue a URB while
the endpoint is being reset, is it ok to fail the enqueue by adding a
check in prepare_ring() to fail when the endpoint's state is
EP_CONFIG_PENDING ?
regards,
ksenia
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html