Hi, Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx> writes: >> >>>> Furthermore, we have found that USB_GADGET_DELAYED_STATUS is racey, >> >>>> which has already been observed in the UVC gadget driver previously >> >>>> [0]. The raceiness stems from the fact that things can happen in >> >>>> between returning USB_GADGET_DELAYED_STATUS and the composite layer >> >>>> reacting to it - especially if usb_composite_setup_continue is called >> >>>> within that window it causes a WARN. In any case, the fact that the >> >>>> mechanism itself is racey suggests that it needs improvement, and using >> >>>> it wouldn't be a good solution in this case. >> >> I don't understand this at all. The composite layer reacts to >> USB_GADGET_DELAYED_STATUS as soon as it receives the return value. Can >> Paul or Laurent give a more explicit example of this race? > > The composite layer only handles USB_GADGET_DELAYED_STATUS for > USB_REQ_SET_CONFIGURATION (in set_config()) and for USB_REQ_SET_INTERFACE (in > composite_setup()). It increments cdev->delayed_status immediately. Then, in > usb_composite_setup_continue(), if cdev->delayed_status is not zero, it queues > a ZLP, and warns otherwise. > > This mechanism delays the data stage, not the status stage (or, to be precise, > it delays the status stage insofar as the status stage comes after the data > stage), and only supports control OUT requests with 0 bytes of data (which is > the case of both USB_REQ_SET_INTERFACE and USB_REQ_SET_CONFIGURATION). For all > other requests, the composite layer passes USB_GADGET_DELAYED_STATUS to the > UDC. DATA stage always depends on a usb_ep_queue() from gadget driver. So it's always "delayed" in that sense. > The three UDCs that implement USB_GADGET_DELAYED_STATUS support set a > delayed_status flag in an internal structure. I haven't inspected in details > what they do next as I'm not familiar with all of them, but the dwc3 driver > just skips the handling of the status phase in dwc3_ep0_xfernotready() and > delays it to __dwc3_gadget_ep0_queue(). This only works for 0-length requests, > with no data phase. data stage always depends on usb_ep_queue(). There was never any need for UDC to handle Data stage internally. Also, status stage is always a ZLP. > Even when limited to 0-length control OUT requests, this mechanism is racy. > The setup handler, when it wants to delay the status phase, will queue > asynchronous work that will, when it completes, call > usb_composite_setup_continue() to proceed with the status phase. Queuing the > work has to be done before the setup handler returns, and the cdev- >>delayed_status is only incremented after the setup handler returns, in > composite_setup(). There is thus a time window during which the asynchronous > work can call usb_composite_setup_continue() before cdev->delayed_status has > been incremented. We have managed to hit this in practice, with a surprisingly > high rate seeing how small the window is. that's only the case because we have two different "modes" for this. One where UDC handles it internally and another where gadget driver has to queue a request. I'm vouching for making status stage always explicit, i.e. we should always expect a usb_ep_queue(). > Now that I've written all this, I realize that cdev->delayed_status is guarded > by cdev->lock. I thus wonder whether our analysis was correct, or if we were > hitting a different bug :-S Paul, could you test this again ? Please note, > however, that the race described here is not related to this patch series, > except in how it influences the API design to avoid race conditions. > >> Assuming you are correct, wouldn't it make sense to fix or eliminate >> the race by changing composite.c? > > I was about to write that we would need to lock access to cdev- >>delayed_status, and found out that we already use cdev->lock to do so. More > investigations are needed. > > Please note, however, that USB_GADGET_DELAYED_STATUS is limited to 0-length > control OUT requests, so the problem that led to this patch series still > exists, even if the race condition I thought was there doesn't exist. And that problem is...? DATA stage always depends on a usb_ep_queue() from gadget driver. >> >> If this works with dwc3 + uvc, then we have a good recipe on how to >> >> implement for the other drivers. >> > >> > Given that we need to delay the status stage and not the data stage, we >> > can't explicitly request the status stage through a usb request queue. >> >> Why not? The status stage for a control-OUT transfer is simply a >> zero-length IN transaction. It's easy to queue a request for such a >> transaction. Is the issue that there's no way to specify the direction >> of the request (hence no direct way to tell whether a zero-length >> request is for the data stage or the status stage)? > > OK, I suppose we could queue a request for this, in which case we would have > to queue two requests for control OUT transfers (one for the data stage and > one for the status stage). I'm however not convinced that would be the best that's correct. This is what "make status stage always explicit" mean :) > API to handle the status stage, as the function driver would need to queue a it avoids all the special cases. UDC drivers can implement a single handling for struct usb_request. We could do away with special return values and so on... > request and the UDC would then need to check whether that request corresponds > to a status stage and process it accordingly. A new operation specific to this no, it wouldn't. UDC would have to check the size of request, that's all: if (r->length == 0) special_zlp_handling(); else regular_non_zlp_handling(); But we don't need to care about special return values and the like. We don't even need to care (from UDC perspective) if we're dealing with 2-stage or 3-stage control transfers (well, dwc3 needs to care because of different TRB types that needs to be used, but that's another story) > would be easier for both the function driver and the UDC in my opinion. it wouldn't. We would just be moving the special case to another function, rather than eliminating it. > There's also the fact that requests can specify a completion handler, but only > the data stage request would see its completion handler called (unless we > require UDCs to call completion requests at the completion of the status > stage, but I'm not sure that all UDCs can report the event to the driver, and > that would likely be useless as nobody needs that feature). you still wanna know if the host actually processed your status stage. udc-core can (and should) provide a generic status stage completion function which, at a minimum, aids with some tracepoints. One way to satisfy what you want, with what I want is to have UDC core implement something like below: int usb_ep_start_status_stage(struct usb_gadget *g) { return usb_ep_queue(g->ep0, &g->ep0_status_request); } special function for you, usb_ep_queue() for me :-p >> Admittedly, it might be nice to provide a library routine in the UDC >> core to queue such requests, since it involves a bunch of uninteresting >> boilerplate operations. >> >> > Would a new explicit function call work for you for that purpose ? >> >> It would be okay, but I question whether one is really needed. > > I think the API would be cleaner, but it might just be a matter of taste. From a UDC perspective, I'm more inclined to removing special cases, rather than making them more apparent. Having a single method for handling all three stages of a control transfer, IMO, is far more beneficial as it removes magic return values and several branches on UDC driver. >> >>> - The mechanism is poorly documented. As Paul mentioned, comments in >> >>> the code state that USB_GADGET_DELAYED_STATUS delay the "data/status >> >>> stages". This is very unclear, and the only three UDCs that implement >> >>> the mechanism seem to do so in different ways: >> >> We can fix comments and documentation pretty easily. :-) > > It's harder to fix them if different implementations interpret them in > different ways :-) That might not be the case though, as mentioned above I that's where a proper audit of the code comes into play :) > haven't studied the three UDCs that implement this in details, I only had a > look at the dwc3. that's perfectly fine. We can Cc other folks involved with the other UDCs and have them chip in. >> This requires the UDC to specifically keep track of the direction of >> the current transfer and whether or not a data-stage transfer has >> already been queued. That shouldn't be hard. > > It's "just" a state machine so it wouldn't be too hard. What we need to agree > on is how the state machine operates, and then the API to control it. That's > what I tried to describe below in my previous e-mail. > >> (But it does involve a >> race in cases where the host gets tired of waiting and issues another >> SETUP packet before the processing of the first transfer is finished.) Host would stall first in that case. Driver is already required to handle stalls for several other conditions. If thehre are bugs in that area, I'd prefer catching them. >> > I wonder if there's really a use case for delaying the data stage of >> > control OUT requests, as it seems to me that we can perform the >> > asynchronous validation of the setup and data stages together, in which >> > case we would always proceed to the data stage, and only potentially >> > delay the status stage. However, if we switch to an explicit API where >> > the transition from the setup to the data stage is triggered by queueing >> > a request, and given that such a transition may need to be delayed for >> > the control IN case, delaying the data stage for control OUT would >> > essentially come for free. > > What do you think about this ? Should we allow function drivers to delay the > data stage of control OUT requests ? it's already delayed. UDC won't start data stage unless it has a buffer to put the data. Without a usb_ep_queue(), UDC doesn't have a buffer. >> > If we end up moving to explicit state handling, with the data stage being >> > entered by queueing a request, and the status stage being entered by >> > calling a new function, control OUT requests with 0 bytes of data that >> > can be handled synchronously in the setup handler would require function >> > drivers to both queue a zero-length request and call the status function. >> > This would make the function code more complex, and I wonder whether a >> > shortcut would be a good idea, perhaps in the form of a flag in the >> > request that tells the UDC to automatically proceed to the status stage >> > immediately after the data stage. Or we could make that behaviour the >> > default when the request doesn't have a completion handler (as moving >> > explicitly to the status stage should be done at the earliest from the >> > data stage completion handler). > > From an API point of view, towards function drivers, I really want an explicit > function to proceed with the status stage. That could internally queue a ZLP > request or call another API, but in any case I don't want the status stage ZLP > request to be visible to the function drivers. Do you agree with this ? why can't it be visible? I don't mind having a function wrapping usb_ep_queue(), but why is it bad to have functions call usb_ep_queue() directly? > To simplify function drivers, do you think the above proposal of adding a flag > to the (data stage) request to request an automatic transition to the status > stage is a good idea ? We could even possibly invert the logic and transition no, I don't think so. Making the status phase always explicit is far better. UDCs won't have to check flags, or act on magic return values. It just won't do anything until a request is queued. -- balbi
Attachment:
signature.asc
Description: PGP signature