Hi Laurent,

Cc'ing Michael for comments about returning v4l2 bufs too early.

On Mon, Apr 25, 2022 at 01:58:52AM +0300, Laurent Pinchart wrote:
> Hi Dan,
> 
> On Wed, Apr 20, 2022 at 04:27:27PM -0500, Dan Vacura wrote:
> > On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
> > > 
> > > This indeed fixes an issue, so I think we can merge the patch, but I
> > > also believe we need further improvements on top (of course if you would
> > > like to improve the implementation in a v4, I won't complain :-))
> > 
> > It looks like Greg has already accepted the change and it's in
> > linux-next. We can discuss here how to better handle these -EXDEV errors
> > for future improvements, as it seems like it's been an issue in the past
> > as well:
> > https://www.mail-archive.com/linux-usb@xxxxxxxxxxxxxxx/msg105615.html
> > 
> > > As replied in v2 (sorry for the late reply), it seems that this error
> > > can occur under normal conditions. This means we shouldn't cancel the
> > > queue, at least when the error is intermittent (if all URBs fail that's
> > > another story).
> > 
> > My impression was that canceling the queue was still necessary as we may
> > be in progress for the current frame. Perhaps we don't need to flush all
> > the frames from the queue, but at a minimum we need to reset the
> > buf_used value.
> 
> I think we have three classes of errors:
> 
> - "Packet-level" errors, resulting in either data loss or erroneous data
>   being transferred to the host for one (or more) packets in a frame.
>   When such errors occur, we should probably notify the application (on
>   the gadget side), but we can continue sending the rest of the frame.
> 
> - "Frame-level" errors, resulting in errors in the rest of the frame.
>   When such an error occurs, we should notify the application, and stop
>   sending data for the current frame, moving to the next frame.
> 
> - "Stream-level" errors, resulting in errors in all subsequent frames.
>   When such an error occurs, we should notify the application and stop
>   sending data until the application takes corrective measures.
> 
> I'm not sure if packet-level errors make sense; if data is lost, maybe
> we would be better off just cancelling the current frame and moving to
> the next one.
> 
> For both packet-level errors and frame-level errors, the buffer should
> be marked as erroneous to notify the application, but there should be no
> need to cancel the queue and drop all queued buffers. We can just move
> to the next buffer.
> 
> For stream-level errors, I would cancel the queue, and additionally
> prevent new buffers from being queued until the application stops and
> restarts the stream.
> 
> Finally, which class an error belongs to may not be an intrinsic
> property of the error itself; packet-level or frame-level errors that
> occur too often may be worth cancelling the queue (I'm not sure how to
> quantify "too often" though).
> 
> Does this make sense ?

Yes, this makes sense, but I'm not sure if the USB controllers send back
that info and/or if it's all standardized. For example, in the dwc3
controller I see the following status values returned: -EPIPE,
-ECONNRESET, -ESHUTDOWN, and -EXDEV; whereas the musb controller doesn't
appear to return -EXDEV.
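To illustrate how your classification could map onto those status codes,
the complete handler might end up with something roughly like the sketch
below. Just a sketch: uvc_handle_frame_error() and
uvc_handle_stream_error() are placeholder names, not existing functions,
and the mapping itself is a guess based on what I see dwc3 report rather
than anything the UDC API guarantees:

	/* Sketch only: placeholder helpers, guessed classification. */
	switch (req->status) {
	case 0:
		break;
	case -EXDEV:
		/*
		 * Missed isoc transfer: treat as frame-level, mark the
		 * current buffer as erroneous and move on to the next frame.
		 */
		uvc_handle_frame_error(video);
		break;
	case -ECONNRESET:	/* request dequeued */
	case -ESHUTDOWN:	/* disconnect or endpoint disabled */
	case -EPIPE:		/* endpoint stalled */
	default:
		/*
		 * Stream-level or unknown: be conservative, cancel the
		 * queue and wait for userspace to restart the stream.
		 */
		uvc_handle_stream_error(video);
		break;
	}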
> > > We likely need to differentiate between -EXDEV and other errors in
> > > uvc_video_complete(), as I'd like to be conservative and cancel the
> > > queue for unknown errors. We also need to improve the queue cancellation
> > > implementation so that userspace gets an error when queuing further
> > > buffers.
> > 
> > We already feed the error back to userspace, via the state passed to
> > vb2_buffer_done(). When userspace dequeues the buffer it can check if
> > v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR to see if things failed, then
> > decide what to do, like re-queue that frame. However, this appears to
> > not always occur, since I believe the pump thread is independent of the
> > uvc_video_complete() callback. As a result, the complete callback of the
> > failed URB may be associated with a buffer that was already released
> > back to the userspace client.
> 
> Good point. That would only be the case for errors in the last
> request(s) for a frame, right ?

From logging it seems it can be up to the last few requests that come
back, where the queue is already empty. I guess this is timing dependent
on the hw, the system scheduling, and how deep the request queue is.

> > In this case, I don't know if there's
> > anything to be done, since a new buffer and subsequent URBs might
> > already be queued up. You suggested an error on a subsequent buffer
> > queue, but I don't know how helpful that'd be at this point, perhaps in
> > the scenario that all URBs are failing?
> 
> Should we delay sending the buffer back to userspace until all the
> requests for the buffer have completed ?

Yes, I had that same thought later on. We'll either need a new
pending_complete_queue that the buffer moves to from the current queue,
or logic built around the existing uvc_buffer_state flags, similar to how
uvcg_queue_head() looks for the first buffer with UVC_BUF_STATE_QUEUED.
When the complete call comes in we'll have to identify whether the
request is the last completed one for that buffer. It looks like the
UVC_STREAM_EOF flag will be present; hopefully that's sufficient to rely
on. Also, since this is a FIFO queue, we can assume that the first buffer
with UVC_BUF_STATE_DONE can be vb2_buffer_done()'d. If we implement this
type of logic then we can probably remove this change:
https://lore.kernel.org/r/20220402232744.3622565-3-m.grzeschik@xxxxxxxxxxxxxx
since its purpose is similar.

How does that sound, and do you have an opinion on a new queue vs. logic
around the uvc_buffer_state flags?

> -- 
> Regards,
> 
> Laurent Pinchart
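Regarding the uvc_buffer_state option above, here is the rough shape I
have in mind. Untested sketch only, locking left to the caller; it
assumes we start remembering, per request, which buffer the request
finished (i.e. carried UVC_STREAM_EOF for), and uvcg_flush_done_buffers()
is a made-up name, not something that exists in uvc_queue.c today:

	/*
	 * Called from the request complete handler once we know this
	 * request carried UVC_STREAM_EOF for "buf".  Untested sketch,
	 * queue->irqlock must be held by the caller.
	 */
	static void uvcg_flush_done_buffers(struct uvc_video_queue *queue,
					    struct uvc_buffer *buf, int status)
	{
		struct uvc_buffer *head;

		/* Mark the buffer, but don't hand it back out of order. */
		buf->state = status ? UVC_BUF_STATE_ERROR : UVC_BUF_STATE_DONE;

		/*
		 * The queue is FIFO, so only complete buffers from the head,
		 * and only while they have actually finished.  A buffer that
		 * still has requests in flight stays put and keeps the ones
		 * behind it queued as well.
		 */
		while ((head = uvcg_queue_head(queue)) &&
		       (head->state == UVC_BUF_STATE_DONE ||
			head->state == UVC_BUF_STATE_ERROR)) {
			list_del(&head->queue);
			vb2_buffer_done(&head->buf.vb2_buf,
					head->state == UVC_BUF_STATE_ERROR ?
					VB2_BUF_STATE_ERROR :
					VB2_BUF_STATE_DONE);
		}
	}

That would also give us a natural place to report the frame-level errors
discussed above through VB2_BUF_STATE_ERROR (and thus V4L2_BUF_FLAG_ERROR
on dequeue) instead of cancelling the whole queue.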