On Tuesday 03 June 2014 at 18:11 +0200, Laurent Pinchart wrote:
> Hi Nicolas,
> 
> On Tuesday 03 June 2014 10:37:50 Nicolas Dufresne wrote:
> > On Tuesday 03 June 2014 at 08:38 +0200, Hans Verkuil wrote:
> > > On 06/02/2014 09:47 PM, Nicolas Dufresne wrote:
> > > > Hi everyone,
> > > > 
> > > > Recently in GStreamer we noticed that we were not handling the
> > > > POLLERR flag at all. We also found that what the code does and
> > > > what the documentation says are slightly ambiguous:
> > > > 
> > > >     "When the application did not call VIDIOC_QBUF or
> > > >     VIDIOC_STREAMON yet the poll() function succeeds, but sets
> > > >     the POLLERR flag in the revents field."
> > > > 
> > > > In our case, we first saw this error with a capture device. The
> > > > way things worked is that we first queued all allocated buffers.
> > > > Our interpretation was that this would avoid the "did not call
> > > > VIDIOC_QBUF [...] yet" case, and only then would we call
> > > > VIDIOC_STREAMON. Under this interpretation we would never get
> > > > that error.
> > > > 
> > > > However, this is not what the code does. Looking at videobuf2,
> > > > it simply returns this error whenever the queue is empty:
> > > > 
> > > >     /*
> > > >      * There is nothing to wait for if no buffers have already
> > > >      * been queued.
> > > >      */
> > > >     if (list_empty(&q->queued_list))
> > > >         return res | POLLERR;
> > > > 
> > > > So we end up in a situation where, as soon as all existing
> > > > buffers have been dequeued, we cannot rely on the driver to wait
> > > > for a buffer to be queued and then filled again. This basically
> > > > forces us to add a new user-space mechanism to wait for a buffer
> > > > to come back. We are wondering whether this is a bug. If not,
> > > > maybe it would be nice to improve the documentation.
> > > 
> > > Just for my understanding: I assume that gstreamer polls in one
> > > process or thread and does the queuing/dequeuing in a different
> > > process/thread, is that correct?
> > 
> > Yes, in this particular case (polling with queues/threads
> > downstream), the streaming thread does the polling and then pushes
> > the buffers. The buffer reaches a queue element, which queues it and
> > returns. Polling restarts at this point. The queue will later pass
> > the buffer downstream from the next streaming thread, and eventually
> > the buffer will be released. For a capture device, QBUF will be
> > called upon release.
> > 
> > It is assumed that this call to QBUF takes a finite amount of time.
> > libv4l2 makes this assumption wrong by inter-locking DQBUF and QBUF,
> > which is clearly a bug, but not strictly related to this thread.
> > Also, as we try to avoid blocking in the DQBUF ioctl, it should not
> > be a problem for us.
> > 
> > > If it was all in one process, then it would make no sense polling
> > > for a buffer to become available if you never queued one.
> > 
> > Not exactly true: the driver may take some time before the buffer we
> > have queued back is filled and available again. The poll/select FD
> > set also contains a control pipe, so we can stop the process at any
> > moment. Not polling would mean blocking in an ioctl() which cannot
> > be canceled.
> > 
> > But without downstream queues (threads), the size of the queue will
> > be negotiated so that the buffer is released before we go back to
> > polling. The queue will never be empty in this case.
> > 
> > > That is probably the reasoning behind what poll does today. That
> > > said, I've always thought it was wrong and that it should be
> > > replaced by something like:
> > > 
> > >     if (!vb2_is_streaming(q))
> > >         return res | POLLERR;
> > > 
> > > I.e.: only return an error if we're not streaming.
> > 
> > I think it would be easier for user space and closer to what the doc
> > says.
> 
> I tend to agree, and I'd like to raise a different but related issue.
> 
> I've recently run into a problem with a V4L2 device (OMAP4 ISS if you
> want details) that sometimes crashes during video capture.
> When this occurs the device is rendered completely unusable, and
> userspace needs to stop the video stream and close the video device
> node in order to reset the device. That's not ideal, but until I
> pinpoint the root cause that's what we have to live with.
> 
> When the OMAP4 ISS driver detects the error it immediately completes
> all queued buffers with the V4L2_BUF_FLAG_ERROR flag set, and returns
> -EIO from all subsequent VIDIOC_QBUF calls. The next few VIDIOC_DQBUF
> calls will return buffers with the V4L2_BUF_FLAG_ERROR flag set,
> after which the next VIDIOC_DQBUF call will block in
> __vb2_wait_for_done_vb() on
> 
>     ret = wait_event_interruptible(q->done_wq,
>             !list_empty(&q->done_list) || !q->streaming);
> 
> as the queue is still streaming and the done list stays empty.
> 
> (Disclaimer: I'm using gstreamer 0.10 for this project due to TI
> shipping the OMAP4 H.264 codec for this version only)

Nod, nothing I can help with. This is a very similar problem to
out-of-tree kernel drivers. We need to teach vendors to upstream in
gst-plugins-bad, otherwise it becomes unmaintainable.

> As gstreamer doesn't handle POLLERR in v4l2src, the gst_poll_wait()
> call in gst_v4l2src_grab_frame() will return success, and the
> function then proceeds to call gst_v4l2_buffer_pool_dqbuf(), which
> will block. Trying to stop the pipeline at that point just hangs
> forever on the VIDIOC_DQBUF call.

This is what I'm working on right now; don't expect a fix for 0.10, it
has been unmaintained for 2 years now. For reference:

https://bugzilla.gnome.org/show_bug.cgi?id=731015

> This kind of fatal error condition should be detected and reported to
> the application.
> 
> If we modified the poll() behaviour to return POLLERR on
> !vb2_is_streaming() instead of list_empty(&q->queued_list), the poll
> call would block and stopping the pipeline would be possible.
> 
> We would however still miss a mechanism to detect the fatal error and
> report it to the application.
> As I'm not too familiar with gstreamer I'd appreciate any help I
> could get to implement this.

It might not be the appropriate list, but oh well... GStreamer
abstracts polling through GstPoll (reason: special features and
multi-platform support). To detect the POLLERR, simply keep the
GstPollFD structure around in the object and call
gst_poll_fd_has_error(poll, pollfd); you can read errno as usual. If
you change the kernel as we said, this code should never be reached,
hence it shall be a fatal error (return GST_FLOW_ERROR and emit
GST_ELEMENT_ERROR so the application is notified and can handle it).

It would indeed be a good mechanism to trigger a fatal run-time error
in my opinion. We would need to document the values of errno that make
sense, I suppose. The ERROR flag is clearly documented as a mechanism
for recoverable errors.

> > Though, it's not just about changing that check, there is some more
> > work involved from what I've seen.
> 
> What have you seen? :-)

My bad, I misread the next statement:

    if (list_empty(&q->done_list))
        poll_wait(file, &q->done_wq, wait);

Nothing complex to do indeed.