Hi Nicolas,

On Tuesday 03 June 2014 13:39:54 Nicolas Dufresne wrote:
> On Tuesday 03 June 2014 at 18:11 +0200, Laurent Pinchart wrote:
> > On Tuesday 03 June 2014 10:37:50 Nicolas Dufresne wrote:
> >> On Tuesday 03 June 2014 at 08:38 +0200, Hans Verkuil wrote:
> >>> On 06/02/2014 09:47 PM, Nicolas Dufresne wrote:
> >>>> Hi everyone,
> >>>> 
> >>>> Recently in GStreamer we noticed that we were not handling the
> >>>> POLLERR flag at all. We also found that what the code does and what
> >>>> the documentation says are slightly at odds:
> >>>> 
> >>>> 	"When the application did not call VIDIOC_QBUF or
> >>>> 	VIDIOC_STREAMON yet the poll() function succeeds, but sets
> >>>> 	the POLLERR flag in the revents field."
> >>>> 
> >>>> In our case, we first saw this error with a capture device. The way
> >>>> things worked is that we first enqueue all allocated buffers. Our
> >>>> interpretation was that this would avoid not calling "VIDIOC_QBUF
> >>>> [...] yet", and only then would we call VIDIOC_STREAMON. This way,
> >>>> in our interpretation, we would never get that error.
> >>>> 
> >>>> However, this is not what the code does. Looking at videobuf2, it
> >>>> simply returns this error when the queue is empty:
> >>>> 
> >>>> 	/*
> >>>> 	 * There is nothing to wait for if no buffers have already been
> >>>> 	 * queued.
> >>>> 	 */
> >>>> 	if (list_empty(&q->queued_list))
> >>>> 		return res | POLLERR;
> >>>> 
> >>>> So basically we end up in a situation where, as soon as all
> >>>> existing buffers have been dequeued, we can't rely on the driver to
> >>>> wait for a buffer to be queued and then filled again. This forces
> >>>> us into adding a new user-space mechanism to wait for buffers to
> >>>> come back. We are wondering whether this is a bug. If not, maybe it
> >>>> would be nice to improve the documentation.
> >>> 
> >>> Just for my understanding: I assume that gstreamer polls in one
> >>> process or thread and does the queuing/dequeuing in a different
> >>> process/thread, is that correct?
> >> 
> >> Yes, in this particular case (polling with queues/threads downstream)
> >> the streaming thread does the polling and then pushes the buffers. A
> >> buffer reaches a queue element, which queues it and returns. Polling
> >> restarts at this point. The queue will later pass the buffer
> >> downstream in the next streaming thread, and eventually the buffer
> >> will be released. For a capture device, QBUF will be called upon
> >> release.
> >> 
> >> It is assumed that this call to QBUF should take a finite amount of
> >> time. However, libv4l2 breaks this assumption by inter-locking DQBUF
> >> and QBUF; clearly a bug, but not strictly related to this thread.
> >> Also, as we try to avoid blocking in the DQBUF ioctl, it should not
> >> be a problem for us.
> >> 
> >>> If it was all in one process, then it would make no sense to poll
> >>> for a buffer to become available if you never queued one.
> >> 
> >> Not exactly true: the driver may take some time before the buffer we
> >> have queued back is filled and available again. The poll/select FD
> >> set also has a control pipe, so we can stop the process at any
> >> moment. Not polling would mean blocking on an ioctl() which cannot
> >> be cancelled.
> >> 
> >> But without downstream queues (threads), the size of the queue will
> >> be negotiated so that the buffer is released before we go back to
> >> polling. The queue will never be empty in this case.
> >> 
> >>> That is probably the reasoning behind what poll does today. That
> >>> said, I've always thought it was wrong and that it should be
> >>> replaced by something like:
> >>> 
> >>> 	if (!vb2_is_streaming(q))
> >>> 		return res | POLLERR;
> >>> 
> >>> I.e.: only return an error if we're not streaming.
> >> 
> >> I think it would be easier for user space and closer to what the doc
> >> says.
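For the record, here is how I picture the end result in the vb2 poll
handler (just a sketch against my reading of videobuf2-core.c, not a
tested patch):

	/*
	 * Fail poll() only when the queue isn't streaming. A streaming
	 * queue whose buffers are all owned by userspace simply has
	 * nothing to deliver yet, so wait for the next buffer to
	 * complete instead of returning POLLERR.
	 */
	if (!vb2_is_streaming(q))
		return res | POLLERR;

	if (list_empty(&q->done_list))
		poll_wait(file, &q->done_wq, wait);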
> > I tend to agree, and I'd like to raise a different but related issue.
> > 
> > I've recently run into a problem with a V4L2 device (the OMAP4 ISS if
> > you want details) that sometimes crashes during video capture. When
> > this occurs the device is rendered completely unusable, and userspace
> > needs to stop the video stream and close the video device node in
> > order to reset the device. That's not ideal, but until I pinpoint the
> > root cause that's what we have to live with.
> > 
> > When the OMAP4 ISS driver detects the error it immediately completes
> > all queued buffers with the V4L2_BUF_FLAG_ERROR flag set, and returns
> > -EIO from all subsequent VIDIOC_QBUF calls. The next few VIDIOC_DQBUF
> > calls will return buffers with the V4L2_BUF_FLAG_ERROR flag set,
> > after which the next VIDIOC_DQBUF call will block in
> > __vb2_wait_for_done_vb() on
> > 
> > 	ret = wait_event_interruptible(q->done_wq,
> > 			!list_empty(&q->done_list) || !q->streaming);
> > 
> > as the queue is still streaming and the done list stays empty.
> > 
> > (Disclaimer: I'm using gstreamer 0.10 for this project due to TI
> > shipping the OMAP4 H.264 codec for this version only.)
> 
> Nod, nothing I can help with. This is a problem very similar to that of
> out-of-tree kernel drivers. We need to teach vendors to upstream in
> gst-plugins-bad, otherwise the code becomes unmaintainable.

In this specific case the code depends on the unmaintained TI OMAP4 BSP
kernel, so it wouldn't have helped much :-/

> > As gstreamer doesn't handle POLLERR in v4l2src, the gst_poll_wait()
> > call in gst_v4l2src_grab_frame() will return success, and the
> > function then proceeds to call gst_v4l2_buffer_pool_dqbuf(), which
> > will block. Trying to stop the pipeline at that point just hangs
> > forever on the VIDIOC_DQBUF call.
> 
> This is what I'm working on right now. Don't expect a fix for 0.10, it
> has been unmaintained for 2 years now.

I know I'm on my own. Or mostly: there are still very helpful gstreamer
developers whom I want to thank for helping me (they will know who they
are :-)).

> For reference:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=731015

Thank you.

> > This kind of fatal error condition should be detected and reported to
> > the application.
> > 
> > If we modified the poll() behaviour to return POLLERR on
> > !vb2_is_streaming() instead of list_empty(&q->queued_list), the poll
> > call would block and stopping the pipeline would be possible.
> > 
> > We would however still miss a mechanism to detect the fatal error and
> > report it to the application. As I'm not too familiar with gstreamer
> > I'd appreciate any help I could get to implement this.
> 
> It might not be the appropriate list, but oh well...
> 
> GStreamer abstracts polling through GstPoll (reason: special features
> and multi-platform support). To detect POLLERR, simply keep the
> GstPollFD structure around in the object and call
> gst_poll_fd_has_error(poll, pollfd); you can read errno as usual. If
> you change the kernel as we discussed, this code should never be
> reached, hence it shall be a fatal error (return GST_FLOW_ERROR and use
> GST_ELEMENT_ERROR so the application is notified and can handle it).
> 
> It would indeed be a good mechanism to trigger a fatal run-time error
> in my opinion. We would need to document the errno values that make
> sense, I suppose. The ERROR flag is clearly documented as a mechanism
> for recoverable errors.

Thanks a lot for the information. I'll give this a try and will post RFC
patches to the linux-media list, CC'ing you.
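If I understand the GstPoll API correctly, v4l2src would end up with
something along these lines (a rough sketch only; the function name and
the src, poll and pollfd variables are mine, and I'm assuming the
GstPollFD has been registered at open time with gst_poll_add_fd() and
gst_poll_fd_ctl_read()):

	static GstFlowReturn
	gst_v4l2src_poll_device (GstElement * src, GstPoll * poll,
	    GstPollFD * pollfd)
	{
	  /* Block until the device fd (or the control pipe) becomes
	   * active. */
	  if (gst_poll_wait (poll, GST_CLOCK_TIME_NONE) < 0)
	    return GST_FLOW_ERROR;

	  /* With the kernel change discussed above, POLLERR can only
	   * signal a fatal device error, so notify the application and
	   * bail out. */
	  if (gst_poll_fd_has_error (poll, pollfd)) {
	    GST_ELEMENT_ERROR (src, RESOURCE, FAILED,
	        ("Polling the V4L2 device failed"),
	        ("poll error %d (%s)", errno, g_strerror (errno)));
	    return GST_FLOW_ERROR;
	  }

	  return GST_FLOW_OK;
	}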
> >> Though, it's not just about changing that check, there is some more
> >> work involved from what I've seen.
> > 
> > What have you seen? :-)
> 
> My bad, I misread the next statement:
> 
> 	if (list_empty(&q->done_list))
> 		poll_wait(file, &q->done_wq, wait);
> 
> Nothing complex to do indeed.

-- 
Regards,

Laurent Pinchart