Re: [PATCH v2 1/2] media: docs-rst: Document memory-to-memory video decoder interface

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Tue, 9 Apr 2019 18:47:30 +0900

On Wed, Feb 6, 2019 at 2:35 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>
> On Wed, Jan 30, 2019 at 1:02 PM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >
> > Le vendredi 25 janvier 2019 à 12:27 +0900, Tomasz Figa a écrit :
> > > On Fri, Jan 25, 2019 at 4:55 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> > > > Le jeudi 24 janvier 2019 à 18:06 +0900, Tomasz Figa a écrit :
> > > > > > Actually I just realized the last point might not even be achievable
> > > > > > for some of the decoders (s5p-mfc, mtk-vcodec), as they don't report
> > > > > > which frame originates from which bitstream buffer and the driver just
> > > > > > picks the most recently consumed OUTPUT buffer to copy the timestamp
> > > > > > from. (s5p-mfc actually "forgets" to set the timestamp in some cases
> > > > > > too...)
> > > > > >
> > > > > > I need to think a bit more about this.
> > > > >
> > > > > Actually I misread the code. Both s5p-mfc and mtk-vcodec seem to
> > > > > correctly match the buffers.
> > > >
> > > > Ok good, since otherwise it would have been a regression in MFC driver.
> > > > This timestamp passing thing could in theory be made optional though,
> > > > it lives under some COPY_TIMESTAMP kind of flag. What that means though
> > > > is that a driver without such a capability would need to signal dropped
> > > > frames using some other mean.
> > > >
> > > > In userspace, the main use is to match the produced frame against a
> > > > userspace specific list of frames. At least this seems to be the case
> > > > in Gst and Chromium, since the userspace list contains a superset of
> > > > the metadata found in the v4l2_buffer.
> > > >
> > > > Now, using the produced timestamp, userspace can deduce frame that the
> > > > driver should have produced but didn't (could be a deadline case codec,
> > > > or simply the frames where corrupted). It's quite normal for a codec to
> > > > just keep parsing until it finally find something it can decode.
> > > >
> > > > That's at least one way to do it, but there is other possible
> > > > mechanism. The sequence number could be used, or even producing buffers
> > > > with the ERROR flag set. What matters is just to give userspace a way
> > > > to clear these frames, which would simply grow userspace memory usage
> > > > over time.
> > >
> > > Is it just me or we were missing some consistent error handling then?
> > >
> > > I feel like the drivers should definitely return the bitstream buffers
> > > with the ERROR flag, if there is a decode failure of data in the
> > > buffer. Still, that could become more complicated if there is more
> > > than 1 frame in that piece of bitstream, but only 1 frame is corrupted
> > > (or whatever).
> >
> > I agree, but it might be more difficult then it looks (even FFMPEG does
> > not do that). I believe the code that is processing the bitstream in
> > stateful codecs is mostly unrelated from the code actually doing the
> > decoding. So what might happen is that the decoding part will never
> > actually allocate a buffer for the skipped / corrupted part of the
> > bitstream. Also, the notion of a skipped frame is not always evident in
> > when parsing H264 or HEVC NALs. There is still a full page of text just
> > to explain how to detect that start of a new frame.
>
> Right. I don't think we can guarantee that we can always correlate the
> errors with exact buffers and so I phrased the paragraph about errors
> in v3 in a bit more conservative way:
>
> See the snapshot hosted by Hans (thanks!):
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html#decoding
>
> >
> > Yet, it would be interesting to study the firmwares we have and see
> > what they provide that would help making decode errors more explicit.
> >
>
> Agreed.
>
> > >
> > > Another case is when the bitstream, even if corrupted, is still enough
> > > to produce some output. My intuition tells me that such CAPTURE buffer
> > > should be then returned with the ERROR flag. That wouldn't still be
> > > enough for any more sophisticated userspace error concealment, but
> > > could still let the userspace know to perhaps drop the frame.
> >
> > You mean if a frame was concealed (typically the frame was decoded from
> > a closed by reference instead of the expected reference). That is
> > something signalled by FFPEG. We should document this possibility. I
> > actually have something implemented in GStreamer. Basically if we have
> > the ERROR flag with a payload size smaller then expected, I drop the
> > frame and produce a drop event message, while if I have a frame with
> > ERROR flag but of the right payload size, I assume it is corrupted, and
> > simply flag it as corrupted, leaving to the application the decision to
> > display it or not. This is a case that used to happen with some UVC
> > cameras (though some have been fixed, and the UVC camera should drop
> > smaller payload size buffers now).
>
> I think it's a behavior that makes the most sense indeed.
>
> Technically one could also consider the case of 0 < bytesused <
> sizeimage, which could mean that only a part of the frame is in the
> buffer. An application could try to blend it with previous frame using
> some concealing algorithms. I haven't seen an app that could do such
> thing, though.

Actually some interesting thought on this. I don't think the existing
drivers would return any CAPTURE buffers on errors right now, but just
return the OUTPUT buffer that failed to be decoded. Should we change
this so that always one CAPTURE buffer with the ERROR flag is
returned, to signal the application that a frame was potentially
dropped?

Best regards,
Tomasz