Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Wed, 8 Aug 2018 12:07:37 +0900

On Wed, Aug 8, 2018 at 4:11 AM Maxime Jourdan <maxi.jourdan@xxxxxxxxxx> wrote:
>
> 2018-08-07 9:13 GMT+02:00 Hans Verkuil <hverkuil@xxxxxxxxx>:
> > On 07/26/2018 12:20 PM, Tomasz Figa wrote:
> >> Hi Hans,
> >>
> >> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> >>>> +
> >>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> >>>> +
> >>>> +Decoding
> >>>> +========
> >>>> +
> >>>> +This state is reached after a successful initialization sequence. In this
> >>>> +state, client queues and dequeues buffers to both queues via
> >>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> >>>> +semantics.
> >>>> +
> >>>> +Both queues operate independently, following standard behavior of V4L2
> >>>> +buffer queues and memory-to-memory devices. In addition, the order of
> >>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> >>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> >>>> +coded format, e.g. frame reordering. The client must not assume any direct
> >>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> >>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
> >>>
> >>> Is there a relationship between capture and output buffers w.r.t. the timestamp
> >>> field? I am not aware that there is one.
> >>
> >> I believe the decoder was expected to copy the timestamp of matching
> >> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
> >> to be implementing it this way. I guess it might be a good idea to
> >> specify this more explicitly.
> >
> > What about an output buffer producing multiple capture buffers? Or the case
> > where the encoded bitstream of a frame starts at one output buffer and ends
> > at another? What happens if you have B frames and the order of the capture
> > buffers is different from the output buffers?
> >
> > In other words, for codecs there is no clear 1-to-1 relationship between an
> > output buffer and a capture buffer. And we never defined what the 'copy timestamp'
> > behavior should be in that case or if it even makes sense.
> >
> > Regards,
> >
> >         Hans
>
> As it is done right now in userspace (FFmpeg, GStreamer) and most (if
> not all?) drivers, it's a 1:1 between OUTPUT and CAPTURE. The only
> thing that changes is the ordering since OUTPUT buffers are in
> decoding order while CAPTURE buffers are in presentation order.

If I understood it correctly, there is a feature in VP9 that lets one
frame repeat several times, which would make one OUTPUT buffer produce
multiple CAPTURE buffers.

Moreover, V4L2_PIX_FMT_H264 is actually defined to be a byte stream,
without any need for framing, and yes, there are drivers that follow
this definition correctly (s5p-mfc and, AFAIR, coda). In that case,
one OUTPUT buffer can hold an arbitrary amount of bitstream and lead
to multiple CAPTURE frames being produced.
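
To make the consequence for clients concrete, here is a rough sketch of
how userspace could match CAPTURE buffers back to OUTPUT buffers by
timestamp without assuming 1:1. It is not taken from any existing
client or driver. All names (ts_map, ts_map_put and so on) and the
table size are hypothetical, and it assumes the driver copies the
OUTPUT timestamp to every CAPTURE buffer produced from it, which is
exactly the behavior we have not specified yet. Timestamps are the
struct timeval from v4l2_buffer converted to microseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_MAP_SLOTS 32	/* hypothetical; at least the OUTPUT buffer count */

struct ts_entry {
	int64_t ts_us;	/* timestamp the client set on the OUTPUT buffer */
	void *cookie;	/* client metadata for that chunk of bitstream */
	int in_use;
};

struct ts_map {
	struct ts_entry slot[TS_MAP_SLOTS];
};

/* Called when queuing an OUTPUT buffer. Returns 0, or -1 if the map is full. */
int ts_map_put(struct ts_map *m, int64_t ts_us, void *cookie)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < TS_MAP_SLOTS; i++) {
		if (!m->slot[i].in_use) {
			m->slot[i].ts_us = ts_us;
			m->slot[i].cookie = cookie;
			m->slot[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/*
 * Called when dequeuing a CAPTURE buffer. The entry is deliberately NOT
 * removed: with a byte stream (or a repeated frame) several CAPTURE
 * buffers may carry the same timestamp.
 */
void *ts_map_find(const struct ts_map *m, int64_t ts_us)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < TS_MAP_SLOTS; i++) {
		if (m->slot[i].in_use && m->slot[i].ts_us == ts_us)
			return m->slot[i].cookie;
	}
	return NULL;
}

/* Forget an entry. When this is safe is a policy decision for the caller. */
void ts_map_drop(struct ts_map *m, int64_t ts_us)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < TS_MAP_SLOTS; i++) {
		if (m->slot[i].in_use && m->slot[i].ts_us == ts_us)
			m->slot[i].in_use = 0;
	}
}
```

The lookup is by value rather than by buffer index, so reordering does
not matter, and a NULL result tells the client that the driver did not
propagate the timestamp it expected.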

>
> This almost always implies some timestamping kung-fu to match the
> OUTPUT timestamps with the corresponding CAPTURE timestamps. It's
> often done indirectly by the firmware on some platforms (rpi comes to
> mind iirc).

I don't think there is an upstream driver for it, is there? (If not,
are you aware of any work towards it?)

>
> The current constructions also imply one video packet per OUTPUT
> buffer. If a video packet is too big to fit in a buffer, FFmpeg will
> crop that packet to the maximum buffer size and will discard the
> remaining packet data. GStreamer will abort the decoding. This is
> unfortunately one of the shortcomings of having fixed-size buffers.
> And if they were to split the packet in multiple buffers, then some
> drivers in their current state wouldn't be able to handle the
> timestamping issues and/or x:1 OUTPUT:CAPTURE buffer numbers.

In Chromium, we just allocate OUTPUT buffers big enough that a single
frame is very unlikely not to fit inside [1]. Obviously it's a waste
of memory for formats which normally have just a single frame per
buffer, but it seems to work in practice.

[1] https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.h?rcl=3468d5a59e00bcb2c2e946a30694e6057fd9ab21&l=118
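
For illustration only, the "oversize the OUTPUT buffers" approach can
be sketched as below. The two sizes and the resolution threshold are
hypothetical values picked for the example, not the ones from [1], and
the function names are made up. The chosen size would typically be
requested through sizeimage when setting the format on the OUTPUT
queue, and the driver may still adjust it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen for illustration only. */
#define OUTPUT_BUF_SIZE_SMALL ((size_t)1024 * 1024)
#define OUTPUT_BUF_SIZE_LARGE ((size_t)4 * 1024 * 1024)

/* Pick a generous OUTPUT buffer size from the coded resolution. */
size_t output_buf_size(unsigned int width, unsigned int height)
{
	assert(width > 0 && height > 0);
	if ((size_t)width * height > (size_t)1920 * 1088)
		return OUTPUT_BUF_SIZE_LARGE;
	return OUTPUT_BUF_SIZE_SMALL;
}

/*
 * Return 1 if a frame's bitstream fits in one OUTPUT buffer, 0 otherwise.
 * A client can use this to fail loudly instead of silently truncating
 * the packet.
 */
int frame_fits(size_t frame_bytes, size_t buf_size)
{
	return frame_bytes <= buf_size;
}
```

Checking the fit explicitly before queuing at least turns the
"packet too big" case into a visible error rather than a corrupted
frame.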

Best regards,
Tomasz