Re: [PATCHv2 1/2] media: docs-rst: Document memory-to-memory video encoder interface

Michael Tretter <m.tretter@xxxxxxxxxxxxxx> · Thu, 28 May 2020 09:44:28 +0200

On Tue, May 26, 2020 at 06:01:35PM +0200, Tomasz Figa wrote:
> On Tue, May 26, 2020 at 10:32 AM Hans Verkuil <hverkuil-cisco@xxxxxxxxx> wrote:
> >
> > On 20/05/2020 22:32, Nicolas Dufresne wrote:
> > > Le mercredi 20 mai 2020 à 12:01 +0200, Hans Verkuil a écrit :
> > >> From: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > >>
> > >> Due to complexity of the video encoding process, the V4L2 drivers of
> > >> stateful encoder hardware require specific sequences of V4L2 API calls
> > >> to be followed. These include capability enumeration, initialization,
> > >> encoding, encode parameters change, drain and reset.
> > >>
> > >> Specifics of the above have been discussed during Media Workshops at
> > >> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > >> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > >> originated at those events was later implemented by the drivers we already
> > >> have merged in mainline, such as s5p-mfc or coda.
> > >>
> > >> The only thing missing was the real specification included as a part of
> > >> Linux Media documentation. Fix it now and document the encoder part of
> > >> the Codec API.
> > >>
> > >> Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > >> Signed-off-by: Hans Verkuil <hverkuil-cisco@xxxxxxxxx>
> > >> ---
> > >>  .../userspace-api/media/v4l/dev-encoder.rst   | 727 ++++++++++++++++++
> > >>  .../userspace-api/media/v4l/dev-mem2mem.rst   |   1 +
> > >>  .../userspace-api/media/v4l/pixfmt-v4l2.rst   |   5 +
> > >>  .../userspace-api/media/v4l/v4l2.rst          |   2 +
> > >>  .../media/v4l/vidioc-encoder-cmd.rst          |  51 +-
> > >>  5 files changed, 766 insertions(+), 20 deletions(-)
> > >>  create mode 100644 Documentation/userspace-api/media/v4l/dev-encoder.rst
> > >>
> > >> diff --git a/Documentation/userspace-api/media/v4l/dev-encoder.rst b/Documentation/userspace-api/media/v4l/dev-encoder.rst
> >
> > <snip>
> >
> > >> +5. **Optional** Set the coded frame interval on the ``CAPTURE`` queue via
> > >> +   :c:func:`VIDIOC_S_PARM`. This is only necessary if the coded frame
> > >> +   interval is different from the raw frame interval, which is typically
> > >> +   the case for off-line encoding.
> > >> +
> > >> +   * ** Required fields:**
> > >> +
> > >> +     ``type``
> > >> +     a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
> > >> +
> > >> +     ``parm.capture``
> > >> +     set all fields except ``parm.capture.timeperframe`` to 0.
> > >> +
> > >> +     ``parm.capture.timeperframe``
> > >> +     the desired coded frame interval; the encoder may adjust it to
> > >> +     match hardware requirements.
> > >> +
> > >> +   * **Return fields:**
> > >> +
> > >> +     ``parm.capture.timeperframe``
> > >> +     the adjusted frame interval.
> > >> +
> > >> +   .. important::
> > >> +
> > >> +      Changing the ``CAPTURE`` frame interval sets the framerate for the
> > >> +      coded video. It does *not* set the rate at which buffers arrive on the
> > >> +      ``CAPTURE`` queue, that depends on how fast the encoder is and how
> > >> +      fast raw frames are queued on the ``OUTPUT`` queue.
> > >> +
> > >> +   .. important::
> > >> +
> > >> +      ``timeperframe`` deals with *frames*, not fields. So for interlaced
> > >> +      formats this is the time per two fields, since a frame consists of
> > >> +      a top and a bottom field.
> > >> +
> > >> +   .. note::
> > >> +
> > >> +      Not all drivers support this functionality, in that case just set
> > >> +      the desired coded frame interval for the ``OUTPUT`` queue.
> > >
> > > There is a slight contorsion in the resulting user-space API. When I
> > > read this, the logical thing to do for live streams would be to just
> > > set the OUTPUT and the driver will take care of CAPTURE for me.
> > >
> > > While if I want to do offline, I don't know if this is supported or
> > > not. So the flow would be a bit special:
> > >
> > >   S_PARM(OUTPUT) with coded video frame rate
> > >   S_PARM(CAPTURE) width coded video
> > >   if ^ worked:
> > >      S_PARM(OUTPUT) with fastest rate possible
> > >
> > > Ideally I would have preferred if there was a more straight forward way
> > > to configure offline encoding for fastest performance with specific
> > > coded framerate. I don't think it's a blocker though, performance is
> > > not critical at all here. Maybe I'm missing something that allow to
> > > check if this is supported or not without trying it ?
> >
> > Good point. I considered adding a flag for the v4l2_fmtdesc struct that
> > reports whether you can set the capture framerate independently from the
> > OUTPUT framerate. Perhaps V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL?
> >
> > I actually think it would be best if that's added. It is not enough to
> > rely on whether S_PARM(CAPTURE) works to determine this feature since
> > at least one encoder drivers supports both OUTPUT and CAPTURE with S_PARM,
> > but CAPTURE does the same as OUTPUT, so that would be a red herring.
> >
> > I'll add this flag for v3.
> 
> Now if I think of it again, this behavior might be problematic for
> existing userspace, which I believe assumes encoding as fast as
> possible, because of how mem2mem devices were expected to behave.
> Could we define this to always default to as fast as possible?
> Otherwise we could end up regressing the encoding latency on latency
> sensitive applications, such as video conferencing.

I see your point regarding the latency, but I don't think that this justifies
"as fast as possible" as a default in the API. Adjusting the encoding speed
based on a user space hint is usually a trade-off that the driver must make.
If the trade-off results in a reduced latency, it is a regression in the
individual driver and the trade-offs must be discussed per driver.

> 
> For example, we could define the rates of both queues the other way
> around. The OUTPUT queue, which seems to be where the existing user
> space currently sets the desired stream frame rate, could be
> interpreted as the frame rate of the video to be encoded, while the
> CAPTURE queue would control the desired output rate of the encoded
> video, which would kind of match the V4L2 definition of CAPTURE frame
> interval.

Setting the desired stream frame rate on CAPTURE does not work with frame
reordering. In that case driver returns the CAPTURE buffers, whenever a coded
buffer is finished, and not necessarily at a certain frame rate. Strictly
following the V4L2 definition for CAPTURE frame interval, the driver would
need to produce repeated frames if the decode order differs from the display
order.

Setting the stream frame rate on the OUTPUT makes a lot of more sense to me,
because that is the rate at which we can feed buffers to the encoder. This
rate differs, if we use another V4L2 CAPTURE device as a live source or a file
as an offline source for the frames to be encoded, and determines the maximum
rate at which the encoder can produce CAPTURE buffers.

Michael