Re: [RFC PATCH v3] media: docs-rst: Document m2m stateless video decoder interface

Alexandre Courbot <acourbot@xxxxxxxxxxxx> · Mon, 22 Oct 2018 16:26:46 +0900

On Mon, Oct 22, 2018 at 3:51 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>
> On Mon, Oct 22, 2018 at 3:39 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Oct 22, 2018 at 3:22 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 22, 2018 at 3:05 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Oct 19, 2018 at 5:44 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > >
> > > > > On 10/19/18 10:09, Alexandre Courbot wrote:
> > > > > > Thanks everyone for the feedback on v2! I have not replied to all the
> > > > > > individual emails but hope this v3 will address some of the problems
> > > > > > raised and become a continuation point for the topics still in
> > > > > > discussion (probably during the ELCE Media Summit).
> > > > > >
> > > > > > This patch documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders. It is based on the
> > > > > > following references:
> > > > > >
> > > > > > * The current protocol used by Chromium (converted from config store to
> > > > > >   request API)
> > > > > >
> > > > > > * The submitted Cedrus VPU driver
> > > > > >
> > > > > > As such, some things may not be entirely consistent with the current
> > > > > > state of drivers, so it would be great if all stakeholders could point
> > > > > > out these inconsistencies. :)
> > > > > >
> > > > > > This patch is supposed to be applied on top of the Request API V18 as
> > > > > > well as the memory-to-memory video decoder interface series by Tomasz
> > > > > > Figa.
> > > > > >
> > > > > > Changes since v2:
> > > > > >
> > > > > > * Specify that the frame header controls should be set prior to
> > > > > >   enumerating the CAPTURE queue, instead of the profile which as Paul
> > > > > >   and Tomasz pointed out is not enough to know which raw formats will be
> > > > > >   usable.
> > > > > > * Change V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM to
> > > > > >   V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.
> > > > > > * Various rewording and rephrasing
> > > > > >
> > > > > > Two points being currently discussed have not been changed in this
> > > > > > > revision due to lack of a better idea. Of course this is open to change:
> > > > > >
> > > > > > * The restriction of having to send full frames for each input buffer is
> > > > > > >   kept as-is. As Hans pointed out, we currently have a hard limit of 32
> > > > > >   buffers per queue, and it may be non-trivial to lift. Also some codecs
> > > > > >   (at least Venus AFAIK) do have this restriction in hardware, so unless
> > > > > >   we want to do some buffer-rearranging in-kernel, it is probably better
> > > > > >   to keep the default behavior as-is. Finally, relaxing the rule should
> > > > > >   be easy enough if we add one extra control to query whether the
> > > > > >   hardware can work with slice units, as opposed to frame units.
> > > > >
> > > > > Makes sense, as long as the restriction can be lifted in the future.
> > > >
> > > > Lifting this limitation once we support more than 32 buffers should
> > > > not be an issue. Just add a new capability control and process things
> > > > in slice units. Right now we have hardware that can only work with
> > > > whole frames (venus)
> > >
> > > Note that venus is stateful hardware and the restriction might just
> > > come from the firmware.
> >
> > Right, and it most certainly does indeed. Yet firmwares are not always
> > easy to get updated by vendors, so we may have to deal with it anyway.
> >
>
> Right. I'm just not convinced that venus is relevant to the stateless interface.
>
> I'd use the Rockchip VPU hardware as an example of hardware that
> seems to require all the slices in one buffer, tightly packed one
> after another, since it only accepts one address and size into its
> registers.

You're right that Venus is not relevant here. Rockchip VPU is a much
better example.

>
> > >
> > > > but I suspect that some slice-only hardware must
> > > > exist, so it may actually become a necessity at some point (lest
> > > > drivers do some splitting themselves).
> > > >
> > >
> > > The drivers could do it trivially, because the UAPI will include the
> > > array of slices, with offsets and sizes. It would just run the same
> > > OUTPUT buffer multiple times, once for each slice.
> >
> > Alignment issues notwithstanding. :)
>
> Good point. That could be handled by a memcpy() into a bounce buffer,
> but it wouldn't be as trivial as I suggested anymore indeed.

But at least the kernel won't have to try and parse the stream since
the slices' limits are clearly defined by user-space, so it's not that
big a deal.
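
To make the idea above concrete, here is a rough sketch of a driver
running the same OUTPUT buffer once per slice and falling back to a
bounce buffer when a slice does not start on an aligned offset. This is
plain C for illustration only, not code from any driver: struct
slice_desc, HW_SLICE_ALIGN, run_slice_fn, count_bytes() and
decode_per_slice() are all hypothetical names, the 16-byte alignment is
an arbitrary assumption, and the sketch assumes the base of the OUTPUT
buffer is itself suitably aligned so that checking the offset is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed alignment the hardware wants for a slice start (hypothetical). */
#define HW_SLICE_ALIGN 16u

/* Hypothetical per-slice descriptor, as user-space would provide it:
 * offset and size of the slice inside the OUTPUT buffer. */
struct slice_desc {
	uint32_t offset;
	uint32_t size;
};

/* Stand-in for "write address and size to the registers and run". */
typedef void (*run_slice_fn)(const uint8_t *data, uint32_t size, void *priv);

/* Example callback: only accumulates the number of bytes submitted. */
static void count_bytes(const uint8_t *data, uint32_t size, void *priv)
{
	(void)data;
	*(uint32_t *)priv += size;
}

/*
 * Run the hardware once per slice of the same OUTPUT buffer.
 * Aligned slices are submitted in place; unaligned ones are first
 * copied into the bounce buffer. No bitstream parsing is needed since
 * the slice boundaries come from the descriptors.
 *
 * Returns the number of slices that went through the bounce buffer,
 * or -1 if a descriptor is out of bounds or does not fit the bounce
 * buffer.
 */
static int decode_per_slice(const uint8_t *buf, size_t buf_size,
			    const struct slice_desc *slices, size_t n,
			    uint8_t *bounce, size_t bounce_size,
			    run_slice_fn run, void *priv)
{
	int bounced = 0;

	assert(run != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct slice_desc *s = &slices[i];

		/* Reject slices that fall outside the OUTPUT buffer. */
		if (s->offset > buf_size || s->size > buf_size - s->offset)
			return -1;

		if (s->offset % HW_SLICE_ALIGN == 0) {
			/* Aligned: point the hardware directly at the slice. */
			run(buf + s->offset, s->size, priv);
			continue;
		}

		/* Unaligned: copy to the bounce buffer, then run from there. */
		if (s->size > bounce_size)
			return -1;
		memcpy(bounce, buf + s->offset, s->size);
		run(bounce, s->size, priv);
		bounced++;
	}

	return bounced;
}
```

The only extra cost compared to the "trivial" version is the copy for
unaligned slices. A real driver would of course have to deal with DMA
addresses and job scheduling, which this sketch leaves out entirely.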