Re: Stateful Video Decoder interface vs M2M framework

Alexandre Courbot <acourbot@xxxxxxxxxxxx> · Tue, 2 Feb 2021 22:16:59 +0900

Hi Dave,

On Tue, Feb 2, 2021 at 2:09 AM Dave Stevenson
<dave.stevenson@xxxxxxxxxxxxxxx> wrote:
>
> Hi Alex
>
> Thanks for the response.
>
> On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> > <dave.stevenson@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi All.
> > >
> > > I'm currently doing battle with the stateful video decoder API for
> > > video decode on the Raspberry Pi.
> > >
> > > Reading the descriptive docs[1] there is no obligation to
> > > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > > makes my colleague who is working on the userspace side happy as it
> > > saves a config step of allocating buffers that are never needed.
> > >
> > > I have been using the v4l2_mem2mem framework, same as some other
> > > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > > remoted over to the VPU via the MMAL API, and so the src and dest are
> > > asynchronous from V4L2's perspective.
> > >
> > > Said colleague then complained that he couldn't follow the flow
> > > described in the docs linked above as it never produced the
> > > V4L2_EVENT_SOURCE_CHANGE event.
> > >
> > > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > > __v4l2_m2m_try_queue[2] has
> > >     if (!m2m_ctx->out_q_ctx.q.streaming
> > >         || !m2m_ctx->cap_q_ctx.q.streaming) {
> > >         dprintk("Streaming needs to be on for both queues\n");
> > >         return;
> > >     }
> > > So I'm never going to get any of the OUTPUT buffers even queued to my
> > > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> > >
> > > Now I can see that on a non-buffered M2M device you have to have both
> > > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > > tweaking that one clause to
> > >     if (!m2m_ctx->out_q_ctx.buffered &&
> > >         (!m2m_ctx->out_q_ctx.q.streaming ||
> > >          !m2m_ctx->cap_q_ctx.q.streaming)) {
> > > solves the problem, but is that a generic solution? I don't have any
> > > other platforms to test against.
> >
> > As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> > jobs until both queues are streaming. This is one point where stateful
> > decoders do not fit with the expectation from M2M that 1 input buffer
> > == 1 output buffer. How to work around this limitation depends on the
> > design of the underlying hardware, but for example the mtk-vcodec
> > driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> > hook:
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
> >
> > This allows the hardware to start processing the stream until it
> > reports the expected resolution, after which the CAPTURE buffers can
> > be allocated and the CAPTURE queue started.
> >
> > Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> > rule if the hardware expects input and output buffers to be queued
> > independently. You can also switch to a more traditional M2M scheme
> > once both queues are running if this fits the driver better.
>
> Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
> active in buffered mode, or an oversight?

IIUC that's what the M2M framework expects, historically. It needs
both ends of "memory to memory" in order to do its work, and this
matched well with the first devices using this framework. Stateful
decoders are also "memory to memory", but in the broader sense, both
queues being asynchronous. So while it saves some effort to reuse some
of the M2M helpers, the whole framework cannot be used as it was
originally intended.

>
> I'm happy to switch to a custom qbuf on the OUTPUT queue if we must,
> but at least for drivers using the buffered mode of v4l2_m2m it seems
> unnecessary except for this one conditional.
>
> > >
> > > However it poses a larger question for my colleague as to what
> > > behaviour he can rely on in userspace. Is there a way for userspace to
> > > know whether it is permitted on a specific codec implementation to
> > > follow the docs and not STREAMON(CAPTURE) until
> > > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > > many devices.
> >
> > The documentation should be correct for most if not all stateful
> > decoders in the tree. I know firsthand that at least mtk-vcodec and
> > venus are compliant.
>
> mtk-vcodec I can see as being compliant now - thanks for your explanation.
>
> venus appears to ignore the v4l2_m2m job scheduling side (device_run
> is empty), has a good rummage in the v4l2_m2m buffer queues from
> venus_helper_process_initial_out_bufs, and then has a custom vb2_ops
> buf_queue to queue the buffer with v4l2_m2m and then immediately kick
> the processing thread.
>
> Now knowing where to look, coda appears to have a custom code path in
> coda_buf_queue to handle the startup phase, and then drops into a more
> generic path once initialised.
>
> v4l2_m2m seems to be so close to doing what is needed for the
> stateless decoders that it seems odd that it's requiring what looks
> like bodging to all stateless decoders. I guess it's just the way
> things have evolved over time.

Yeah, what we really need here are encoder and stateful/stateless
decoder helpers (that may or may not rely on M2M) that manage the
queues as expected, while also enforcing the specification. Coming
with a design that works for all drivers is not easy though.

Cheers,
Alex.