Hi Alex,

Thanks for the response.

On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> <dave.stevenson@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi All.
> >
> > I'm currently doing battle with the stateful video decoder API for
> > video decode on the Raspberry Pi.
> >
> > Reading the descriptive docs[1] there is no obligation to
> > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > makes my colleague who is working on the userspace side happy as it
> > saves a config step of allocating buffers that are never needed.
> >
> > I have been using the v4l2_mem2mem framework, same as some other
> > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > remoted over to the VPU via the MMAL API, and so the src and dest are
> > asynchronous from V4L2's perspective.
> >
> > Said colleague then complained that he couldn't follow the flow
> > described in the docs linked above as it never produced the
> > V4L2_EVENT_SOURCE_CHANGE event.
> >
> > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > __v4l2_m2m_try_queue[2] has
> > if (!m2m_ctx->out_q_ctx.q.streaming
> >     || !m2m_ctx->cap_q_ctx.q.streaming) {
> >         dprintk("Streaming needs to be on for both queues\n");
> >         return;
> > }
> > So I'm never going to get any of the OUTPUT buffers even queued to my
> > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> >
> > Now I can see that on a non-buffered M2M device you have to have both
> > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > tweaking that one clause to
> > if (!m2m_ctx->out_q_ctx.buffered &&
> >     (!m2m_ctx->out_q_ctx.q.streaming ||
> >      !m2m_ctx->cap_q_ctx.q.streaming)) {
> > solves the problem, but is that a generic solution? I don't have any
> > other platforms to test against.
>
> As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> jobs until both queues are streaming. This is one point where stateful
> decoders do not fit with the expectation from M2M that 1 input buffer
> == 1 output buffer. How to work around this limitation depends on the
> design of the underlying hardware, but for example the mtk-vcodec
> driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> hook:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
>
> This allows the hardware to start processing the stream until it
> reports the expected resolution, after which the CAPTURE buffers can
> be allocated and the CAPTURE queue started.
>
> Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> rule if the hardware expects input and output buffers to be queued
> independently. You can also switch to a more traditional M2M scheme
> once both queues are running if this fits the driver better.

Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
active in buffered mode, or is it an oversight? I'm happy to switch to
a custom buf_queue handler on the OUTPUT queue if we must, but at least
for drivers using the buffered mode of v4l2_m2m the restriction seems
unnecessary except for this one conditional.
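Something along the lines of the sketch below is what I have in mind
for that OUTPUT-side hook (untested and only rough: my_ctx and
my_send_to_vpu() are placeholders for our driver context and the MMAL
submission path, the rest are the usual vb2/v4l2_m2m helpers). As far
as I can tell it mirrors what the mtk-vcodec buf_queue hook at the link
above is doing:

#include <linux/videodev2.h>
#include <media/v4l2-fh.h>
#include <media/v4l2-mem2mem.h>
#include <media/videobuf2-v4l2.h>

struct my_ctx {                 /* placeholder for the driver context */
        struct v4l2_fh fh;      /* fh.m2m_ctx set up by v4l2_m2m_ctx_init() */
        /* ... */
};

/* placeholder: hand a bitstream buffer straight to the VPU over MMAL */
static void my_send_to_vpu(struct my_ctx *ctx, struct vb2_v4l2_buffer *vbuf)
{
        /* driver-specific submission to the firmware goes here */
}

static void my_buf_queue(struct vb2_buffer *vb)
{
        struct my_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
        struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);

        /*
         * Until the CAPTURE queue is streaming v4l2_m2m_try_schedule()
         * won't run any jobs, so pass OUTPUT buffers straight to the
         * firmware so it can parse the headers and we can signal
         * V4L2_EVENT_SOURCE_CHANGE.
         */
        if (V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type) &&
            !vb2_is_streaming(v4l2_m2m_get_dst_vq(ctx->fh.m2m_ctx))) {
                my_send_to_vpu(ctx, vbuf);
                return;
        }

        /* Otherwise let the normal m2m bookkeeping take over. */
        v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
}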
> >
> > However it poses a larger question for my colleague as to what
> > behaviour he can rely on in userspace. Is there a way for userspace to
> > know whether it is permitted on a specific codec implementation to
> > follow the docs and not STREAMON(CAPTURE) until
> > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > many devices.
>
> The documentation should be correct for most if not all stateful
> decoders in the tree. I know firsthand that at least mtk-vcodec and
> venus are compliant.

mtk-vcodec I can now see as being compliant - thanks for your
explanation.

venus appears to ignore the v4l2_m2m job scheduling side (device_run is
empty), has a good rummage in the v4l2_m2m buffer queues from
venus_helper_process_initial_out_bufs, and has a custom vb2_ops
buf_queue that queues the buffer with v4l2_m2m and then immediately
kicks the processing thread.

Now knowing where to look, coda appears to have a custom code path in
coda_buf_queue to handle the startup phase, and then drops into a more
generic path once initialised.

v4l2_m2m seems to be so close to doing what the stateful decoders need
that it seems odd it ends up requiring this sort of bodging in every
stateful decoder. I guess it's just the way things have evolved over
time.

Thanks again.
  Dave
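P.S. For completeness, the userspace flow from the spec that my
colleague is trying to follow is roughly the below (heavily trimmed
sketch: no error handling, no buffer mmap or QBUF loop, and the device
node and H264 format are just examples):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main(void)
{
        int fd = open("/dev/video10", O_RDWR);

        /* Subscribe to the resolution event before feeding any data. */
        struct v4l2_event_subscription sub = {
                .type = V4L2_EVENT_SOURCE_CHANGE,
        };
        ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);

        /* Configure and start only the OUTPUT (bitstream) queue. */
        struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE };
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        struct v4l2_requestbuffers reqbufs = {
                .count = 4,
                .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
                .memory = V4L2_MEMORY_MMAP,
        };
        ioctl(fd, VIDIOC_REQBUFS, &reqbufs);

        int type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        ioctl(fd, VIDIOC_STREAMON, &type);

        /* ... VIDIOC_QBUF the bitstream buffers here ... */

        /* Wait for the decoder to have parsed the headers. */
        struct v4l2_event ev;
        ioctl(fd, VIDIOC_DQEVENT, &ev);  /* in reality poll() for POLLPRI first */

        /* Only now is the CAPTURE queue configured and started. */
        struct v4l2_format cap_fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE };
        ioctl(fd, VIDIOC_G_FMT, &cap_fmt);
        /* ... VIDIOC_REQBUFS + VIDIOC_QBUF on CAPTURE ... */
        type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        ioctl(fd, VIDIOC_STREAMON, &type);

        return 0;
}

With v4l2_m2m buffered mode as it stands, that DQEVENT never returns
for us, because none of the queued OUTPUT buffers reach the driver
until STREAMON(CAPTURE).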