Hi Alex,

Thanks for the response.

On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> <dave.stevenson@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi All.
> >
> > I'm currently doing battle with the stateful video decoder API for
> > video decode on the Raspberry Pi.
> >
> > Reading the descriptive docs[1] there is no obligation to
> > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > makes my colleague who is working on the userspace side happy as it
> > saves a config step of allocating buffers that are never needed.
> >
> > I have been using the v4l2_mem2mem framework, same as some other
> > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > remoted over to the VPU via the MMAL API, and so the src and dest are
> > asynchronous from V4L2's perspective.
> >
> > Said colleague then complained that he couldn't follow the flow
> > described in the docs linked above as it never produced the
> > V4L2_EVENT_SOURCE_CHANGE event.
> >
> > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > __v4l2_m2m_try_queue[2] has
> > if (!m2m_ctx->out_q_ctx.q.streaming
> >     || !m2m_ctx->cap_q_ctx.q.streaming) {
> >         dprintk("Streaming needs to be on for both queues\n");
> >         return;
> > }
> > So I'm never going to get any of the OUTPUT buffers even queued to my
> > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> >
> > Now I can see that on a non-buffered M2M device you have to have both
> > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > tweaking that one clause to
> > if (!m2m_ctx->out_q_ctx.buffered &&
> >     (!m2m_ctx->out_q_ctx.q.streaming ||
> >      !m2m_ctx->cap_q_ctx.q.streaming)) {
> > solves the problem, but is that a generic solution? I don't have any
> > other platforms to test against.
>
> As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> jobs until both queues are streaming. This is one point where stateful
> decoders do not fit with the expectation from M2M that 1 input buffer
> == 1 output buffer. How to work around this limitation depends on the
> design of the underlying hardware, but for example the mtk-vcodec
> driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> hook:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
>
> This allows the hardware to start processing the stream until it
> reports the expected resolution, after which the CAPTURE buffers can
> be allocated and the CAPTURE queue started.
>
> Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> rule if the hardware expects input and output buffers to be queued
> independently. You can also switch to a more traditional M2M scheme
> once both queues are running if this fits the driver better.

Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
active in buffered mode, or is it an oversight? I'm happy to switch to
a custom buf_queue handler on the OUTPUT queue if we must, but at least
for drivers using the buffered mode of v4l2_m2m the restriction seems
unnecessary except for this one conditional.
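Something along the lines of the sketch below is what I have in mind
for that OUTPUT-side hook (untested and only rough: my_ctx and
my_send_to_vpu() are placeholders for our driver context and the MMAL
submission path, the rest are the usual vb2/v4l2_m2m helpers). As far
as I can tell it mirrors what the mtk-vcodec buf_queue hook at the link
above is doing:

#include <linux/videodev2.h>
#include <media/v4l2-fh.h>
#include <media/v4l2-mem2mem.h>
#include <media/videobuf2-v4l2.h>

struct my_ctx {                 /* placeholder for the driver context */
        struct v4l2_fh fh;      /* fh.m2m_ctx set up by v4l2_m2m_ctx_init() */
        /* ... */
};

/* placeholder: hand a bitstream buffer straight to the VPU over MMAL */
static void my_send_to_vpu(struct my_ctx *ctx, struct vb2_v4l2_buffer *vbuf)
{
        /* driver-specific submission to the firmware goes here */
}

static void my_buf_queue(struct vb2_buffer *vb)
{
        struct my_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
        struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);

        /*
         * Until the CAPTURE queue is streaming v4l2_m2m_try_schedule()
         * won't run any jobs, so pass OUTPUT buffers straight to the
         * firmware so it can parse the headers and we can signal
         * V4L2_EVENT_SOURCE_CHANGE.
         */
        if (V4L2_TYPE_IS_OUTPUT(vb->vb2_queue->type) &&
            !vb2_is_streaming(v4l2_m2m_get_dst_vq(ctx->fh.m2m_ctx))) {
                my_send_to_vpu(ctx, vbuf);
                return;
        }

        /* Otherwise let the normal m2m bookkeeping take over. */
        v4l2_m2m_buf_queue(ctx->fh.m2m_ctx, vbuf);
}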
> >
> > However it poses a larger question for my colleague as to what
> > behaviour he can rely on in userspace. Is there a way for userspace to
> > know whether it is permitted on a specific codec implementation to
> > follow the docs and not STREAMON(CAPTURE) until
> > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > many devices.
>
> The documentation should be correct for most if not all stateful
> decoders in the tree. I know firsthand that at least mtk-vcodec and
> venus are compliant.

mtk-vcodec I can now see as being compliant - thanks for your
explanation.

venus appears to ignore the v4l2_m2m job scheduling side (device_run is
empty), has a good rummage in the v4l2_m2m buffer queues from
venus_helper_process_initial_out_bufs, and has a custom vb2_ops
buf_queue that queues the buffer with v4l2_m2m and then immediately
kicks the processing thread.

Now knowing where to look, coda appears to have a custom code path in
coda_buf_queue to handle the startup phase, and then drops into a more
generic path once initialised.

v4l2_m2m seems to be so close to doing what the stateful decoders need
that it seems odd it ends up requiring this sort of bodging in every
stateful decoder. I guess it's just the way things have evolved over
time.

Thanks again.
  Dave
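P.S. For completeness, the userspace flow from the spec that my
colleague is trying to follow is roughly the below (heavily trimmed
sketch: no error handling, no buffer mmap or QBUF loop, and the device
node and H264 format are just examples):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main(void)
{
        int fd = open("/dev/video10", O_RDWR);

        /* Subscribe to the resolution event before feeding any data. */
        struct v4l2_event_subscription sub = {
                .type = V4L2_EVENT_SOURCE_CHANGE,
        };
        ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);

        /* Configure and start only the OUTPUT (bitstream) queue. */
        struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE };
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        struct v4l2_requestbuffers reqbufs = {
                .count = 4,
                .type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
                .memory = V4L2_MEMORY_MMAP,
        };
        ioctl(fd, VIDIOC_REQBUFS, &reqbufs);

        int type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        ioctl(fd, VIDIOC_STREAMON, &type);

        /* ... VIDIOC_QBUF the bitstream buffers here ... */

        /* Wait for the decoder to have parsed the headers. */
        struct v4l2_event ev;
        ioctl(fd, VIDIOC_DQEVENT, &ev);  /* in reality poll() for POLLPRI first */

        /* Only now is the CAPTURE queue configured and started. */
        struct v4l2_format cap_fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE };
        ioctl(fd, VIDIOC_G_FMT, &cap_fmt);
        /* ... VIDIOC_REQBUFS + VIDIOC_QBUF on CAPTURE ... */
        type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        ioctl(fd, VIDIOC_STREAMON, &type);

        return 0;
}

With v4l2_m2m buffered mode as it stands, that DQEVENT never returns
for us, because none of the queued OUTPUT buffers reach the driver
until STREAMON(CAPTURE).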