Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

Paul Kocialkowski <paul.kocialkowski@xxxxxxxxxxx> · Tue, 16 Apr 2019 09:55:49 +0200

Hi,

Le mardi 16 avril 2019 à 16:22 +0900, Alexandre Courbot a écrit :

[...]

> Thanks for this great discussion. Let me try to summarize the status
> of this thread + the IRC discussion and add my own thoughts:
> 
> Proper support for multiple decoding units (e.g. H.264 slices) per
> frame should not be an afterthought ; compliance to encoded formats
> depend on it, and the benefit of lower latency is a significant
> consideration for vendors.
>
> m2m, which we use for all stateless codecs, has a strong assumption
> that one OUTPUT buffer consumed results in one CAPTURE buffer being
> produced. This assumption can however be overruled: at least the venus
> driver does it to implement the stateful specification.
> 
> So we need a way to specify frame boundaries when submitting encoded
> content to the driver. One request should contain a single OUTPUT
> buffer, containing a single decoding unit, but we need a way to
> specify whether the driver should directly produce a CAPTURE buffer
> from this request, or keep using the same CAPTURE buffer with
> subsequent requests.
> 
> I can think of 2 ways this can be expressed:
> 1) We keep the current m2m behavior as the default (a CAPTURE buffer
> is produced), and add a flag to ask the driver to change that behavior
> and hold on the CAPTURE buffer and reuse it with the next request(s) ;

That would kind of break the stateless idea. I think we need requests
to be fully independent of eachother and have some entity that
coordinates requests for this kind of things.

> 2) We specify that no CAPTURE buffer is produced by default, unless a
> flag asking so is specified.
> 
> The flag could be specified in one of two ways:
> a) As a new v4l2_buffer.flag for the OUTPUT buffer ;
> b) As a dedicated control, either format-specific or more common to all codecs.

I think we must aim for a generic solution that would be at least
common to all codecs, and if possible common to requests regardless of
whether they concern video decoding or not.

I really like the idea of introducing a requests batch/group/queue,
which groups requests together and allows marking them done when the
whole group is done being decoded. For that, we explicitly mark one of
the requests as the final one, so that we can continue adding requests
to the batch even when it's already being processed. When all the
requests are done being decoded, we can mark them done.

With that, we also need some tweaking in the core to look for an
available capture buffer that matches the output buffer's timestamp
before trying to dequeue the next available capture buffer. This way,
the first request of the batch will get any queued capture buffer, but
subsequent requests will find the matchung capture buffer by timestamp.

I think that's basically all we need to handle that and the two aspects
(picking by timestamp and requests groups) are rather independent and
the latter could probably be used in other situations than video
decoding.

What do you think?

Cheers,

Paul

> I tend to favor 2) and b) for this, for the reason that with H.264 at
> least, user-space does not know whether a slice is the last slice of a
> frame until it starts parsing the next one, and we don't know when we
> will receive it. If we use a control to ask that a CAPTURE buffer be
> produced, we can always submit another request with only that control
> set once it is clear that the frame is complete (and not delay
> decoding meanwhile). In practice I am not that familiar with
> latency-sensitive streaming ; maybe a smart streamer would just append
> an AUD NAL unit at the end of every frame and we can thus submit the
> flag it with the last slice without further delay?
> 
> An extra constraint to enforce would be that each decoding unit
> belonging to the same frame must be submitted with the same timestamp,
> otherwise the request submission would fail. We really need a
> framework to enforce all this at a higher level than individual
> drivers, once we reach an agreement I will start working on this.
> 
> Formats that do not support multiple decoding units per frame would
> reject any request that does not carry the end-of-frame information.
> 
> Anything missing / any further comment?
-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com