Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

Alexandre Courbot <acourbot@xxxxxxxxxxxx> · Wed, 17 Apr 2019 14:39:50 +0900

Hi Paul,

On Tue, Apr 16, 2019 at 4:55 PM Paul Kocialkowski
<paul.kocialkowski@xxxxxxxxxxx> wrote:
>
> Hi,
>
> Le mardi 16 avril 2019 à 16:22 +0900, Alexandre Courbot a écrit :
>
> [...]
>
> > Thanks for this great discussion. Let me try to summarize the status
> > of this thread + the IRC discussion and add my own thoughts:
> >
> > Proper support for multiple decoding units (e.g. H.264 slices) per
> > frame should not be an afterthought ; compliance to encoded formats
> > depend on it, and the benefit of lower latency is a significant
> > consideration for vendors.
> >
> > m2m, which we use for all stateless codecs, has a strong assumption
> > that one OUTPUT buffer consumed results in one CAPTURE buffer being
> > produced. This assumption can however be overruled: at least the venus
> > driver does it to implement the stateful specification.
> >
> > So we need a way to specify frame boundaries when submitting encoded
> > content to the driver. One request should contain a single OUTPUT
> > buffer, containing a single decoding unit, but we need a way to
> > specify whether the driver should directly produce a CAPTURE buffer
> > from this request, or keep using the same CAPTURE buffer with
> > subsequent requests.
> >
> > I can think of 2 ways this can be expressed:
> > 1) We keep the current m2m behavior as the default (a CAPTURE buffer
> > is produced), and add a flag to ask the driver to change that behavior
> > and hold on the CAPTURE buffer and reuse it with the next request(s) ;
>
> That would kind of break the stateless idea. I think we need requests
> to be fully independent of eachother and have some entity that
> coordinates requests for this kind of things.

Side note: the idea that stateless decoders are entirely stateless is
not completely accurate anyway. When we specify a resolution on the
OUTPUT queue, we already store some state. What matters IIUC is that
the *hardware* behaves in a stateless manner. I don't think we should
refrain from storing some internal driver state if it makes sense.

Back to the topic: the effect of this flag would just be that the
first buffer is the CAPTURE queue is not removed, i.e. the next
request will work on the same buffer. It doesn't really preserve any
state - if the next request is the beginning of a different frame,
then the previous work will be discarded and the driver will behave as
it should, not considering any previous state.

>
> > 2) We specify that no CAPTURE buffer is produced by default, unless a
> > flag asking so is specified.
> >
> > The flag could be specified in one of two ways:
> > a) As a new v4l2_buffer.flag for the OUTPUT buffer ;
> > b) As a dedicated control, either format-specific or more common to all codecs.
>
> I think we must aim for a generic solution that would be at least
> common to all codecs, and if possible common to requests regardless of
> whether they concern video decoding or not.
>
> I really like the idea of introducing a requests batch/group/queue,
> which groups requests together and allows marking them done when the
> whole group is done being decoded. For that, we explicitly mark one of
> the requests as the final one, so that we can continue adding requests
> to the batch even when it's already being processed. When all the
> requests are done being decoded, we can mark them done.

I'd need to see this idea more developed (with maybe an example of the
sequence of IOCTLs) to form an opinion about it. Also would need to be
given a few examples of where this could be used outside of stateless
codecs. Then we will have to address what this means for requests:
your argument against using a "release CAPTURE buffer" flag was that
requests won't be fully independent from each other anymore, but I
don't see that situation changing with batches. Then, does the end of
a batch only means that a CAPTURE buffer should be released, or are
other actions required for non-codec use-cases? There are lots and
lots of questions like this one lurking.

>
> With that, we also need some tweaking in the core to look for an
> available capture buffer that matches the output buffer's timestamp
> before trying to dequeue the next available capture buffer

I don't think that would be strictly necessary, unless we want to be
able to decode slices from different frames before the first one is
completed?

> This way,
> the first request of the batch will get any queued capture buffer, but
> subsequent requests will find the matchung capture buffer by timestamp.
>
> I think that's basically all we need to handle that and the two aspects
> (picking by timestamp and requests groups) are rather independent and
> the latter could probably be used in other situations than video
> decoding.
>
> What do you think?

At the current point I'd like to avoid over-engineering things.
Introducing a request batch mechanism would mean more months spent
before we can set the stateless codec API in stone, and at some point
we need to settle and release something that people can use. We don't
even have clear idea of what batches would look like and in which
cases they would be used. The idea of an extra flag is simple and
AFAICT would do the job nicely, so why not proceed with this for the
time being?

Cheers,
Alex.