Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

Nicolas Dufresne <nicolas@xxxxxxxxxxxx> · Sat, 27 Apr 2019 08:06:42 -0400

On Friday, 26 April 2019 at 16:18 +0200, Hans Verkuil wrote:
> On 4/16/19 9:22 AM, Alexandre Courbot wrote:
> 
> <snip>
> 
> > Thanks for this great discussion. Let me try to summarize the status
> > of this thread + the IRC discussion and add my own thoughts:
> > 
> > Proper support for multiple decoding units (e.g. H.264 slices) per
> > frame should not be an afterthought; compliance with encoded formats
> > depends on it, and the benefit of lower latency is a significant
> > consideration for vendors.
> > 
> > m2m, which we use for all stateless codecs, has a strong assumption
> > that one OUTPUT buffer consumed results in one CAPTURE buffer being
> > produced. This assumption can however be overruled: at least the venus
> > driver does it to implement the stateful specification.
> > 
> > So we need a way to specify frame boundaries when submitting encoded
> > content to the driver. One request should contain a single OUTPUT
> > buffer, containing a single decoding unit, but we need a way to
> > specify whether the driver should directly produce a CAPTURE buffer
> > from this request, or keep using the same CAPTURE buffer with
> > subsequent requests.
> > 
> > I can think of 2 ways this can be expressed:
> > 1) We keep the current m2m behavior as the default (a CAPTURE buffer
> > is produced), and add a flag to ask the driver to change that behavior,
> > hold on to the CAPTURE buffer and reuse it with the next request(s);
> > 2) We specify that no CAPTURE buffer is produced by default, unless a
> > flag asking for one is specified.
> > 
> > The flag could be specified in one of two ways:
> > a) As a new v4l2_buffer.flag for the OUTPUT buffer ;
> > b) As a dedicated control, either format-specific or more common to all codecs.
> > 
> > I tend to favor 2) and b) for this, for the reason that with H.264 at
> > least, user-space does not know whether a slice is the last slice of a
> > frame until it starts parsing the next one, and we don't know when we
> > will receive it. If we use a control to ask that a CAPTURE buffer be
> > produced, we can always submit another request with only that control
> > set once it is clear that the frame is complete (and not delay
> > decoding meanwhile). In practice I am not that familiar with
> > latency-sensitive streaming; maybe a smart streamer would just append
> > an AUD NAL unit at the end of every frame, so that we can submit the
> > flag with the last slice without further delay?
> > 
> > An extra constraint to enforce would be that each decoding unit
> > belonging to the same frame must be submitted with the same timestamp,
> > otherwise the request submission would fail. We really need a
> > framework to enforce all this at a higher level than individual
> > drivers; once we reach an agreement I will start working on this.
> > 
> > Formats that do not support multiple decoding units per frame would
> > reject any request that does not carry the end-of-frame information.
> > 
> > Anything missing / any further comment?
> > 
> 
> After reading through this thread and a further IRC discussion I now
> understand the problem. I think there are several ways this can be
> solved, but I think this is the easiest:
> 
> Introduce a new V4L2_BUF_FLAG_HOLD_CAPTURE_BUFFER flag.
> 
> If set in the OUTPUT buffer, then don't mark the CAPTURE buffer as
> done after processing the OUTPUT buffer.
> 
> If an OUTPUT buffer was queued with a different timestamp than was
> used for the currently held CAPTURE buffer, then mark that CAPTURE
> buffer as done before starting processing this OUTPUT buffer.

Just out of curiosity, can you expand on how this would be handled? If
there are a number of CAPTURE buffers, these should have "no timestamp".
So I suspect we need a condition that differentiates "no timestamp" from
the previous timestamp. What is unclear to me is what "no timestamp"
means: we already stated that timestamp 0 cannot be reserved to mean an
unset timestamp.

> 
> In other words, for slicing you can just always set this flag and
> group the slices by the OUTPUT timestamp. If you know that you
> reached the last slice of a frame, then you can optionally clear the
> flag to ensure the CAPTURE buffer is marked done without having to wait
> for the first slice of the next frame to arrive.
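
To check my reading of the rules above, here is a minimal user-space
simulation of the hold logic. It is only a sketch and not the
v4l2-mem2mem code: the flag value is arbitrary, and struct m2m_sim and
process_output() are names I made up for illustration. It assumes the
"is a buffer held" state is tracked separately from the timestamp, which
is one way to avoid reserving any timestamp value as "unset".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed flag; the value here is arbitrary. */
#define HOLD_CAPTURE_BUFFER 0x1u

/* Hypothetical per-context state, for illustration only. */
struct m2m_sim {
	bool     holding;  /* is a CAPTURE buffer currently held? */
	uint64_t held_ts;  /* its timestamp, valid only if holding */
	int      done;     /* CAPTURE buffers marked done so far */
};

/*
 * Process one OUTPUT buffer (one slice).
 * Returns how many CAPTURE buffers this call marked as done.
 */
int process_output(struct m2m_sim *s, uint64_t ts, unsigned int flags)
{
	int before;

	assert(s != NULL);
	before = s->done;

	/* Different timestamp: the held frame is complete, release it. */
	if (s->holding && s->held_ts != ts) {
		s->done++;
		s->holding = false;
	}

	/* ... the slice would be decoded into the CAPTURE buffer here ... */

	if (flags & HOLD_CAPTURE_BUFFER) {
		/* Keep the CAPTURE buffer for further slices of this frame. */
		s->holding = true;
		s->held_ts = ts;
	} else {
		/* Flag cleared: mark the CAPTURE buffer done right away. */
		s->done++;
		s->holding = false;
	}

	return s->done - before;
}
```

With this model, two slices queued with the same timestamp and the flag
set release nothing, the first slice with a new timestamp releases the
previous frame, and a slice queued with the flag cleared releases its
own frame immediately. Timestamp 0 behaves like any other value, since
"nothing held" lives in the boolean rather than in the timestamp.
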
> 
> A potential disadvantage of this approach is that it relies on the
> OUTPUT timestamp being the same for all slices of the same frame.
> 
> Which sounds reasonable to me.
> 
> In addition add a V4L2_BUF_CAP_SUPPORTS_HOLD_CAPTURE_BUFFER
> capability to signal support for this flag.
> 
> I think this can be fairly easily implemented in v4l2-mem2mem.c.
> 
> In addition, this approach is not specific to codecs, it can be
> used elsewhere as well (composing multiple output buffers into one
> capture buffer is one use-case that comes to mind).
> 
> Comments? Other ideas?

Sounds reasonable to me. I'll read through Paul's comments now and
reply if needed.

> 
> Regards,
> 
> 	Hans