On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
<paul.kocialkowski@xxxxxxxxxxx> wrote:
>
> Hi Alex,
>
> On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > Hi Paul,
> > > >
> > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks for this new version! I only have one comment left, see below.
> > > > >
> > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > Documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders.
> > > > > >
> > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > should probably still be considered staging for a short while.
> > > > > >
> > > > > > Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxxxx>
> > > > > > ---
> > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > >
> > > > > > * Included Tomasz and Hans feedback,
> > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > * Use the tags API.
> > > > > >
> > > > > >  Documentation/media/uapi/v4l/dev-codec.rst    |   5 +
> > > > > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 399 ++++++++++++++++++
> > > > > >  2 files changed, 404 insertions(+)
> > > > > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > >
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > @@ -6,6 +6,11 @@
> > > > > >  Codec Interface
> > > > > >  ***************
> > > > > >
> > > > > > +.. toctree::
> > > > > > +    :maxdepth: 1
> > > > > > +
> > > > > > +    dev-stateless-decoder
> > > > > > +
> > > > > >  A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > >  video data from one format into another format, in memory. Typically
> > > > > >  such devices are memory-to-memory devices (i.e. devices with the
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..7a781c89bd59
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > @@ -0,0 +1,399 @@
> > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > +
> > > > > > +.. _stateless_decoder:
> > > > > > +
> > > > > > +**************************************************
> > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > +**************************************************
> > > > > > +
> > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > +maintaining the decoding state and providing it to the decoder with each
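[Editor's illustration: a minimal sketch of how a client could probe the
two capabilities quoted above. It assumes the multi-planar queue types
used by cedrus; V4L2_BUF_CAP_SUPPORTS_REQUESTS is in mainline since 4.20,
while V4L2_BUF_CAP_SUPPORTS_TAGS belongs to the tags API proposed by this
series, so it is defined here with a placeholder value.]

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

#ifndef V4L2_BUF_CAP_SUPPORTS_TAGS
#define V4L2_BUF_CAP_SUPPORTS_TAGS 0x00000040 /* placeholder: tags API is not mainline */
#endif

static int check_stateless_caps(int video_fd)
{
	struct v4l2_requestbuffers reqbufs;

	/* A VIDIOC_REQBUFS call with count == 0 only reports capabilities. */
	memset(&reqbufs, 0, sizeof(reqbufs));
	reqbufs.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	reqbufs.memory = V4L2_MEMORY_MMAP;

	if (ioctl(video_fd, VIDIOC_REQBUFS, &reqbufs) < 0)
		return -errno;

	/* The OUTPUT queue must support both requests and tags. */
	if (!(reqbufs.capabilities & V4L2_BUF_CAP_SUPPORTS_REQUESTS) ||
	    !(reqbufs.capabilities & V4L2_BUF_CAP_SUPPORTS_TAGS)) {
		fprintf(stderr, "OUTPUT queue lacks request/tag support\n");
		return -ENOTSUP;
	}

	/* The CAPTURE queue only needs to support tags. */
	memset(&reqbufs, 0, sizeof(reqbufs));
	reqbufs.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	reqbufs.memory = V4L2_MEMORY_MMAP;

	if (ioctl(video_fd, VIDIOC_REQBUFS, &reqbufs) < 0)
		return -errno;

	if (!(reqbufs.capabilities & V4L2_BUF_CAP_SUPPORTS_TAGS)) {
		fprintf(stderr, "CAPTURE queue lacks tag support\n");
		return -ENOTSUP;
	}

	return 0;
}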
> > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > +has to do is to provide the raw encoded stream.
> > > > > > +
> > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > +consistent decoding state.
> > > > > > +
> > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > +  ``OUTPUT`` queue,
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > +  and ``CAPTURE`` queues,
> > > > > > +
> > > > >
> > > > > [...]
> > > > >
> > > > > > +Decoding
> > > > > > +========
> > > > > > +
> > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > +following is attached:
> > > > > > +
> > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > +  ``OUTPUT`` queue,
> > > > >
> > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > eventually), this requirement should be dropped because metadata is
> > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > support.
> > > > >
> > > > > I think it would be safer to mention something like filling the output
> > > > > buffer with the minimum unit size for the selected output format, to
> > > > > which the associated metadata applies.
> > > >
> > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > > 1) There are streams that can have even 32 slices. With that, you
> > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > > 2) The Rockchip hardware seems to just pick all the slices one after
> > > > another, which was the reason to actually put the slice data in the
> > > > buffer like that.
> > > > 3) Not all the metadata is per-slice. Actually most of the metadata
> > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > per-slice. The corresponding control is an array, which has an entry
> > > > for each slice in the buffer. Each entry includes an offset field,
> > > > which points to the place in the buffer where the slice is located.
> > >
> > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > is that the spec should specify that "at least the minimum unit size
> > > for decoding should be passed in a buffer" (that's maybe not the
> > > clearest wording), instead of "one frame worth of".
> > >
> > > I certainly don't mean to say that each slice should be held in a
> > > separate buffer and totally agree with all the points you're making :)
> >
> > Thanks for clarifying. I will update the document and post v3 accordingly.
> >
> > > I just think we should still allow userspace to pass slices with a
> > > finer granularity than "all the slices required for one frame".
> >
> > I'm afraid that doing so could open the door to some ambiguities. If
> > you allow that, then are you also allowed to send more than one frame
> > if the decode parameters do not change? How do drivers that only
> > support full frames react when handed only parts of a frame?
>
> IIRC the ability to pass individual slices was brought up regarding a
> potential latency benefit, but I doubt it would really be that
> significant.
>
> Thinking about it with the points you mentioned in mind, I guess the
> downsides are much more significant than the potential gain.
>
> So let's stick with requiring all the slices for a frame then!

Ack. My view is that we can still loosen this requirement in the future,
possibly behind some driver capability flag, but starting with a simpler
API, with less freedom for applications and fewer constraints on hardware
support, sounds like a better practice in general.
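[Editor's illustration: a minimal sketch of this per-frame request flow
as settled above, with all slices of one frame in a single OUTPUT buffer
and the per-slice parameters attached to the same request as an array
control. The request API ioctls are the ones merged in Linux 4.20; the
control ID and the slice-parameter payload are placeholders, since the
stateless codec controls were still staging at the time of this thread.]

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/media.h>
#include <linux/videodev2.h>

static int submit_frame_request(int media_fd, int video_fd,
				struct v4l2_buffer *out_buf,
				__u32 ctrl_id, /* placeholder control ID */
				void *slice_params, __u32 slice_params_size)
{
	struct v4l2_ext_control ctrl;
	struct v4l2_ext_controls ctrls;
	int req_fd;

	/* Allocate a request from the media device. */
	if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
		return -1;

	/* Attach the per-slice parameter array to the request. */
	memset(&ctrl, 0, sizeof(ctrl));
	ctrl.id = ctrl_id;
	ctrl.size = slice_params_size;
	ctrl.ptr = slice_params;

	memset(&ctrls, 0, sizeof(ctrls));
	ctrls.which = V4L2_CTRL_WHICH_REQUEST_VAL;
	ctrls.request_fd = req_fd;
	ctrls.count = 1;
	ctrls.controls = &ctrl;

	if (ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &ctrls) < 0)
		goto err_close;

	/* Queue the OUTPUT buffer holding all slices of this frame. */
	out_buf->flags |= V4L2_BUF_FLAG_REQUEST_FD;
	out_buf->request_fd = req_fd;
	if (ioctl(video_fd, VIDIOC_QBUF, out_buf) < 0)
		goto err_close;

	/* Submit the whole request; poll() on req_fd signals completion. */
	if (ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) < 0)
		goto err_close;

	return req_fd;

err_close:
	close(req_fd);
	return -1;
}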
> Cheers,
>
> Paul
>
> > > However, it looks like supporting this might be a problem for the
> > > rockchip decoder though. Note that our Allwinner VPU can also process
> > > all slices one after the other, but can be configured for slice-level
> > > granularity while decoding (at least it looks that way).
> > >
> > > Side point: After some discussions with Thierry Reding, who's looking
> > > into the Tegra VPU (also stateless), it seems that using the annex-b
> > > format for h.264 would be best for everyone. So that means including
> > > the start code, NAL header and "raw" slice data. I guess the same
> > > should apply to other codecs too. But that should be in the associated
> > > pixfmt spec, not in this general document.
> > >
> > > What do you think?

Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?

By the way, I proposed it once, some time ago, but it was rejected
because VAAPI didn't get the full annex B stream and a V4L2 stateless
VAAPI backend would have to reconstruct the stream.

Best regards,
Tomasz
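[Editor's illustration: a sketch of what the annex-b byte stream
discussed above means for a parser. Each NAL unit, including its header
byte, is preceded by a 00 00 01 or 00 00 00 01 start code. The helpers
below are hypothetical and not taken from any patch in this thread.]

#include <stddef.h>
#include <stdint.h>

/* Return the offset of the next 00 00 01 start code at or after 'pos',
 * or 'len' if none is found. A 4-byte 00 00 00 01 code matches at its
 * second byte, which is sufficient to find NAL unit boundaries. */
static size_t next_start_code(const uint8_t *buf, size_t len, size_t pos)
{
	for (; pos + 3 <= len; pos++) {
		if (buf[pos] == 0x00 && buf[pos + 1] == 0x00 &&
		    buf[pos + 2] == 0x01)
			return pos;
	}
	return len;
}

/* Walk an annex-b buffer and invoke 'cb' for each NAL unit found. */
static void for_each_nal(const uint8_t *buf, size_t len,
			 void (*cb)(const uint8_t *nal, size_t size))
{
	size_t start = next_start_code(buf, len, 0);

	while (start < len) {
		size_t payload = start + 3; /* skip the 00 00 01 code */
		size_t end = next_start_code(buf, len, payload);
		size_t size = end - payload;

		/* Trailing zero bytes are padding (or the leading zero of
		 * a following 4-byte start code), not NAL data: a NAL unit
		 * proper never ends in 0x00 thanks to its stop bit. */
		while (size > 0 && buf[payload + size - 1] == 0x00)
			size--;

		if (size > 0)
			cb(&buf[payload], size);
		start = end;
	}
}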