Hi Alex, On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote: > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski > <paul.kocialkowski@xxxxxxxxxxx> wrote: > > Hi, > > > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote: > > > Hi Paul, > > > > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski > > > <paul.kocialkowski@xxxxxxxxxxx> wrote: > > > > Hi, > > > > > > > > Thanks for this new version! I only have one comment left, see below. > > > > > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote: > > > > > Documents the protocol that user-space should follow when > > > > > communicating with stateless video decoders. > > > > > > > > > > The stateless video decoding API makes use of the new request and tags > > > > > APIs. While it has been implemented with the Cedrus driver so far, it > > > > > should probably still be considered staging for a short while. > > > > > > > > > > Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxxxx> > > > > > --- > > > > > Removing the RFC flag this time. Changes since RFCv3: > > > > > > > > > > * Included Tomasz and Hans feedback, > > > > > * Expanded the decoding section to better describe the use of requests, > > > > > * Use the tags API. > > > > > > > > > > Documentation/media/uapi/v4l/dev-codec.rst | 5 + > > > > > .../media/uapi/v4l/dev-stateless-decoder.rst | 399 ++++++++++++++++++ > > > > > 2 files changed, 404 insertions(+) > > > > > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst > > > > > index c61e938bd8dc..3e6a3e883f11 100644 > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst > > > > > @@ -6,6 +6,11 @@ > > > > > Codec Interface > > > > > *************** > > > > > > > > > > +.. toctree:: > > > > > + :maxdepth: 1 > > > > > + > > > > > + dev-stateless-decoder > > > > > + > > > > > A V4L2 codec can compress, decompress, transform, or otherwise convert > > > > > video data from one format into another format, in memory. Typically > > > > > such devices are memory-to-memory devices (i.e. devices with the > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst > > > > > new file mode 100644 > > > > > index 000000000000..7a781c89bd59 > > > > > --- /dev/null > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst > > > > > @@ -0,0 +1,399 @@ > > > > > +.. -*- coding: utf-8; mode: rst -*- > > > > > + > > > > > +.. _stateless_decoder: > > > > > + > > > > > +************************************************** > > > > > +Memory-to-memory Stateless Video Decoder Interface > > > > > +************************************************** > > > > > + > > > > > +A stateless decoder is a decoder that works without retaining any kind of state > > > > > +between processing frames. This means that each frame is decoded independently > > > > > +of any previous and future frames, and that the client is responsible for > > > > > +maintaining the decoding state and providing it to the decoder with each > > > > > +decoding request. This is in contrast to the stateful video decoder interface, > > > > > +where the hardware and driver maintain the decoding state and all the client > > > > > +has to do is to provide the raw encoded stream. > > > > > + > > > > > +This section describes how user-space ("the client") is expected to communicate > > > > > +with such decoders in order to successfully decode an encoded stream. Compared > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of > > > > > +this simplicity is extra complexity in the client which must maintain a > > > > > +consistent decoding state. > > > > > + > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless > > > > > +decoder must thus expose the following capabilities on its queues when > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked: > > > > > + > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the > > > > > + ``OUTPUT`` queue, > > > > > + > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT`` > > > > > + and ``CAPTURE`` queues, > > > > > + > > > > > > > > [...] > > > > > > > > > +Decoding > > > > > +======== > > > > > + > > > > > +For each frame, the client is responsible for submitting a request to which the > > > > > +following is attached: > > > > > + > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the > > > > > + ``OUTPUT`` queue, > > > > > > > > Although this is still the case in the cedrus driver (but will be fixed > > > > eventually), this requirement should be dropped because metadata is > > > > per-slice and not per-picture in the formats we're currently aiming to > > > > support. > > > > > > > > I think it would be safer to mention something like filling the output > > > > buffer with the minimum unit size for the selected output format, to > > > > which the associated metadata applies. > > > > > > I'm not sure it's a good idea. Some of the reasons why I think so: > > > 1) There are streams that can have even 32 slices. With that, you > > > instantly run out of V4L2 buffers even just for 1 frame. > > > 2) The Rockchip hardware which seems to just pick all the slices one > > > after another and which was the reason to actually put the slice data > > > in the buffer like that. > > > 3) Not all the metadata is per-slice. Actually most of the metadata > > > is per frame and only what is located inside v4l2_h264_slice_param is > > > per-slice. The corresponding control is an array, which has an entry > > > for each slice in the buffer. Each entry includes an offset field, > > > which points to the place in the buffer where the slice is located. > > > > Sorry, I realize that my email wasn't very clear. What I meant to say > > is that the spec should specify that "at least the minimum unit size > > for decoding should be passed in a buffer" (that's maybe not the > > clearest wording), instead of "one frame worth of". > > > > I certainly don't mean to say that each slice should be held in a > > separate buffer and totally agree with all the points you're making :) > > Thanks for clarifying. I will update the document and post v3 accordingly. > > > I just think we should still allow userspace to pass slices with a > > finer granularity than "all the slices required for one frame". > > I'm afraid that doing so could open the door to some ambiguities. If > you allow that, then are you also allowed to send more than one frame > if the decode parameters do not change? How do drivers that only > support full frames react when handled only parts of a frame? IIRC the ability to pass individual slices was brought up regarding a potential latency benefit, but I doubt it would really be that significant. Thinking about it with the points you mentionned in mind, I guess the downsides are much more significant than the potential gain. So let's stick with requiring all the slices for a frame then! Cheers, Paul > > However, it looks like supporting this might be a problem for the > > rockchip decoder though. Note that our Allwinner VPU can also process > > all slices one after the other, but can be configured for slice-level > > granularity while decoding (at least it looks that way). > > > > Side point: After some discussions with Thierry Reading, who's looking > > into the the Tegra VPU (also stateless), it seems that using the annex- > > b format for h.264 would be best for everyone. So that means including > > the start code, NAL header and "raw" slice data. I guess the same > > should apply to other codecs too. But that should be in the associated > > pixfmt spec, not in this general document. > > > > What do yout think? > > > > Cheers, > > > > Paul > > > > > Best regards, > > > Tomasz > > > > > > > > +* All the controls relevant to the format being decoded (see below for details). > > > > > + > > > > > +The contents of the source ``OUTPUT`` buffer, as well as the controls that must > > > > > +be set on the request, depend on the active coded pixel format and might be > > > > > +affected by codec-specific extended controls, as stated in documentation of each > > > > > +format. > > > > > + > > > > > +A typical frame would thus be decoded using the following sequence: > > > > > + > > > > > +1. Queue an ``OUTPUT`` buffer containing one frame worth of encoded bitstream > > > > > > > > Ditto here. > > > > > > > > > + data for the decoding request, using :c:func:`VIDIOC_QBUF`. > > > > > + > > > > > + * **Required fields:** > > > > > + > > > > > + ``index`` > > > > > + index of the buffer being queued. > > > > > + > > > > > + ``type`` > > > > > + type of the buffer. > > > > > + > > > > > + ``bytesused`` > > > > > + number of bytes taken by the encoded data frame in the buffer. > > > > > + > > > > > + ``flags`` > > > > > + the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set. In addition, if > > > > > + the decoded frame is to be used as a reference frame in the future, > > > > > + then the ``V4L2_BUF_FLAG_TAG`` flag must be set (it can also be set > > > > > + for non-reference frames if it helps the client). > > > > > + > > > > > + ``request_fd`` > > > > > + must be set to the file descriptor of the decoding request. > > > > > + > > > > > + ``tag`` > > > > > + if the ``V4L2_BUF_FLAG_TAG`` is set, then this must contain the tag > > > > > + for the frame that will be copied into the decoded frame buffer, and > > > > > + can be used to specify this frame as a reference frame for another > > > > > + one. > > > > > + > > > > > + .. note:: > > > > > + > > > > > + The API currently requires one frame of encoded data per ``OUTPUT`` buffer, > > > > > + even though some encoded formats may present their data in smaller chunks > > > > > + (e.g. H.264's frames can be made of several slices that can be processed > > > > > + independently). It is currently the responsibility of the client to gather > > > > > + the different parts of a frame into a single ``OUTPUT`` buffer, while > > > > > + preserving the same layout as the original bitstream. This > > > > > + restriction may be lifted in the future. > > > > > > > > And this part should probably be dropped too. > > > > > > > > Cheers, > > > > > > > > Paul > > > > > > > > > +2. Set the codec-specific controls for the decoding request, using > > > > > + :c:func:`VIDIOC_S_EXT_CTRLS`. > > > > > + > > > > > + * **Required fields:** > > > > > + > > > > > + ``which`` > > > > > + must be ``V4L2_CTRL_WHICH_REQUEST_VAL``. > > > > > + > > > > > + ``request_fd`` > > > > > + must be set to the file descriptor of the decoding request. > > > > > + > > > > > + other fields > > > > > + other fields are set as usual when setting controls. The ``controls`` > > > > > + array must contain all the codec-specific controls required to decode > > > > > + a frame. > > > > > + > > > > > + .. note:: > > > > > + > > > > > + It is possible to specify the controls in different invocations of > > > > > + :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as > > > > > + long as ``request_fd`` and ``which`` are properly set. The controls state > > > > > + at the moment of request submission is the one that will be considered. > > > > > + > > > > > + .. note:: > > > > > + > > > > > + The order in which steps 1 and 2 take place is interchangeable. > > > > > + > > > > > +3. Submit the request by invoking :c:func:`MEDIA_IOC_REQUEST_QUEUE` on the > > > > > + request FD. > > > > > + > > > > > + If the request is submitted without an ``OUTPUT`` buffer, or if some of the > > > > > + required controls are missing from the request, then > > > > > + :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one > > > > > + ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``. > > > > > + :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no > > > > > + ``CAPTURE`` buffer will be produced for this request. > > > > > + > > > > > +``CAPTURE`` buffers must not be part of the request, and are queued > > > > > +independently. They are returned in decode order (i.e. the same order as > > > > > +``OUTPUT`` buffers were submitted). > > > > > + > > > > > +Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers > > > > > +carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an > > > > > +error, then all following decoded frames that refer to it also have the > > > > > +``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to > > > > > +produce a (likely corrupted) frame. > > > > > + > > > > > +Buffer management while decoding > > > > > +================================ > > > > > +Contrary to stateful decoders, a stateless decoder does not perform any kind of > > > > > +buffer management: it only guarantees that dequeued ``CAPTURE`` buffer can be > > > > > +used by the client for as long as they are not queued again. "Used" here > > > > > +encompasses using the buffer for compositing, display, or as a reference frame > > > > > +to decode a subsequent frame. > > > > > + > > > > > +Reference frames are specified by using the same tag that was set to the > > > > > +``OUTPUT`` buffer of a frame into the relevant codec-specific structures that > > > > > +are submitted as controls. This tag will be copied to the corresponding > > > > > +``CAPTURE`` buffer, but can be used in any subsequent decoding request as soon > > > > > +as the decoding request for that buffer is queued successfully. This means that > > > > > +the client does not need to wait until a ``CAPTURE`` buffer with a given tag is > > > > > +dequeued to start using that tag in reference frames. However, it must wait > > > > > +until all frames referencing a given tag are dequeued before queuing the > > > > > +referenced ``CAPTURE`` buffer again, since queueing a buffer effectively removes > > > > > +its tag. > > > > > + > > > > > +When queuing a decoding request, the driver will increase the reference count of > > > > > +all the resources associated with reference frames. This means that the client > > > > > +can e.g. close the DMABUF file descriptors of the reference frame buffers if it > > > > > +won't need it afterwards, as long as the V4L2 ``CAPTURE`` buffer of the > > > > > +reference frame is not re-queued before all referencing frames are decoded. > > > > > + > > > > > +Seeking > > > > > +======= > > > > > +In order to seek, the client just needs to submit requests using input buffers > > > > > +corresponding to the new stream position. It must however be aware that > > > > > +resolution may have changed and follow the dynamic resolution change sequence in > > > > > +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS > > > > > +for H.264) may have changed and the client is responsible for making sure that a > > > > > +valid state is sent to the decoder. > > > > > + > > > > > +The client is then free to ignore any returned ``CAPTURE`` buffer that comes > > > > > +from the pre-seek position. > > > > > + > > > > > +Pause > > > > > +===== > > > > > + > > > > > +In order to pause, the client can just cease queuing buffers onto the ``OUTPUT`` > > > > > +queue. Without source bitstream data, there is no data to process and the codec > > > > > +will remain idle. > > > > > + > > > > > +Dynamic resolution change > > > > > +========================= > > > > > + > > > > > +If the client detects a resolution change in the stream, it will need to perform > > > > > +the initialization sequence again with the new resolution: > > > > > + > > > > > +1. Wait until all submitted requests have completed and dequeue the > > > > > + corresponding output buffers. > > > > > + > > > > > +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE`` > > > > > + queues. > > > > > + > > > > > +3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the > > > > > + ``CAPTURE`` queue with a buffer count of zero. > > > > > + > > > > > +4. Perform the initialization sequence again (minus the allocation of > > > > > + ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue. > > > > > + Note that due to resolution constraints, a different format may need to be > > > > > + picked on the ``CAPTURE`` queue. > > > > > + > > > > > +Drain > > > > > +===== > > > > > + > > > > > +In order to drain the stream on a stateless decoder, the client just needs to > > > > > +wait until all the submitted requests are completed. There is no need to send a > > > > > +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the > > > > > +decoder. > > > > > + > > > > > +End of stream > > > > > +============= > > > > > + > > > > > +When the client detects that the end of stream is reached, it can simply stop > > > > > +sending new frames to the decoder, drain the ``CAPTURE`` queue, and dispose of > > > > > +the decoder as needed. > > > > -- > > > > Paul Kocialkowski, Bootlin (formerly Free Electrons) > > > > Embedded Linux and kernel engineering > > > > https://bootlin.com > > > > > > -- > > Paul Kocialkowski, Bootlin > > Embedded Linux and kernel engineering > > https://bootlin.com > > -- Paul Kocialkowski, Bootlin Embedded Linux and kernel engineering https://bootlin.com