On 10/22/2018 04:48 PM, Tomasz Figa wrote: > Due to complexity of the video decoding process, the V4L2 drivers of > stateful decoder hardware require specific sequences of V4L2 API calls > to be followed. These include capability enumeration, initialization, > decoding, seek, pause, dynamic resolution change, drain and end of > stream. > > Specifics of the above have been discussed during Media Workshops at > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux > Conference Europe 2014 in Düsseldorf. The de facto Codec API that > originated at those events was later implemented by the drivers we already > have merged in mainline, such as s5p-mfc or coda. > > The only thing missing was the real specification included as a part of > Linux Media documentation. Fix it now and document the decoder part of > the Codec API. > > Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx> > --- > Documentation/media/uapi/v4l/dev-decoder.rst | 1082 +++++++++++++++++ > Documentation/media/uapi/v4l/devices.rst | 1 + > Documentation/media/uapi/v4l/pixfmt-v4l2.rst | 5 + > Documentation/media/uapi/v4l/v4l2.rst | 10 +- > .../media/uapi/v4l/vidioc-decoder-cmd.rst | 40 +- > Documentation/media/uapi/v4l/vidioc-g-fmt.rst | 14 + > 6 files changed, 1137 insertions(+), 15 deletions(-) > create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst > > diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst > new file mode 100644 > index 000000000000..09c7a6621b8e > --- /dev/null > +++ b/Documentation/media/uapi/v4l/dev-decoder.rst > @@ -0,0 +1,1082 @@ > +.. -*- coding: utf-8; mode: rst -*- > + > +.. _decoder: > + > +************************************************* > +Memory-to-memory Stateful Video Decoder Interface > +************************************************* > + > +A stateful video decoder takes complete chunks of the bitstream (e.g. Annex-B > +H.264/HEVC stream, raw VP8/9 stream) and decodes them into raw video frames in > +display order. The decoder is expected not to require any additional information > +from the client to process these buffers. > + > +Performing software parsing, processing etc. of the stream in the driver in > +order to support this interface is strongly discouraged. In case such > +operations are needed, use of the Stateless Video Decoder Interface (in > +development) is strongly advised. > + > +Conventions and notation used in this document > +============================================== > + > +1. The general V4L2 API rules apply if not specified in this document > + otherwise. > + > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC > + 2119. > + > +3. All steps not marked “optional” are required. > + > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used > + interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`, > + unless specified otherwise. > + > +5. Single-plane API (see spec) and applicable structures may be used > + interchangeably with Multi-plane API, unless specified otherwise, > + depending on decoder capabilities and following the general V4L2 > + guidelines. > + > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i = > + [0..2]: i = 0, 1, 2. > + > +7. Given an ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` > + queue containing data (decoded frame/stream) that resulted from processing > + buffer A. > + > +.. _decoder-glossary: > + > +Glossary > +======== > + > +CAPTURE > + the destination buffer queue; for decoder, the queue of buffers containing > + decoded frames; for encoder, the queue of buffers containing encoded > + bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or > + ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the hardware > + into ``CAPTURE`` buffers > + > +client > + application client communicating with the decoder or encoder implementing > + this interface > + > +coded format > + encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see > + also: raw format > + > +coded height > + height for given coded resolution > + > +coded resolution > + stream resolution in pixels aligned to codec and hardware requirements; > + typically visible resolution rounded up to full macroblocks; > + see also: visible resolution > + > +coded width > + width for given coded resolution > + > +decode order > + the order in which frames are decoded; may differ from display order if the > + coded format includes a feature of frame reordering; for decoders, > + ``OUTPUT`` buffers must be queued by the client in decode order; for > + encoders ``CAPTURE`` buffers must be returned by the encoder in decode order > + > +destination > + data resulting from the decode process; ``CAPTURE`` > + > +display order > + the order in which frames must be displayed; for encoders, ``OUTPUT`` > + buffers must be queued by the client in display order; for decoders, > + ``CAPTURE`` buffers must be returned by the decoder in display order > + > +DPB > + Decoded Picture Buffer; an H.264 term for a buffer that stores a decoded > + raw frame available for reference in further decoding steps. > + > +EOS > + end of stream > + > +IDR > + Instantaneous Decoder Refresh; a type of a keyframe in H.264-encoded stream, > + which clears the list of earlier reference frames (DPBs) > + > +keyframe > + an encoded frame that does not reference frames decoded earlier, i.e. > + can be decoded fully on its own. > + > +macroblock > + a processing unit in image and video compression formats based on linear > + block transforms (e.g. H.264, VP8, VP9); codec-specific, but for most of > + popular codecs the size is 16x16 samples (pixels) > + > +OUTPUT > + the source buffer queue; for decoders, the queue of buffers containing > + encoded bitstream; for encoders, the queue of buffers containing raw frames; > + ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the > + hardware is fed with data from ``OUTPUT`` buffers > + > +PPS > + Picture Parameter Set; a type of metadata entity in H.264 bitstream > + > +raw format > + uncompressed format containing raw pixel data (e.g. YUV, RGB formats) > + > +resume point > + a point in the bitstream from which decoding may start/continue, without > + any previous state/data present, e.g.: a keyframe (VP8/VP9) or > + SPS/PPS/IDR sequence (H.264); a resume point is required to start decode > + of a new stream, or to resume decoding after a seek > + > +source > + data fed to the decoder or encoder; ``OUTPUT`` > + > +source height > + height in pixels for given source resolution; relevant to encoders only > + > +source resolution > + resolution in pixels of source frames being source to the encoder and > + subject to further cropping to the bounds of visible resolution; relevant to > + encoders only > + > +source width > + width in pixels for given source resolution; relevant to encoders only > + > +SPS > + Sequence Parameter Set; a type of metadata entity in H.264 bitstream > + > +stream metadata > + additional (non-visual) information contained inside encoded bitstream; > + for example: coded resolution, visible resolution, codec profile > + > +visible height > + height for given visible resolution; display height > + > +visible resolution > + stream resolution of the visible picture, in pixels, to be used for > + display purposes; must be smaller or equal to coded resolution; > + display resolution > + > +visible width > + width for given visible resolution; display width > + > +State machine > +============= > + > +.. kernel-render:: DOT > + :alt: DOT digraph of decoder state machine > + :caption: Decoder state machine > + > + digraph decoder_state_machine { > + node [shape = doublecircle, label="Decoding"] Decoding; > + > + node [shape = circle, label="Initialization"] Initialization; > + node [shape = circle, label="Capture\nsetup"] CaptureSetup; > + node [shape = circle, label="Dynamic\nresolution\nchange"] ResChange; > + node [shape = circle, label="Stopped"] Stopped; > + node [shape = circle, label="Drain"] Drain; > + node [shape = circle, label="Seek"] Seek; > + node [shape = circle, label="End of stream"] EoS; > + > + node [shape = point]; qi > + qi -> Initialization [ label = "open()" ]; > + > + Initialization -> CaptureSetup [ label = "CAPTURE\nformat\nestablished" ]; > + > + CaptureSetup -> Stopped [ label = "CAPTURE\nbuffers\nready" ]; > + > + Decoding -> ResChange [ label = "Stream\nresolution\nchange" ]; > + Decoding -> Drain [ label = "V4L2_DEC_CMD_STOP" ]; > + Decoding -> EoS [ label = "EoS mark\nin the stream" ]; > + Decoding -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ]; > + Decoding -> Stopped [ label = "VIDIOC_STREAMOFF(CAPTURE)" ]; > + Decoding -> Decoding; > + > + ResChange -> CaptureSetup [ label = "CAPTURE\nformat\nestablished" ]; > + ResChange -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ]; > + > + EoS -> Drain [ label = "Implicit\ndrain" ]; > + > + Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ]; > + Drain -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ]; > + > + Seek -> Decoding [ label = "VIDIOC_STREAMON(OUTPUT)" ]; > + Seek -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ]; > + > + Stopped -> Decoding [ label = "V4L2_DEC_CMD_START\nor\nVIDIOC_STREAMON(CAPTURE)" ]; > + Stopped -> Seek [ label = "VIDIOC_STREAMOFF(OUTPUT)" ]; > + } > + > +Querying capabilities > +===================== > + > +1. To enumerate the set of coded formats supported by the decoder, the > + client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``. > + > + * The full set of supported formats will be returned, regardless of the > + format set on ``CAPTURE``. > + > +2. To enumerate the set of supported raw formats, the client may call > + :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``. > + > + * Only the formats supported for the format currently active on ``OUTPUT`` > + will be returned. > + > + * In order to enumerate raw formats supported by a given coded format, > + the client must first set that coded format on ``OUTPUT`` and then > + enumerate formats on ``CAPTURE``. > + > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported > + resolutions for a given format, passing desired pixel format in > + :c:type:`v4l2_frmsizeenum` ``pixel_format``. > + > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel > + formats will include all possible coded resolutions supported by the > + decoder for given coded pixel format. > + > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format > + will include all possible frame buffer resolutions supported by the > + decoder for given raw pixel format and the coded format currently set on > + ``OUTPUT``. > + > +4. Supported profiles and levels for given format, if applicable, may be > + queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`. > + > +Initialization > +============== > + > +1. **Optional.** Enumerate supported ``OUTPUT`` formats and resolutions. See > + `Querying capabilities` above. > + > +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT` > + > + * **Required fields:** > + > + ``type`` > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > + > + ``pixelformat`` > + a coded pixel format > + > + ``width``, ``height`` > + required only if cannot be parsed from the stream for the given > + coded format; optional otherwise - set to zero to ignore > + > + ``sizeimage`` > + desired size of ``OUTPUT`` buffers; the decoder may adjust it to > + match hardware requirements > + > + other fields > + follow standard semantics > + > + * **Return fields:** > + > + ``sizeimage`` > + adjusted size of ``CAPTURE`` buffers > + > + * If width and height are set to non-zero values, the ``CAPTURE`` format > + will be updated with an appropriate frame buffer resolution instantly. > + However, for coded formats that include stream resolution information, > + after the decoder is done parsing the information from the stream, it will > + update the ``CAPTURE`` format with new values and signal a source change > + event. > + > + .. warning:: > + > + Changing the ``OUTPUT`` format may change the currently set ``CAPTURE`` > + format. The decoder will derive a new ``CAPTURE`` format from the > + ``OUTPUT`` format being set, including resolution, colorimetry > + parameters, etc. If the client needs a specific ``CAPTURE`` format, it > + must adjust it afterwards. > + > +3. **Optional.** Query the minimum number of buffers required for ``OUTPUT`` > + queue via :c:func:`VIDIOC_G_CTRL`. This is useful if the client intends to > + use more buffers than the minimum required by hardware/format. > + > + * **Required fields:** > + > + ``id`` > + set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT`` > + > + * **Return fields:** > + > + ``value`` > + the minimum number of ``OUTPUT`` buffers required for the currently > + set format > + > +4. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on > + ``OUTPUT``. > + > + * **Required fields:** > + > + ``count`` > + requested number of buffers to allocate; greater than zero > + > + ``type`` > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > + > + ``memory`` > + follows standard semantics > + > + * **Return fields:** > + > + ``count`` > + the actual number of buffers allocated > + > + .. warning:: > + > + The actual number of allocated buffers may differ from the ``count`` > + given. The client must check the updated value of ``count`` after the > + call returns. > + > + .. note:: > + > + To allocate more than the minimum number of buffers (for pipeline > + depth), the client may query the ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT`` > + control to get the minimum number of buffers required by the > + decoder/format, and pass the obtained value plus the number of > + additional buffers needed in the ``count`` field to > + :c:func:`VIDIOC_REQBUFS`. > + > + Alternatively, :c:func:`VIDIOC_CREATE_BUFS` on the ``OUTPUT`` queue can be > + used to have more control over buffer allocation. > + > + * **Required fields:** > + > + ``count`` > + requested number of buffers to allocate; greater than zero > + > + ``type`` > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > + > + ``memory`` > + follows standard semantics > + > + ``format`` > + follows standard semantics > + > + * **Return fields:** > + > + ``count`` > + adjusted to the number of allocated buffers > + > + .. warning:: > + > + The actual number of allocated buffers may differ from the ``count`` > + given. The client must check the updated value of ``count`` after the > + call returns. > + > +5. Start streaming on the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`. > + > +6. **This step only applies to coded formats that contain resolution information > + in the stream.** As far as I know all codecs have resolution/metadata in the stream. As discussed in the "[PATCH vicodec v4 0/3] Add support to more pixel formats in vicodec" thread, it is easiest to assume that there is always metadata. Perhaps there should be a single mention somewhere that such codecs are not supported at the moment, but to be frank how can you decode a stream without it containing such essential information? You are much more likely to implement such a codec as a stateless codec. So I would just drop this sentence here (and perhaps at other places in this document or the encoder document as well). Regards, Hans