On Wednesday, 10 April 2019 at 10:50 +0200, Hans Verkuil wrote:
> On 4/9/19 11:35 AM, Tomasz Figa wrote:
> > On Mon, Apr 8, 2019 at 8:11 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > On 4/8/19 11:23 AM, Tomasz Figa wrote:
> > > > On Fri, Apr 5, 2019 at 7:03 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > > On 4/5/19 10:12 AM, Tomasz Figa wrote:
> > > > > > On Thu, Mar 14, 2019 at 10:57 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > > > > Hi Tomasz,
> > > > > > >
> > > > > > > Some more comments...
> > > > > > >
> > > > > > > On 1/29/19 2:52 PM, Hans Verkuil wrote:
> > > > > > > > Hi Tomasz,
> > > > > > > >
> > > > > > > > Some comments below. Nothing major, so I think a v4 should be ready to be
> > > > > > > > merged.
> > > > > > > >
> > > > > > > > On 1/24/19 11:04 AM, Tomasz Figa wrote:
> > > > > > > > > Due to complexity of the video encoding process, the V4L2 drivers of
> > > > > > > > > stateful encoder hardware require specific sequences of V4L2 API calls
> > > > > > > > > to be followed. These include capability enumeration, initialization,
> > > > > > > > > encoding, encode parameters change, drain and reset.
> > > > > > > > >
> > > > > > > > > Specifics of the above have been discussed during Media Workshops at
> > > > > > > > > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > > > > > > > > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > > > > > > > > originated at those events was later implemented by the drivers we already
> > > > > > > > > have merged in mainline, such as s5p-mfc or coda.
> > > > > > > > >
> > > > > > > > > The only thing missing was the real specification included as a part of
> > > > > > > > > Linux Media documentation. Fix it now and document the encoder part of
> > > > > > > > > the Codec API.
> > > > > > > > > > > > > > > > > > Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx> > > > > > > > > > --- > > > > > > > > > Documentation/media/uapi/v4l/dev-encoder.rst | 586 ++++++++++++++++++ > > > > > > > > > Documentation/media/uapi/v4l/dev-mem2mem.rst | 1 + > > > > > > > > > Documentation/media/uapi/v4l/pixfmt-v4l2.rst | 5 + > > > > > > > > > Documentation/media/uapi/v4l/v4l2.rst | 2 + > > > > > > > > > .../media/uapi/v4l/vidioc-encoder-cmd.rst | 38 +- > > > > > > > > > 5 files changed, 617 insertions(+), 15 deletions(-) > > > > > > > > > create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst > > > > > > > > > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst > > > > > > > > > new file mode 100644 > > > > > > > > > index 000000000000..fb8b05a132ee > > > > > > > > > --- /dev/null > > > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst > > > > > > > > > @@ -0,0 +1,586 @@ > > > > > > > > > +.. -*- coding: utf-8; mode: rst -*- > > > > > > > > > + > > > > > > > > > +.. _encoder: > > > > > > > > > + > > > > > > > > > +************************************************* > > > > > > > > > +Memory-to-memory Stateful Video Encoder Interface > > > > > > > > > +************************************************* > > > > > > > > > + > > > > > > > > > +A stateful video encoder takes raw video frames in display order and encodes > > > > > > > > > +them into a bitstream. It generates complete chunks of the bitstream, including > > > > > > > > > +all metadata, headers, etc. The resulting bitstream does not require any > > > > > > > > > +further post-processing by the client. > > > > > > > > > + > > > > > > > > > +Performing software stream processing, header generation etc. in the driver > > > > > > > > > +in order to support this interface is strongly discouraged. 
In case such > > > > > > > > > +operations are needed, use of the Stateless Video Encoder Interface (in > > > > > > > > > +development) is strongly advised. > > > > > > > > > + > > > > > > > > > +Conventions and notation used in this document > > > > > > > > > +============================================== > > > > > > > > > + > > > > > > > > > +1. The general V4L2 API rules apply if not specified in this document > > > > > > > > > + otherwise. > > > > > > > > > + > > > > > > > > > +2. The meaning of words "must", "may", "should", etc. is as per `RFC > > > > > > > > > + 2119 <https://tools.ietf.org/html/rfc2119>`_. > > > > > > > > > + > > > > > > > > > +3. All steps not marked "optional" are required. > > > > > > > > > + > > > > > > > > > +4. :c:func:`VIDIOC_G_EXT_CTRLS` and :c:func:`VIDIOC_S_EXT_CTRLS` may be used > > > > > > > > > + interchangeably with :c:func:`VIDIOC_G_CTRL` and :c:func:`VIDIOC_S_CTRL`, > > > > > > > > > + unless specified otherwise. > > > > > > > > > + > > > > > > > > > +5. Single-planar API (see :ref:`planar-apis`) and applicable structures may be > > > > > > > > > + used interchangeably with multi-planar API, unless specified otherwise, > > > > > > > > > + depending on decoder capabilities and following the general V4L2 guidelines. > > > > > > > > > + > > > > > > > > > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i = > > > > > > > > > + [0..2]: i = 0, 1, 2. > > > > > > > > > + > > > > > > > > > +7. Given an ``OUTPUT`` buffer A, then A’ represents a buffer on the ``CAPTURE`` > > > > > > > > > + queue containing data that resulted from processing buffer A. > > > > > > > > > + > > > > > > > > > +Glossary > > > > > > > > > +======== > > > > > > > > > + > > > > > > > > > +Refer to :ref:`decoder-glossary`. > > > > > > > > > + > > > > > > > > > +State machine > > > > > > > > > +============= > > > > > > > > > + > > > > > > > > > +.. 
kernel-render:: DOT > > > > > > > > > + :alt: DOT digraph of encoder state machine > > > > > > > > > + :caption: Encoder state machine > > > > > > > > > + > > > > > > > > > + digraph encoder_state_machine { > > > > > > > > > + node [shape = doublecircle, label="Encoding"] Encoding; > > > > > > > > > + > > > > > > > > > + node [shape = circle, label="Initialization"] Initialization; > > > > > > > > > + node [shape = circle, label="Stopped"] Stopped; > > > > > > > > > + node [shape = circle, label="Drain"] Drain; > > > > > > > > > + node [shape = circle, label="Reset"] Reset; > > > > > > > > > + > > > > > > > > > + node [shape = point]; qi > > > > > > > > > + qi -> Initialization [ label = "open()" ]; > > > > > > > > > + > > > > > > > > > + Initialization -> Encoding [ label = "Both queues streaming" ]; > > > > > > > > > + > > > > > > > > > + Encoding -> Drain [ label = "V4L2_DEC_CMD_STOP" ]; > > > > > > > > > + Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ]; > > > > > > > > > + Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ]; > > > > > > > > > + Encoding -> Encoding; > > > > > > > > > + > > > > > > > > > + Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ]; > > > > > > > > > + Drain -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ]; > > > > > > > > > + > > > > > > > > > + Reset -> Encoding [ label = "VIDIOC_STREAMON(CAPTURE)" ]; > > > > > > > > > + Reset -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ]; > > > > > > > > > + > > > > > > > > > + Stopped -> Encoding [ label = "V4L2_DEC_CMD_START\nor\nVIDIOC_STREAMON(OUTPUT)" ]; > > > > > > > > > + Stopped -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ]; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > +Querying capabilities > > > > > > > > > +===================== > > > > > > > > > + > > > > > > > > > +1. 
To enumerate the set of coded formats supported by the encoder, the > > > > > > > > > + client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``. > > > > > > > > > + > > > > > > > > > + * The full set of supported formats will be returned, regardless of the > > > > > > > > > + format set on ``OUTPUT``. > > > > > > > > > + > > > > > > > > > +2. To enumerate the set of supported raw formats, the client may call > > > > > > > > > + :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``. > > > > > > > > > + > > > > > > > > > + * Only the formats supported for the format currently active on ``CAPTURE`` > > > > > > > > > + will be returned. > > > > > > > > > + > > > > > > > > > + * In order to enumerate raw formats supported by a given coded format, > > > > > > > > > + the client must first set that coded format on ``CAPTURE`` and then > > > > > > > > > + enumerate the formats on ``OUTPUT``. > > > > > > > > > + > > > > > > > > > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported > > > > > > > > > + resolutions for a given format, passing desired pixel format in > > > > > > > > > + :c:type:`v4l2_frmsizeenum` ``pixel_format``. > > > > > > > > > + > > > > > > > > > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel > > > > > > > > > + format will include all possible coded resolutions supported by the > > > > > > > > > + encoder for given coded pixel format. > > > > > > > > > + > > > > > > > > > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format > > > > > > > > > + will include all possible frame buffer resolutions supported by the > > > > > > > > > + encoder for given raw pixel format and coded format currently set on > > > > > > > > > + ``CAPTURE``. > > > > > > > > > + > > > > > > > > > +4. Supported profiles and levels for the coded format currently set on > > > > > > > > > + ``CAPTURE``, if applicable, may be queried using their respective controls > > > > > > > > > + via :c:func:`VIDIOC_QUERYCTRL`. 
> > > > > > > > > + > > > > > > > > > +5. Any additional encoder capabilities may be discovered by querying > > > > > > > > > + their respective controls. > > > > > > > > > + > > > > > > > > > +Initialization > > > > > > > > > +============== > > > > > > > > > + > > > > > > > > > +1. Set the coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT` > > > > > > > > > + > > > > > > > > > + * **Required fields:** > > > > > > > > > + > > > > > > > > > + ``type`` > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE`` > > > > > > > > > + > > > > > > > > > + ``pixelformat`` > > > > > > > > > + the coded format to be produced > > > > > > > > > + > > > > > > > > > + ``sizeimage`` > > > > > > > > > + desired size of ``CAPTURE`` buffers; the encoder may adjust it to > > > > > > > > > + match hardware requirements > > > > > > > > > + > > > > > > > > > + ``width``, ``height`` > > > > > > > > > + ignored (always zero) > > > > > > > > > + > > > > > > > > > + other fields > > > > > > > > > + follow standard semantics > > > > > > > > > + > > > > > > > > > + * **Return fields:** > > > > > > > > > + > > > > > > > > > + ``sizeimage`` > > > > > > > > > + adjusted size of ``CAPTURE`` buffers > > > > > > > > > + > > > > > > > > > + .. important:: > > > > > > > > > + > > > > > > > > > + Changing the ``CAPTURE`` format may change the currently set ``OUTPUT`` > > > > > > > > > + format. The encoder will derive a new ``OUTPUT`` format from the > > > > > > > > > + ``CAPTURE`` format being set, including resolution, colorimetry > > > > > > > > > + parameters, etc. If the client needs a specific ``OUTPUT`` format, it > > > > > > > > > + must adjust it afterwards. > > > > > > > > > > > > > > > > Hmm, "including resolution": if width and height are set to 0, what should the > > > > > > > > OUTPUT resolution be? Up to the driver? I think this should be clarified since > > > > > > > > at a first reading of this paragraph it appears to be contradictory. 
> > > > > > > > > > > > > > I think the driver should just return the width and height of the OUTPUT > > > > > > > format. So the width and height that userspace specifies is just ignored > > > > > > > and replaced by the width and height of the OUTPUT format. After all, that's > > > > > > > what the bitstream will encode. Returning 0 for width and height would make > > > > > > > this a strange exception in V4L2 and I want to avoid that. > > > > > > > > > > > > > > > > > > > Hmm, however, the width and height of the OUTPUT format is not what's > > > > > > actually encoded in the bitstream. The right selection rectangle > > > > > > determines that. > > > > > > > > > > > > In one of the previous versions I though we could put the codec > > > > > > > > s/codec/coded/... > > > > > > > > > > resolution as the width and height of the CAPTURE format, which would > > > > > > be the resolution of the encoded image rounded up to full macroblocks > > > > > > +/- some encoder-specific constraints. AFAIR there was some concern > > > > > > about OUTPUT format changes triggering CAPTURE format changes, but to > > > > > > be honest, I'm not sure if that's really a problem. I just decided to > > > > > > drop that for the simplicity. > > > > > > > > > > I'm not sure what your point is. > > > > > > > > > > The OUTPUT format has the coded resolution, > > > > > > > > That's not always true. The OUTPUT format is just the format of the > > > > source frame buffers. In special cases where the source resolution is > > > > nicely aligned, it would be the same as coded size, but the remaining > > > > cases are valid as well. > > > > > > > > > so when you set the > > > > > CAPTURE format it can just copy the OUTPUT coded resolution unless the > > > > > chosen CAPTURE pixelformat can't handle that in which case both the > > > > > OUTPUT and CAPTURE coded resolutions are clamped to whatever is the maximum > > > > > or minimum the codec is capable of. 
> > > > > > > > As per my comment above, generally speaking, the encoder will derive > > > > an appropriate coded format from the OUTPUT format, but also other > > > > factors, like the crop rectangles and possibly some internal > > > > constraints. > > > > > > > > > That said, I am fine with just leaving it up to the driver as suggested > > > > > before. Just as long as both the CAPTURE and OUTPUT formats remain valid > > > > > (i.e. width and height may never be out of range). > > > > > > > > > > > > > Sounds good to me. > > > > > > > > > > > > > + > > > > > > > > > +2. **Optional.** Enumerate supported ``OUTPUT`` formats (raw formats for > > > > > > > > > + source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`. > > > > > > > > > + > > > > > > > > > + * **Required fields:** > > > > > > > > > + > > > > > > > > > + ``type`` > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > > > > > > > > > + > > > > > > > > > + other fields > > > > > > > > > + follow standard semantics > > > > > > > > > + > > > > > > > > > + * **Return fields:** > > > > > > > > > + > > > > > > > > > + ``pixelformat`` > > > > > > > > > + raw format supported for the coded format currently selected on > > > > > > > > > + the ``CAPTURE`` queue. > > > > > > > > > + > > > > > > > > > + other fields > > > > > > > > > + follow standard semantics > > > > > > > > > + > > > > > > > > > +3. Set the raw source format on the ``OUTPUT`` queue via > > > > > > > > > + :c:func:`VIDIOC_S_FMT`. 
> > > > > > > > > + > > > > > > > > > + * **Required fields:** > > > > > > > > > + > > > > > > > > > + ``type`` > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > > > > > > > > > + > > > > > > > > > + ``pixelformat`` > > > > > > > > > + raw format of the source > > > > > > > > > + > > > > > > > > > + ``width``, ``height`` > > > > > > > > > + source resolution > > > > > > > > > + > > > > > > > > > + other fields > > > > > > > > > + follow standard semantics > > > > > > > > > + > > > > > > > > > + * **Return fields:** > > > > > > > > > + > > > > > > > > > + ``width``, ``height`` > > > > > > > > > + may be adjusted by encoder to match alignment requirements, as > > > > > > > > > + required by the currently selected formats > > > > > > > > > > > > > > > > What if the width x height is larger than the maximum supported by the > > > > > > > > selected coded format? This should probably mention that in that case the > > > > > > > > width x height is reduced to the largest allowed value. Also mention that > > > > > > > > this maximum is reported by VIDIOC_ENUM_FRAMESIZES. > > > > > > > > > > > > > > > > > + > > > > > > > > > + other fields > > > > > > > > > + follow standard semantics > > > > > > > > > + > > > > > > > > > + * Setting the source resolution will reset the selection rectangles to their > > > > > > > > > + default values, based on the new resolution, as described in the step 5 > > > > > > > > > > > > > > > > 5 -> 4 > > > > > > > > > > > > > > > > Or just say: "as described in the next step." > > > > > > > > > > > > > > > > > + below. > > > > > > > > > > > > > > It should also be made explicit that: > > > > > > > > > > > > > > 1) the crop rectangle will be set to the given width and height *before* > > > > > > > it is being adjusted by S_FMT. > > > > > > > > > > > > > > > > > > > I don't think that's what we want here. 
> > > > > > > > > > > > Defining the default rectangle to be exactly the same as the OUTPUT > > > > > > resolution (after the adjustment) makes the semantics consistent - not > > > > > > setting the crop rectangle gives you exactly the behavior as if there > > > > > > was no cropping involved (or supported by the encoder). > > > > > > > > > > I think you are right. This seems to be what the coda driver does as well. > > > > > It is convenient to be able to just set a 1920x1080 format and have that > > > > > resolution be stored as the crop rectangle, since it avoids having to call > > > > > s_selection afterwards, but it is not really consistent with the way V4L2 > > > > > works. > > > > > > > > > > > > Open question: should we support a compose rectangle for the CAPTURE that > > > > > > > is the same as the OUTPUT crop rectangle? I.e. the CAPTURE format contains > > > > > > > the adjusted width and height and the compose rectangle (read-only) contains > > > > > > > the visible width and height. It's not strictly necessary, but it is > > > > > > > symmetrical. > > > > > > > > > > > > Wouldn't it rather be the CAPTURE crop rectangle that would be of the > > > > > > same resolution of the OUTPUT compose rectangle? Then you could > > > > > > actually have the CAPTURE compose rectangle for putting that into the > > > > > > desired rectangle of the encoded stream, if the encoder supports that. > > > > > > (I don't know any that does, so probably out of concern for now.) > > > > > > > > > > Yes, you are right. > > > > > > > > > > But should we support this? > > > > > > > > > > I actually think not for this initial version. It can be added later, I guess. > > > > > > > > > > > > > I think it boils down on whether adding it later wouldn't > > > > significantly complicate the application logic. It also relates to my > > > > other comment somewhere below. > > > > > > > > > > > 2) the CAPTURE format will be updated as well with the new OUTPUT width and > > > > > > > height. 
The CAPTURE sizeimage might change as well. > > > > > > > > > > > > > > > > + > > > > > > > > > +4. **Optional.** Set the visible resolution for the stream metadata via > > > > > > > > > + :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue. > > > > > > > > > > > > > > I think you should mention that this is only necessary if the crop rectangle > > > > > > > that is set when you set the format isn't what you want. > > > > > > > > > > > > > > > > > > > Ack. > > > > > > > > > > > > > > > + > > > > > > > > > + * **Required fields:** > > > > > > > > > + > > > > > > > > > + ``type`` > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` > > > > > > > > > + > > > > > > > > > + ``target`` > > > > > > > > > + set to ``V4L2_SEL_TGT_CROP`` > > > > > > > > > + > > > > > > > > > + ``r.left``, ``r.top``, ``r.width``, ``r.height`` > > > > > > > > > + visible rectangle; this must fit within the `V4L2_SEL_TGT_CROP_BOUNDS` > > > > > > > > > + rectangle and may be subject to adjustment to match codec and > > > > > > > > > + hardware constraints > > > > > > > > > + > > > > > > > > > + * **Return fields:** > > > > > > > > > + > > > > > > > > > + ``r.left``, ``r.top``, ``r.width``, ``r.height`` > > > > > > > > > + visible rectangle adjusted by the encoder > > > > > > > > > + > > > > > > > > > + * The following selection targets are supported on ``OUTPUT``: > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_CROP_BOUNDS`` > > > > > > > > > + equal to the full source frame, matching the active ``OUTPUT`` > > > > > > > > > + format > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_CROP_DEFAULT`` > > > > > > > > > + equal to ``V4L2_SEL_TGT_CROP_BOUNDS`` > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_CROP`` > > > > > > > > > + rectangle within the source buffer to be encoded into the > > > > > > > > > + ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT`` > > > > > > > > > + > > > > > > > > > + .. 
note:: > > > > > > > > > + > > > > > > > > > + A common use case for this selection target is encoding a source > > > > > > > > > + video with a resolution that is not a multiple of a macroblock, > > > > > > > > > + e.g. the common 1920x1080 resolution may require the source > > > > > > > > > + buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock > > > > > > > > > + size. To avoid encoding the padding, the client needs to explicitly > > > > > > > > > + configure this selection target to 1920x1080. > > > > > > > > > > > > > > This last sentence contradicts the proposed behavior of S_FMT(OUTPUT). > > > > > > > > > > > > > > > > > > > Sorry, which part exactly and what part of the proposal exactly? :) > > > > > > (My comment above might be related, though.) > > > > > > > > > > Ignore my comment. We go back to explicitly requiring userspace to set the OUTPUT > > > > > crop selection target, so this note remains valid. > > > > > > > > > > > > > Ack. > > > > > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_BOUNDS`` > > > > > > > > > + maximum rectangle within the coded resolution, which the cropped > > > > > > > > > + source frame can be composed into; if the hardware does not support > > > > > > > > > + composition or scaling, then this is always equal to the rectangle of > > > > > > > > > + width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0) > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_DEFAULT`` > > > > > > > > > + equal to a rectangle of width and height matching > > > > > > > > > + ``V4L2_SEL_TGT_CROP`` and located at (0, 0) > > > > > > > > > + > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE`` > > > > > > > > > + rectangle within the coded frame, which the cropped source frame > > > > > > > > > + is to be composed into; defaults to > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without > > > > > > > > > + additional compose/scaling capabilities; resulting stream will > > > > > > 
> > > + have this rectangle encoded as the visible rectangle in its > > > > > > > > > + metadata > > > > > > > > > > > > > > I think the compose targets for OUTPUT are only needed if the hardware can > > > > > > > actually do scaling and/or composition. Otherwise they can (must?) be > > > > > > > dropped. > > > > > > > > > > > > > > > > > > > Note that V4L2_SEL_TGT_COMPOSE is defined to be the way for the > > > > > > userspace to learn the target visible rectangle that's going to be > > > > > > encoded in the stream metadata. If we omit it, we wouldn't have a way > > > > > > that would be consistent between encoders that can do > > > > > > scaling/composition and those that can't. > > > > > > > > > > I'm not convinced about this. The standard API behavior is not to expose > > > > > functionality that the hardware can't do. So if scaling isn't possible on > > > > > the OUTPUT side, then it shouldn't expose OUTPUT compose rectangles. > > > > > > > > > > I also believe it very unlikely that we'll see encoders capable of scaling > > > > > as it doesn't make much sense. > > > > > > > > It does make a lot of sense - WebRTC requires 3 different sizes of the > > > > stream to be encoded at the same time. However, unfortunately, I > > > > haven't yet seen an encoder capable of doing so. > > > > > > > > > I would prefer to drop this to simplify the > > > > > spec, and when we get encoders that can scale, then we can add support for > > > > > compose rectangles (and I'm sure we'll need to think about how that > > > > > influences the CAPTURE side as well). > > > > > > > > > > For encoders without scaling it is the OUTPUT crop rectangle that defines > > > > > the visible rectangle. > > > > > > > > > > > However, with your proposal of actually having selection rectangles > > > > > > for the CAPTURE queue, it could be solved indeed. 
The OUTPUT queue > > > > > > would expose a varying set of rectangles, depending on the hardware > > > > > > capability, while the CAPTURE queue would always expose its rectangle > > > > > > with that information. > > > > > > > > > > I think we should keep it simple and only define selection rectangles > > > > > when really needed. > > > > > > > > > > So encoders support CROP on the OUTPUT, and decoders support CAPTURE > > > > > COMPOSE (may be read-only). Nothing else. > > > > > > > > > > Once support for scaling is needed (either on the encoder or decoder > > > > > side), then the spec should be enhanced. But I prefer to postpone that > > > > > until we actually have hardware that needs this. > > > > > > > > > > > > > Okay, let's do it this way then. Actually, I don't even think there is > > > > much value in exposing information internal to the bitstream metadata > > > > like this, similarly to the coded size. My intention was to just > > > > ensure that we can easily add scaling/composing functionality later. > > > > > > > > I just removed the COMPOSE rectangles from my next draft. > > > > > > I don't think that supporting scaling will be a problem for the API as > > > such, since this is supported for standard video capture devices. It > > > just gets very complicated trying to describe how to configure all this. > > > > > > So I prefer to avoid this until we need to. > > > > > > > [snip] > > > > > > > Changing the OUTPUT format will always fail if OUTPUT buffers are already allocated, > > > > > > > or if changing the OUTPUT format would change the CAPTURE format (sizeimage in > > > > > > > particular) and CAPTURE buffers were already allocated and are too small. > > > > > > > > > > > > The OUTPUT format must not change the CAPTURE format by definition. > > > > > > Otherwise we end up in a situation where we can't commit, because both > > > > > > queue formats can affect each other. 
Any change to the OUTPUT format > > > > > > that wouldn't work with the current CAPTURE format should be adjusted > > > > > > by the driver to match the current CAPTURE format. > > > > > > > > > > But the CAPTURE format *does* depend on the OUTPUT format: if the output > > > > > resolution changes, then so does the CAPTURE resolution and esp. the > > > > > sizeimage value, since that is typically resolution dependent. > > > > > > > > > > The coda driver does this as well: changing the output resolution > > > > > will update the capture resolution and sizeimage. The vicodec driver does the > > > > > same. > > > > > > > > > > Setting the CAPTURE format basically just selects the codec to use, after > > > > > that you can set the OUTPUT format and read the updated CAPTURE format to > > > > > get the new sizeimage value. In fact, setting the CAPTURE format shouldn't > > > > > change the OUTPUT format, unless the OUTPUT format is incompatible with the > > > > > newly selected codec. > > > > > > > > Let me think about it for a while. > > > > > > Sleep on it, always works well for me :-) > > > > Okay, I think I'm not convinced. > > > > I believe we decided to allow sizeimage to be specified by the > > application, because it knows more about the stream it's going to > > encode. Only setting the size to 0 would make the encoder fall back to > > some simple internal heuristic. > > Yes, that was the plan, but the patch stalled. I completely forgot > about this patch :-) > > My last reply to "Re: [RFC PATCH] media/doc: Allow sizeimage to be set by > v4l clients" was March 14th. > > Also, sizeimage must be at least the minimum size required for the given > CAPTURE width and height. So if it is less, then sizeimage will be set to that > minimum size. > > > Another thing is handling resolution changes. 
I believe that would
> > have to be handled by stopping the OUTPUT queue, changing the OUTPUT
> > format and starting the OUTPUT queue, all that without stopping the
> > CAPTURE queue. With the behavior you described it wouldn't work,
> > because the OUTPUT format couldn't be changed.
> >
> > I'd suggest making OUTPUT format changes not change the CAPTURE sizeimage.
>
> So OUTPUT format changes will still update the CAPTURE width and height?
>
> It's kind of weird if you are encoding e.g. 1920x1080 but the CAPTURE format
> says 1280x720. I'm not sure what is best.
>
> What if the CAPTURE sizeimage is too small for the new OUTPUT resolution?
> Should S_FMT(OUTPUT) fail with some error in that case?

Sounds like we need something similar to the SOURCE_CHANGE event mechanism
if we want to allow dynamic bitrate control which would require
re-allocation of the capture buffer queue. (Or any other runtime control on
our encoders, which is really expected to be supported these days).

> Regards,
>
> Hans
>
> > Best regards,
> > Tomasz