On Thursday, December 8, 2022 at 10:39 +0800, Hsia-Jun Li wrote:
>
> On 12/8/22 01:16, Nicolas Dufresne wrote:
> >
> > On Wednesday, December 7, 2022 at 15:18 +0800, Hsia-Jun Li wrote:
> > >
> > > On 12/7/22 02:03, Nicolas Dufresne wrote:
> > > >
> > > > On Tuesday, November 29, 2022 at 18:32 +0800, Hsia-Jun Li wrote:
> > > > > Hello
> > > > >
> > > > > I think we need to add an extra event for VP9 and AV1, which
> > > > > support frame scaling, meaning a frame's width and height can
> > > > > differ from the previous frame or the reference frame.
> > > > >
> > > > > That is more likely for VP9, as there is no sequence header in
> > > > > VP9.
> > > >
> > > > The solution is unlikely to be in the form of an event, but yes, to
> > > > complete VP9 support (and improve AV1 support) a mechanism needs to
> > > > be designed and specified to handle inter-frame resolution changes.
> > > >
> > > > Why I say improve AV1: the VP9 bitstream does not signal SVC
> > > > spatial streams (the most common use of inter-frame resolution
> > > > changes). With SVC streams, the smaller images are always
> > > > decode-only (never displayed). This can be
> We expect to get all the results from the different layers of an SVC
> stream. Which layer gets displayed is the user's decision.

This is out of touch with reality. It's the conferencing server that
measures each participant's bandwidth and decides how many layers each one
should get, to avoid overloading the network. The selection happens in the
compressed domain. Perhaps someone could be creative and make up a use case
for what you describe, but this isn't used in practice.
Adding multi-resolution output would require massive API additions to
stateful decoders (which this hidden RFC does not cover).

> 1. The golden frame would usually be higher resolution or better quality,
> but we would only display frames at a lower resolution.
>
> 2. A higher resolution or quality layer would have a longer interval; the
> user may just display a lower layer in the real-time case (like video
> conferencing).
>
> > > > at least partially supported, as long as the maximum image
> > > > dimension is signalled by the bitstream. This is the case for AV1,
> > > > but not VP9.
> > > >
> > > > Stateless decoders are not affected, because userspace is aware of
> > > > frames being decoded but not displayed. It is also aware that these
> > > > frames are reference frames. On stateful decoders, userspace
> > > > usually does not have this knowledge. I think one way to solve this
> > > > would be for drivers to be able to mark a buffer done with a flag
> > > > telling userspace that it's not to be displayed. For the SVC case,
> > > > the dimensions and stride are irrelevant.
> > > >
> > > > For true inter-frame resolution changes, which VP9 supports (though
> > > > they are rarely used), this needs more APIs. It was suggested to
> > > > extend CREATE_BUFS, which allows allocation with a different FMT,
> > > > with a DELETE_BUFS ioctl, so that userspace can smoothly handle the
> > > > allocation transition.
> > > This could only solve the problem of never-displayed graphics buffers
> > > like the golden frame or the alternate reference frame.
> > >
> > > About the timestamp tracking problem in v4l2, maybe we could start a
> > > new thread or move it to GStreamer:
> > > https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619
> > >
> > > My idea here is an event attached to a buffer, or just using the new
> > > request support on the CAPTURE side. I know you worry about the v4l2
> > > event: it is out of band, and more events could lead to the problem
> > > we suffered from in OpenMAX. If we could have an ordering between
> > > events and buffers, it wouldn't be a problem.
> > > > For VP9 also, it might be required to support super-frames. VP9
> > > > super-frames are the ancestor of the AV1 TU, and only the last
> > > > frame of a super-frame is ever to be displayed. A newly introduced
> > > > AV1 format might also require complete TUs rather than frames; this
> > > > needs strict documentation.
> > > I don't think the temporal unit is a good idea here. Most hardware
> > > can only decode one frame at a time, or less, such as a tile (like a
> > > slice in the ITU codecs).
> > >
> > > Considering the MPEG-TS case,
> > > https://aomediacodec.github.io/av1-mpeg2-ts/
> > > a Decodable Frame Group could be a better idea. A Temporal Unit would
> > > lead to larger delay.
> >
> > This is off topic for the thread, but this one could be fixed by
> > setting a flag on the capture buffer, something like:
> >
> > V4L2_BUF_FLAG_DECODE_ONLY
> >
> > That's similar to how it works with other CODEC APIs.
> > The downside is that the driver needs to remember whether this is a
> > reference frame when userspace queues back that decode-only frame, so
> > it's not overwritten. Userspace is not aware of the reference state,
> > hence can't be made responsible. I suspect a lot of the drivers out
> > there use secondary buffers, meaning the references are not the CAPTURE
> > buffers.
> Drivers certainly could allocate their internal buffers. But I believe
> Android won't like this idea. They would like you to allocate the memory
> somewhere else and then import it into the driver.
>
> Besides, when you decode a secure (DRM) stream, you don't want to leak
> any data from it, while for normal streams you don't want to occupy the
> limited amount of secure memory. I would like to let userspace control
> the allocation of those internal buffers.
> > This use case needs to be thought through too. Perhaps other drivers
> > use internally allocated memory whenever they are about to produce a
> > decode-only frame, but that seems to require some firmware feature
> > that is likely uncommon. Please do your research, compare various
> > drivers, and propose an API in the form of an RFC so we can discuss it
> > independently from this AV1 pixel format thread.
> My proposal for solving the timestamp tracking issue is making v4l2
> events ordered relative to buffers.
>
> It would come after I refresh the v4l2 pix format extension API.
> > > > Decoding frames would mean that undisplayed frames and frames of
> > > > different sizes get delivered, and we don't have a method to
> > > > communicate these frame dimensions and strides at the moment.
> > > >
> > > > Nicolas
> > > >
> > > > > On 9/12/22 23:45, Nicolas Dufresne wrote:
> > > > > > Hi Shi,
> > > > > >
> > > > > > thanks for the patches, check inline for some comments.
> > > > > > Generally speaking, we don't usually add formats ahead of time
> > > > > > unless we have a good rationale to do so.
> > > > > > Should we expect a companion series against the amlogic
> > > > > > decoder driver that enables this?
> > > > > >
> > > > > > On Tuesday, August 30, 2022 at 09:40 +0800, Shi Hao wrote:
> > > > > > > From: "hao.shi" <hao.shi@xxxxxxxxxxx>
> > > > > > >
> > > > > > > Add AV1 compressed pixel format. It is the more common format.
> > > > > > >
> > > > > > > Signed-off-by: Hao Shi <hao.shi@xxxxxxxxxxx>
> > > > > > > ---
> > > > > > >  .../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
> > > > > > >  drivers/media/v4l2-core/v4l2-ioctl.c              | 1 +
> > > > > > >  include/uapi/linux/videodev2.h                    | 1 +
> > > > > > >  3 files changed, 11 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > > > index 506dd3c98884..5bdeeebdf9f5 100644
> > > > > > > --- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > > > +++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
> > > > > > > @@ -232,6 +232,15 @@ Compressed Formats
> > > > > > >        Metadata associated with the frame to decode is required to be passed
> > > > > > >        through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
> > > > > > >        See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
> > > > > > > +  * .. _V4L2-PIX-FMT-AV1:
> > > > > > > +
> > > > > > > +    - ``V4L2_PIX_FMT_AV1``
> > > > > > > +    - 'AV1'
> > > > > > > +    - AV1 Access Unit. The decoder expects one Access Unit per buffer.
> > > > > >
> > > > > > I believe this is using MPEG LA terminology. Did you mean a
> > > > > > Temporal Unit (TU)? In AV1 a TU represents one displayable
> > > > > > picture, just like an AU in H.264 (if you ignore interlaced
> > > > > > video).
> > > > >
> > > > > I think it should be a complete tile group OBU. From the spec, we
> > > > > have the term 'frame'.
> > > > >
> > > > > Currently, AV1 doesn't support interlacing.
> > > > > > > +      The encoder generates one Access Unit per buffer. This format is
> > > > > > > +      adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
> > > > > > > +      open video coding format. It was developed as a successor to VP9
> > > > > > > +      by the Alliance for Open Media (AOMedia).
> > > > > > >
> > > > > > >  .. raw:: latex
> > > > > > >
> > > > > > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > index c314025d977e..fc0f43228546 100644
> > > > > > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > @@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > > > > > >  	case V4L2_PIX_FMT_MT21C:	descr = "Mediatek Compressed Format"; break;
> > > > > > >  	case V4L2_PIX_FMT_QC08C:	descr = "QCOM Compressed 8-bit Format"; break;
> > > > > > >  	case V4L2_PIX_FMT_QC10C:	descr = "QCOM Compressed 10-bit Format"; break;
> > > > > > > +	case V4L2_PIX_FMT_AV1:		descr = "AV1"; break;
> > > > > > >  	default:
> > > > > > >  		if (fmt->description[0])
> > > > > > >  			return;
> > > > > > > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > > > > > > index 01e630f2ec78..c5ea9f38d807 100644
> > > > > > > --- a/include/uapi/linux/videodev2.h
> > > > > > > +++ b/include/uapi/linux/videodev2.h
> > > > > > > @@ -738,6 +738,7 @@ struct v4l2_pix_format {
> > > > > > >  #define V4L2_PIX_FMT_FWHT_STATELESS	v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
> > > > > > >  #define V4L2_PIX_FMT_H264_SLICE	v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
> > > > > > >  #define V4L2_PIX_FMT_HEVC_SLICE	v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
> > > > > > > +#define V4L2_PIX_FMT_AV1		v4l2_fourcc('A', 'V', '1', '0') /* AV1 */
> > > > > > >
> > > > > > >  /* Vendor-specific formats */
> > > > > > >  #define V4L2_PIX_FMT_CPIA1	v4l2_fourcc('C', 'P', 'I', 'A')	/* cpia1 YUV */
> > > > > > >
> > > > > > > base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868