Re: [PATCH] drm/fourcc: Add modifier definition for describing Verisilicon video framebuffer

Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx> · Sun, 10 Nov 2024 21:26:35 -0500

Hi,

I have an update on top of what I've said ealier.

Le mercredi 06 novembre 2024 à 14:49 -0500, Nicolas Dufresne a écrit :
> Hi,
> 
> Le mercredi 06 novembre 2024 à 16:53 +0100, Benjamin Gaignard a écrit :
> > + nicolas
> 
> Thanks for the CC, I'm obviously watching kernel@xxxxxxxxxxxxx, I don't know why
> it didn't make it to my mailbox.
> 
> I'm adding explicitly Lucas and Philipp, as I believe they can provide relevant
> information here.
> 
> > 
> > Le 06/11/2024 à 14:30, Benjamin Gaignard a écrit :
> > > Verisilicon hardware video decoders can produce tiled (8x4 or 4x4)
> > > and compressed video framebuffers.
> > > It considerably reduces memory bandwidth while writing and reading
> > > frames in memory.
> 
> I've seen for years this 8x4 references, but in reality, and I've implemented
> software converters that works on all the VSI/Hantro drivers we have mainline,
> there is no such thing as 8x4 tiled coming out of these chips.

I completely missed that tiled has never been an output format from the upstream
Hantro G1 driver, meaning it could possibly be 8x4, sorry for this.

That being said, it also means we have nothing upstream producing that, so I'd
avoid the risk of making up a bad definition.

> 
> Unless we have new evidence, and V4L2 patches enabling these formats, I don't
> see any point of bringing what I believe is a TRM mistake, or an historical
> format.
> 
> > > 
> > > The underlying storage in NV12 (for 8-bit) or NV15 (for 10-bit).
> > > 
> > > Display controllers, like imx DCSS, could use these modifier definition
> > > as input for overlay planes.
> > > 
> > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@xxxxxxxxxxxxx>
> > > ---
> > > The original code comes from:
> > > https://github.com/nxp-imx/linux-imx/commit/ab01b7fe82d5a11dfb533cfbd08c4dfa140815de
> > > I have port it and modify DRM_FORMAT_MOD_VENDOR_VSI value.
> > > 
> > >   include/uapi/drm/drm_fourcc.h | 27 +++++++++++++++++++++++++++
> > >   1 file changed, 27 insertions(+)
> > > 
> > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > > index 78abd819fd62..31d09a98d0d7 100644
> > > --- a/include/uapi/drm/drm_fourcc.h
> > > +++ b/include/uapi/drm/drm_fourcc.h
> > > @@ -421,6 +421,7 @@ extern "C" {
> > >   #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> > >   #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> > >   #define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> > > +#define DRM_FORMAT_MOD_VENDOR_VSI     0x0b
> > >   
> > >   /* add more to the end as needed */
> > >   
> > > @@ -1607,6 +1608,32 @@ drm_fourcc_canonicalize_nvidia_format_mod(__u64 modifier)
> > >   #define AMD_FMT_MOD_CLEAR(field) \
> > >   	(~((__u64)AMD_FMT_MOD_##field##_MASK << AMD_FMT_MOD_##field##_SHIFT))
> > >   
> > > +/* Verisilicon framebuffer modifiers */
> > > +
> > > +/*
> > > + * Verisilicon 8x4 tiling layout
> > > + *
> > > + * This is G1 VPU tiled layout using tiles of 8x4 pixels in a row-major
> > > + * layout.
> > > + */
> > > +#define DRM_FORMAT_MOD_VSI_G1_TILED fourcc_mod_code(VSI, 1)
> 
> I have code in GStreamer mainline that handle the tiled G1 output in software
> and its 4x4, no doubt here ... 

Appology again for my mistake, this can effectively be 8x4, a format we have no
producer for upstream.

> 
> > > +
> > > +/*
> > > + * Verisilicon 4x4 tiling layout
> > > + *
> > > + * This is G2 VPU tiled layout using tiles of 4x4 pixels in a row-major
> > > + * layout.
> > > + */
> > > +#define DRM_FORMAT_MOD_VSI_G2_TILED fourcc_mod_code(VSI, 2)
> 
> ... Meaning this split make no sense, G2 shares the same format in V4L2 and
> works well in GStreamer software converters. In fact, in the NXP TRM, you should
> notice that in the G1 section, they document G2 tile and G1 tile is ignored.
> Perhaps the DCSS do implement some 8x4 tiling, but considering we don't have
> evidence of anything producing that, we should probably not have it.
> 
> From the documentation from NXP, G2 Tile is identical to
> DRM_FORMAT_MOD_VIVANTE_TILED, so we really should reuse the existing definition.
> If not, I'd really like to know in the text why. Running Etnaviv patched with
> NV12 support (which I think got merged now), I can see that the combination is
> supported (but haven't tested it though since its not mapped properly in
> GStreamer yet):
> 
> 
> > NV12         | NV12                    | external only |
> >              | NV12:0x0600000000000001 | external only |
> >              | NV12:0x0600000000000002 | external only |
> 
> Having a GL path is very efficient, but also gives compositors a composition
> fallback which is needed to enable a format and guarantee it can be rendered at
> all time.

I wanted to confirm my claimed that VIVANTE_TILED is the same. So I worked on
this claim, and demonstrated that it is effectively the same by passing that
through the GPU. Here's my GStreamer modification to make this happen:

https://gitlab.freedesktop.org/ndufresne/gstreamer/-/commits/imx8m-nv12-4l4-gl

This has been tested with glimagesink and hantro G2 mainline decoder.

> 
> > > +
> > > +/*
> > > + * Verisilicon 4x4 tiling with compression layout
> > > + *
> > > + * This is G2 VPU tiled layout using tiles of 4x4 pixels in a row-major
> > > + * layout with compression.
> > > + */
> > > +#define DRM_FORMAT_MOD_VSI_G2_TILED_COMPRESSED fourcc_mod_code(VSI, 3)
> 
> This one needs definition, but the doc provided does not seem to match what I
> can see in IMX8M TRM (see 14.2.2.3 Compressed format), here's a snippet, I don't
> yet fully understand it. They only explain the storage, not the compression
> technique, but this should be enough to understand its not a simple row-major
> 4x4 tiling.
> 
>    When compression is enabled, the picture is divided into CBS rows, which is
>    further divided into continuous CBS groups. Each CBS group is composed of 16
>    CBS. The luminance CBS is composed of 1 8x8 coded block (CB), which one
>    chrominance CBS is composed of two 8x4 coded blocks, with cb CB first then cr
>    CB following. The compression is performed to each CB in the CBS. The first
>    CB's compressed data saved from the same offset of that CBS in raster scan
>    buffer. And the compressed output of following CBs in the CBS row is saved
>    continuously. That means there may be gaps between the compressed data of
>    each CBS row.

I will also raise the same question, if we have nothing upstream producing it,
shall we take the risk and upstream a definition for it ? We can clearly live
with just VIVANTE_TILED in the short term and wait until V4L2 drivers evolves.

> 
> Nicolas
> 
> > > +
> > >   #if defined(__cplusplus)
> > >   }
> > >   #endif
>