Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

Nicolas Dufresne <nicolas@xxxxxxxxxxxx> · Tue, 23 Aug 2022 10:02:56 -0400

Le mardi 23 août 2022 à 15:03 +0800, Hsia-Jun Li a écrit :
> 
> On 8/23/22 14:05, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> > 
> > 
> > On Sat, Aug 20, 2022 at 12:44 AM Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> wrote:
> > > 
> > > 
> > > 
> > > On 8/19/22 23:28, Nicolas Dufresne wrote:
> > > > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> > > > 
> > > > 
> > > > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> > > > > On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > > > > > On 8/18/22 14:06, Tomasz Figa wrote:
> > > > > > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li <randy.li@xxxxxxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > From: "Hsia-Jun(Randy) Li" <randy.li@xxxxxxxxxxxxx>
> > > > > > > > 
> > > > > > > > The most of detail has been written in the drm.
> > > > > 
> > > > > This patch still needs a description of the format, which should go to
> > > > > Documentation/userspace-api/media/v4l/.
> > > > > 
> > > > > > > > Please notice that the tiled formats here request
> > > > > > > > one more plane for storing the motion vector metadata.
> > > > > > > > This buffer won't be compressed, so you can't append
> > > > > > > > it to luma or chroma plane.
> > > > > > > 
> > > > > > > Does the motion vector buffer need to be exposed to userspace? Is the
> > > > > > > decoder stateless (requires userspace to specify the reference frames)
> > > > > > > or stateful (manages the entire decoding process internally)?
> > > > > > 
> > > > > > No, users don't need to access them at all. Just they need a different
> > > > > > dma-heap.
> > > > > > 
> > > > > > You would only get the stateful version of both encoder and decoder.
> > > > > 
> > > > > Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> > > > > submitted through a different queue then ?
> > > > 
> > > > Imho, I believe these should be invisible to users and pooled separately to
> > > > reduce the overhead. The number of reference is usually lower then the number of
> > > > allocated display buffers.
> > > > 
> > > You can't. The motion vector buffer can't share with the luma and chroma
> > > data planes, nor the data plane for the compression meta data.
> > 
> > I believe what Nicolas is suggesting is to just keep the MV buffer
> > handling completely separate from video buffers. Just keep a map
> > between frame buffer and MV buffer in the driver and use the right
> > buffer when triggering a decode.
> > 
> > > 
> > > You could consider this as a security requirement(the memory region for
> > > the MV could only be accessed by the decoder) or hardware limitation.
> > > 
> > > It is also not very easy to manage such a large buffer that would change
> > > when the resolution changed.
> > 
> > How does it differ from managing additional planes of video buffers?
> I should say I am not against his suggestion if I could make a DMA-heap 
> v4l2 allocator merge into kernel in the future. Although I think we need 
> two heaps here one for the normal video and one for the secure video, I 
> don't have much idea on how to determine whether we are decoding a 
> secure or non-secure video here (The design here is that the kernel 
> didn't know, only hardware and TEE care about that).

Its always nice when "the design" get discussed upstream, so we can raise any
known issues and improve it. Here, not knowing if we are handling secure or non-
secure memory in kernel driver would indeed require external allocation for
everything, and V4L2 does not currently work like this. There is a few use cases
(not all of them might apply to your driver, but they exists).

1. Secondary buffers

When a CODEC is combined with a post-processor, the driver is then responsible
for reference frame allocation. In both known secure memory approach (NXP secure
bit and secondary mmu), the driver must know, as it won't be allowed produce any
non-secure buffer while secure (and vis-versa). It would be very difficult to
make secondary buffers externally allocated, since the fact secondary buffers
are used is no known by userspace. You slightly mention about adding a new queue
type, this seems like an option, though one will have to figure-out how to make
this work in a backward compatible manner.

2. Internally managed feedback buffers

Existing case of feedback buffers is VP9 decoders. I initially thought that
would only be a challenge for stateless decoders, but it turns out that Amlogic
stateful drivers also needs to take care. In VP9, the bitstream is further
compressed using probability obtained through decoding. Those probability can be
further tuned with updates placed in the compressed header. In Amlogic and
existing VP9 stateless decoder, the merging of the feedback and compressed
header updates is done using the CPU, hence that feedback buffer cannot be
secure. With lets say NXP secure domain HW, this is impossible. The OPT-TEE
needs to be involved to abstract the programming of the HW and copy back the
secure buffers to non-secure, making sure it is not being tricked into
delivering a copy of the wrong data. For the MMU approach, no copy is needed,
but to be sure the memory being mapped into the Linux Kernel MMU is the right
one, some level of abstraction of the CODEC is needed.

In short, you need a mix of secure and non-secure memory. This is a huge
challenge that isn't well covered by any secure memory design at the moment, its
not even clear if the HW can work. Remember that these feedback buffers are not
exposed to userspace, hence cannot be allocated there. Recent discussion shows
that NXP might be just giving up on their stateless codec so they can solve this
with a full codec abstraction (stateful codec).

Feedback buffers also exist in stateless encoders, but we don't have yet
existing drivers for that. Encoders also have to deal with secure memory,
notably when encoding from HDCP enabled HDMI receivers. Though this task is
quite likely limited to dedicated system, which can be considered secure as a
whole, time will define this.

> 
> Just one place that I think it would be more simple for me to manage the 
> buffer here. When the decoder goes to the drain stage, then the MV 
> buffer goes when the data buffer goes and create when the data buffer 
> creates.
> I know that is not a lot of work to doing the mapping between them. I 
> just need to convince the other accepting that do not allocator the MV 
> buffer outside.

Its also a big memory saver if you manage to convince them.

> > 
> > Best regards,
> > Tomasz
> > 
> > > > > 
> > > > > > > > Signed-off-by: Hsia-Jun(Randy) Li <randy.li@xxxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > >     drivers/media/v4l2-core/v4l2-common.c | 1 +
> > > > > > > >     drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
> > > > > > > >     include/uapi/linux/videodev2.h        | 2 ++
> > > > > > > >     3 files changed, 5 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/media/v4l2-core/v4l2-common.c b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > index e0fbe6ba4b6c..f645278b3055 100644
> > > > > > > > --- a/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > +++ b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
> > > > > > > >                    { .format = V4L2_PIX_FMT_SGBRG12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > >                    { .format = V4L2_PIX_FMT_SGRBG12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > >                    { .format = V4L2_PIX_FMT_SRGGB12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > > +               { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
> > > > > > > >            };
> > > > > > > >            unsigned int i;
> > > > > > > > 
> > > > > > > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > index e6fd355a2e92..8f65964aff08 100644
> > > > > > > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > @@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > > > > > > >                    case V4L2_PIX_FMT_MT21C:        descr = "Mediatek Compressed Format"; break;
> > > > > > > >                    case V4L2_PIX_FMT_QC08C:        descr = "QCOM Compressed 8-bit Format"; break;
> > > > > > > >                    case V4L2_PIX_FMT_QC10C:        descr = "QCOM Compressed 10-bit Format"; break;
> > > > > > > > +               case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit tiled Format";break;
> > > > > > > > +               case V4L2_PIX_FMT_NV12M_10_V4H3P8C:     descr = "Synaptics Compressed 10-bit tiled Format";break;
> > > > > > > >                    default:
> > > > > > > >                            if (fmt->description[0])
> > > > > > > >                                    return;
> > > > > > > > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > > > > > > > index 01e630f2ec78..7e928cb69e7c 100644
> > > > > > > > --- a/include/uapi/linux/videodev2.h
> > > > > > > > +++ b/include/uapi/linux/videodev2.h
> > > > > > > > @@ -661,6 +661,8 @@ struct v4l2_pix_format {
> > > > > > > >     #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  Y/CbCr 4:2:0 16x16 tiles */
> > > > > > > >     #define V4L2_PIX_FMT_NV12M_8L128      v4l2_fourcc('N', 'A', '1', '2') /* Y/CbCr 4:2:0 8x128 tiles */
> > > > > > > >     #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
> > > > > > > > +#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  Y/CbCr 4:2:0 tiles */
> > > > > > > > +#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   /* 12  Y/CbCr 4:2:0 10-bits tiles */
> > > > > > > > 
> > > > > > > >     /* Bayer formats - see https://urldefense.proofpoint.com/v2/url?u=http-3A__www.siliconimaging.com_RGB-2520Bayer.htm&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=lkQiuhx0yMAYHGcW-0WaHlF3e2etMHsu-FoNIBdZILGH6FPigwSAmel2vAdcVLkp&s=JKsBzpb_3u9xv52MaMuT4U3T1pPqcObYkpHDBxvcx_4&e=   */
> > > > > > > >     #define V4L2_PIX_FMT_SBGGR8  v4l2_fourcc('B', 'A', '8', '1') /*  8  BGBG.. GRGR.. */
> > > > > 
> > > > 
> > > 
> > > --
> > > Hsia-Jun(Randy) Li
>