Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format

Hsia-Jun Li <Randy.Li@xxxxxxxxxxxxx> · Tue, 23 Aug 2022 15:40:57 +0800

On 8/22/22 22:15, Nicolas Dufresne wrote:
CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.


Le samedi 20 août 2022 à 08:10 +0800, Hsia-Jun Li a écrit :

On 8/20/22 03:17, Nicolas Dufresne wrote:
CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.


Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :

On 8/19/22 23:28, Nicolas Dufresne wrote:
CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.


Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
On 8/18/22 14:06, Tomasz Figa wrote:
On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li <randy.li@xxxxxxxxxxxxx> wrote:

From: "Hsia-Jun(Randy) Li" <randy.li@xxxxxxxxxxxxx>

The most of detail has been written in the drm.

This patch still needs a description of the format, which should go to
Documentation/userspace-api/media/v4l/.

Please notice that the tiled formats here request
one more plane for storing the motion vector metadata.
This buffer won't be compressed, so you can't append
it to luma or chroma plane.

Does the motion vector buffer need to be exposed to userspace? Is the
decoder stateless (requires userspace to specify the reference frames)
or stateful (manages the entire decoding process internally)?

No, users don't need to access them at all. Just they need a different
dma-heap.

You would only get the stateful version of both encoder and decoder.

Shouldn't the motion vectors be stored in a separate V4L2 buffer,
submitted through a different queue then ?

Imho, I believe these should be invisible to users and pooled separately to
reduce the overhead. The number of reference is usually lower then the number of
allocated display buffers.

You can't. The motion vector buffer can't share with the luma and chroma
data planes, nor the data plane for the compression meta data.

You could consider this as a security requirement(the memory region for
the MV could only be accessed by the decoder) or hardware limitation.

It is also not very easy to manage such a large buffer that would change
when the resolution changed.

Your argument are just aiming toward the fact that you should not let the user
allocate these in the first place. They should not be bound to the v4l2 buffer.
Allocate these in your driver, and leave to your user the pixel buffer (and
compress meta) allocation work.

What I want to say is that userspace could allocate buffers then make
the v4l2 decoder import these buffers, but each planes should come from
the right DMA-heaps. Usually the userspace would know better the memory
occupation, it would bring some flexibility here.

Currently, they are another thing bothers me, I need to allocate a small
piece of memory(less than 128KiB) as the compression metadata buffers as
I mentioned here. And these pieces of memory should be located in a
small region, or the performance could be badly hurt, besides, we don't
support IOMMU for this kind of data.

Any idea about assign a small piece of memory from a pre-allocated
memory or select region(I don't think I could reserve them in a
DMA-heap) for a plane in the MMAP type buffer ?

A V4L2 driver should first implement the V4L2 semantic before adding optional
use case like buffer importation. For this reason, your V4L2 driver should know
all the memory requirements and how to allocate that memory. 
Yes, that is what I intend to. Or I just smuggle those things somewhere.
Later on, your
importing driver will have to validate that the userland did it right at
importation. This is to follow V4L2 semantic and security model. If you move
simply trust the userland (gralloc), you are not doing it right.

Yes, that is what I try to describe in the other thread
https://lore.kernel.org/linux-media/B4B3306F-C3B4-4594-BDF9-4BBC59C628C9@xxxxxxxxxxx/
I don't have the problem that let the userspace decided where and how to 
allocate the memory, but we need a new protocol here to let the 
userspace do it right.

Besides, I am not very satisfied with the dynamic resolution change
steps if I understand it correct. Buffers reallocation should happen
when we receive the event not until the drain is done. A resolution
rising is very common when you are playing a network stream, it would be
better that the decoder decided how many buffers it need for the
previous sequence while the userspace could reallocate the reset of
buffers in the CAPTURE queue.
Other driver handle this just fine, if your v4l2 driver implement the v4l2
resolution change mechanism, is should be very simple to manage.

This is a limitation of the queue design of V4L2. While streaming the buffers
associated with the queue must currently be large enough to support the selected
format. "large enough" in your case is complex, and validation must be
programmed in your driver.

There was discussion on perhaps extending on CREATE_BUFS feature, that let you
allocate at run-time for a different format/resolution (no drivers currently
allow that). The rules around that aren't specified (and will have to be defined
before a driver starts making use of that). Note that to be usable, a
DELETE_BUF(s) ioctl would probably be needed too.

In current state, If your driver can support it, userland does not strictly need
to re-allocate if the resolution is changed to smaller. In most SVC scenarios,
the largest resolution is known in advance, so pre-allocation can happen to the
When you play a video from Youtube, you may notice that starting 
resolution is low, then after it received more data knowning the 
bandwidth is enough, it would switch to a higher resolution. I don't 
think it would inform the codecs2 or OMX there is a higher target 
resolution.

Besides, for the case of SVC in a conference system, the remote(gatway) 
would not tell you there is a higer resolution or frame rate because you 
can't receive it in negotiate stage, it could be permanently(device 
capability) or just bandwidth problem. Whether we know there is a higher 
requirement video depends on the transport protocols used here.

The basic idea of SVC is that the low layer didn't depends on the upper 
layer, we can't tell how the bitstream usually.
appropriate resolution and queue size. Re-allocation is then rarely triggered at
run time. Unlike your system, IOMMU system are not as affected by allocation
latency and manages to do gapless transition despite this inefficiency.

Note that all this is pure recommendation. What I'm seeing here is a pixel
format documented with Android assumptions rather then mainline, and sent
without the associated implementation. This simply raises some question on the
Because this implementation is very complex as you could see now, I 
didn't see the exist implementation here could decode DRM video or has 
the security restriction here.

And you see even before this decoder driver is done, we have had enough 
problems, even just the definition of pixel formats and data exchange 
mechanism.
viability of the whole. This is not a critic but just some verification that
ensure you are following the V4L2 spec.
I really want to those recommendations here, I want to make everything 
right at the first place. Or we would make a driver we would know it 
won't follow the v4l2 spec.



Signed-off-by: Hsia-Jun(Randy) Li <randy.li@xxxxxxxxxxxxx>
---
     drivers/media/v4l2-core/v4l2-common.c | 1 +
     drivers/media/v4l2-core/v4l2-ioctl.c  | 2 ++
     include/uapi/linux/videodev2.h        | 2 ++
     3 files changed, 5 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-common.c b/drivers/media/v4l2-core/v4l2-common.c
index e0fbe6ba4b6c..f645278b3055 100644
--- a/drivers/media/v4l2-core/v4l2-common.c
+++ b/drivers/media/v4l2-core/v4l2-common.c
@@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
                    { .format = V4L2_PIX_FMT_SGBRG12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
                    { .format = V4L2_PIX_FMT_SGRBG12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
                    { .format = V4L2_PIX_FMT_SRGGB12,       .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
+               { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
            };
            unsigned int i;

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index e6fd355a2e92..8f65964aff08 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
                    case V4L2_PIX_FMT_MT21C:        descr = "Mediatek Compressed Format"; break;
                    case V4L2_PIX_FMT_QC08C:        descr = "QCOM Compressed 8-bit Format"; break;
                    case V4L2_PIX_FMT_QC10C:        descr = "QCOM Compressed 10-bit Format"; break;
+               case V4L2_PIX_FMT_NV12M_V4H1C:  descr = "Synaptics Compressed 8-bit tiled Format";break;
+               case V4L2_PIX_FMT_NV12M_10_V4H3P8C:     descr = "Synaptics Compressed 10-bit tiled Format";break;
                    default:
                            if (fmt->description[0])
                                    return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..7e928cb69e7c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -661,6 +661,8 @@ struct v4l2_pix_format {
     #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12  Y/CbCr 4:2:0 16x16 tiles */
     #define V4L2_PIX_FMT_NV12M_8L128      v4l2_fourcc('N', 'A', '1', '2') /* Y/CbCr 4:2:0 8x128 tiles */
     #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
+#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2')   /* 12  Y/CbCr 4:2:0 tiles */
+#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0')   /* 12  Y/CbCr 4:2:0 10-bits tiles */

     /* Bayer formats - see https://urldefense.proofpoint.com/v2/url?u=http-3A__www.siliconimaging.com_RGB-2520Bayer.htm&d=DwIFaQ&c=7dfBJ8cXbWjhc0BhImu8wVIoUFmBzj1s88r8EGyM0UY&r=P4xb2_7biqBxD4LGGPrSV6j-jf3C3xlR7PXU-mLTeZE&m=lkQiuhx0yMAYHGcW-0WaHlF3e2etMHsu-FoNIBdZILGH6FPigwSAmel2vAdcVLkp&s=JKsBzpb_3u9xv52MaMuT4U3T1pPqcObYkpHDBxvcx_4&e=   */
     #define V4L2_PIX_FMT_SBGGR8  v4l2_fourcc('B', 'A', '8', '1') /*  8  BGBG.. GRGR.. */







--
Hsia-Jun(Randy) Li