Re: [PATCH 3/5] docs: uapi: media: Add common documentation of tiled NV15

Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx> · Wed, 13 Sep 2023 14:11:32 -0400

On Monday, 07 August 2023 at 13:37 +0200, Andrzej Pietrasiewicz wrote:
> Hi Nicolas,
> 
> On 4.08.2023 at 21:27, Nicolas Dufresne wrote:
> > This way we don't have to repeat over and over how the pixels are
> > packed in NV15.
> > 
> > Signed-off-by: Nicolas Dufresne <nicolas.dufresne@xxxxxxxxxxxxx>
> > ---
> >   .../media/v4l/pixfmt-yuv-planar.rst           | 79 ++++++++++++++++---
> >   1 file changed, 68 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
> > index 1d43532095c0..052927bd9396 100644
> > --- a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
> > +++ b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
> > @@ -373,10 +373,74 @@ two non-contiguous planes.
> >   Tiled NV15
> >   ----------
> >   
> > -``V4L2_PIX_FMT_NV15_4L4`` Semi-planar 10-bit YUV 4:2:0 formats, using 4x4 tiling.
> > -All components are packed without any padding between each other.
> > -As a side-effect, each group of 4 components are stored over 5 bytes
> > -(YYYY or UVUV = 4 * 10 bits = 40 bits = 5 bytes).
> > +Semi-planar 10-bit YUV 4:2:0 formats. All components are packed
> > +without any padding between each other. Each pixels occupy 15 bits
> 
> Maybe "Each pixel group"?

ack.

> 
> 
> 
> > +and are usually stored in group of 4 components stored over 5 bytes
> > +(YYYY or UVUV = 4 * 10 bits = 40 bits = 5 bytes) or partitioned into
> > +upper 8 bit and lower 2 bits.
> > +
> > +.. flat-table:: Sample of 4 NV15 luma pixels
> > +    :header-rows:  2
> > +    :stub-columns: 0
> > +
> > +    * -
> > +      - 8
> > +      - 7
> > +      - 6
> > +      - 5
> > +      - 4
> > +      - 3
> > +      - 2
> > +      - 1
> > +      - 0
> > +    * - byte 0
> > +      - Y'\ :sub:`0:0`
> > +      - Y'\ :sub:`0:1`
> > +      - Y'\ :sub:`0:2`
> > +      - Y'\ :sub:`0:3`
> > +      - Y'\ :sub:`0:4`
> > +      - Y'\ :sub:`0:5`
> > +      - Y'\ :sub:`0:6`
> > +      - Y'\ :sub:`0:7`
> 
> So byte 0 contains Y0, bits 0..7 but then...
> 
> > +    * - byte 1
> > +      - Y'\ :sub:`0:8`
> > +      - Y'\ :sub:`0:9`
> > +      - Y'\ :sub:`1:0`
> > +      - Y'\ :sub:`1:1`
> > +      - Y'\ :sub:`1:2`
> > +      - Y'\ :sub:`1:3`
> > +      - Y'\ :sub:`1:4`
> > +      - Y'\ :sub:`1:5`
> > +    * - byte 2
> > +      - Y'\ :sub:`1:6`
> > +      - Y'\ :sub:`1:7`
> > +      - Y'\ :sub:`1:8`
> > +      - Y'\ :sub:`1:9`
> > +      - Y'\ :sub:`2:0`
> > +      - Y'\ :sub:`2:1`
> > +      - Y'\ :sub:`2:2`
> > +      - Y'\ :sub:`2:3`
> > +    * - byte 3
> > +      - Y'\ :sub:`2:4`
> > +      - Y'\ :sub:`2:5`
> > +      - Y'\ :sub:`2:6`
> > +      - Y'\ :sub:`2:7`
> > +      - Y'\ :sub:`2:8`
> > +      - Y'\ :sub:`2:9`
> > +      - Y'\ :sub:`3:0`
> > +      - Y'\ :sub:`3:1`
> > +    * - byte 4
> > +      - Y'\ :sub:`3:2`
> > +      - Y'\ :sub:`3:3`
> > +      - Y'\ :sub:`3:4`
> > +      - Y'\ :sub:`3:5`
> > +      - Y'\ :sub:`3:6`
> > +      - Y'\ :sub:`3:7`
> > +      - Y'\ :sub:`3:8`
> > +      - Y'\ :sub:`3:9`
> > +
> > +``V4L2_PIX_FMT_NV15_4L4`` stores pixels in 4x4 tiles, and stores tiles linearly
> > +in memory.
> >   
> >   ``V4L2_PIX_FMT_NV12M_10BE_8L128`` is similar to ``V4L2_PIX_FMT_NV12M`` but stores
> >   10 bits pixels in 2D 8x128 tiles, and stores tiles linearly in memory.
> > @@ -385,13 +449,6 @@ The image height must be aligned to a multiple of 128.
> >   The layouts of the luma and chroma planes are identical.
> >   Note the tile size is 8bytes multiplied by 128 bytes,
> >   it means that the low bits and high bits of one pixel may be in different tiles.
> > -The 10 bit pixels are packed, so 5 bytes contain 4 10-bit pixels layout like
> > -this (for luma):
> > -byte 0: Y0(bits 9-2)
> 
> ...here it says bits 9-2? Is it a mistake or are you cleaning up the doc
> and the table above is the correct version?

Thanks a lot for spotting this. I missed the endianness aspect and just assumed
all NV15 implementations were the same. Digging further, the Hantro/RK version
of NV15 uses a little-endian representation. So you have in memory:

Byte 0: Y0 bits 7-0
Byte 1: Y1 bits 5-0 in MSB | Y0 bits 9-8 in LSB
Byte 2: Y2 bits 3-0 in MSB | Y1 bits 9-6 in LSB
Byte 3: Y3 bits 1-0 in MSB | Y2 bits 9-4 in LSB
Byte 4: Y3 bits 9-2

If we represent the reads as 16-bit words (as an illustration), you'd read the
four samples with:

Y0: 0x[Byte 1][Byte 0] & 0x3ff
Y1: 0x[Byte 2][Byte 1] >> 2 & 0x3ff
Y2: 0x[Byte 3][Byte 2] >> 4 & 0x3ff
Y3: 0x[Byte 4][Byte 3] >> 6

Which makes the 10 bits of data always adjacent (of course not that practical
for a CPU since it's unaligned, but let's not bother ;-P).
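
To make the little-endian reads above concrete, here is a minimal C sketch of
unpacking one group of 4 samples from 5 bytes. The helper names (read_le16,
nv15_le_unpack4) are made up for this illustration and are not taken from any
driver; it only assumes the byte layout listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit word from two consecutive bytes, first byte in the LSBs. */
static inline uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Unpack four 10-bit samples from 5 bytes, assuming the little-endian
 * packing described above (hypothetical helper, for illustration only).
 */
static void nv15_le_unpack4(const uint8_t src[5], uint16_t dst[4])
{
	assert(src && dst);

	dst[0] = read_le16(&src[0]) & 0x3ff;        /* bytes 0-1, bits 9-0   */
	dst[1] = (read_le16(&src[1]) >> 2) & 0x3ff; /* bytes 1-2, bits 11-2  */
	dst[2] = (read_le16(&src[2]) >> 4) & 0x3ff; /* bytes 2-3, bits 13-4  */
	dst[3] = read_le16(&src[3]) >> 6;           /* bytes 3-4, bits 15-6  */
}
```

Building the words byte by byte sidesteps the unaligned 16-bit loads, at the
cost of a few more operations per sample.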

I can see now that Amphion is big endian, as the bytes get pushed into the MSB.
So with the originally documented bit placement we'd have:

Y0: 0x[Byte 0][Byte 1] >> 6
Y1: 0x[Byte 1][Byte 2] >> 4 & 0x3ff
Y2: 0x[Byte 2][Byte 3] >> 2 & 0x3ff
Y3: 0x[Byte 3][Byte 4] & 0x3ff
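
The same idea for the big-endian placement, again as a rough C sketch. The
names read_be16 and nv15_be_unpack4 are invented for this example, and the
layout assumed is the originally documented one (byte 0 holds Y0 bits 9-2, and
so on), not something checked against hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit word from two consecutive bytes, first byte in the MSBs. */
static inline uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Unpack four 10-bit samples from 5 bytes, assuming the big-endian
 * packing shown above (hypothetical helper, for illustration only).
 */
static void nv15_be_unpack4(const uint8_t src[5], uint16_t dst[4])
{
	assert(src && dst);

	dst[0] = read_be16(&src[0]) >> 6;           /* bytes 0-1, bits 15-6  */
	dst[1] = (read_be16(&src[1]) >> 4) & 0x3ff; /* bytes 1-2, bits 13-4  */
	dst[2] = (read_be16(&src[2]) >> 2) & 0x3ff; /* bytes 2-3, bits 11-2  */
	dst[3] = read_be16(&src[3]) & 0x3ff;        /* bytes 3-4, bits 9-0   */
}
```

Compared with the little-endian variant, the shift amounts simply run in the
opposite order (6, 4, 2, 0 instead of 0, 2, 4, 6).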

I'll drop the generalization here, and only introduce the NV15 family as fully
packed 10-bit semi-planar formats, which often store 4 pixels per 5 bytes, but
may partition the lower bits (as MT2110 does).

> 
> Regards,
> 
> Andrzej
> 
> > -byte 1: Y0(bits 1-0) Y1(bits 9-4)
> > -byte 2: Y1(bits 3-0) Y2(bits 9-6)
> > -byte 3: Y2(bits 5-0) Y3(bits 9-8)
> > -byte 4: Y3(bits 7-0)
> >   
> >   ``V4L2_PIX_FMT_NV12_10BE_8L128`` is similar to ``V4L2_PIX_FMT_NV12M_10BE_8L128`` but stores
> >   two planes in one memory.
>