Re: [PATCH v2 25/25] drm/xlnx: Compute dumb-buffer sizes with drm_mode_size_dumb()

Sui Jingfeng <sui.jingfeng@xxxxxxxxx> · Mon, 20 Jan 2025 00:26:30 +0800

Hi,

On 2025/1/19 23:22, Tomi Valkeinen wrote:
On 19/01/2025 16:59, Sui Jingfeng wrote:

But userspace must be able to continue allocating YUV buffers through
CREATE_DUMB.

I think, allocating YUV buffers through CREATE_DUMB interface is just
an *abuse* and *misuse* of this API for now.

Take the NV12 format as an example, NV12 is YUV420 planar format, have
two planar: the Y-planar and the UV-planar. The Y-planar appear first
in memory as an array of unsigned char values. The Y-planar is 
followed
immediately by the UV-planar, which is also an array of unsigned char
values that contains packed U (Cb) and V (Cr) samples.

But the 'drm_mode_create_dumb' structure is only intend to provide
descriptions for *one* planar.

struct drm_mode_create_dumb {
     __u32 height;
     __u32 width;
     __u32 bpp;
     __u32 flags;
     __u32 handle;
     __u32 pitch;
     __u64 size;
};

An width x height NV12 image takes up width*height*(1 + 1/4 + 1/4) 
bytes.

So we can allocate an *equivalent* sized buffer to store the NV12 
raw data.

Either 'width * (height * 3/2)' where each pixel take up 8 bits,
or just 'with * height' where each pixels take up 12 bits.

However, all those math are just equivalents description to the 
original
NV12 format, neither are concrete correct physical description.

I don't see the problem. Allocating dumb buffers, if we don't have 
any heuristics related to RGB behind it, is essentially just 
allocating a specific amount of memory, defined by width, height and 
bitsperpixel.

I think, the problem will be that the 'width', 'height' and 'bpp'
are originally used to describe one plane. Those three parameters
has perfectly defined physical semantics.

But with multi planar formats, take NV12 image as an example,
for a 2×2 square of pixels, there are 4 Y samples but only 1 U
sample and 1 V sample. This format requires 4x8+1x8+1x8=48 bits
to store the 2x2 square.

So its depth is 12 bits per pixel (48 / (2 * 2)).

so my problem is that the mentioned 12bpp in this example only
make sense in mathematics, it doesn't has a good physical
interpret. Do you agree with me on this technique point?

If I want to create an NV12 framebuffer, I allocate two dumb 
buffers, one for Y and one for UV planes, and size them accordingly. 
And then create the DRM framebuffer with those.

Then how you fill the value of the 'width', 'height' and 'bpp' of 
each dumb buffers?

For 640x480-NV12:
plane 0: width = 640, height = 480, bpp = 8
plane 1: width = 640 / 2, height = 480 / 2, bpp = 16

But i think this should be hardware dependent. The hardware I'm using
load NV12  raw data as a whole. I only need to feed gpuva of the backing
memory to the hardware register once.

Not familiar with your hardware, so I can't talk more on this software
design. Perhaps someone know more could have a comment on this.

Why not allocate storage for the whole on one shoot?

You can, if you adjust the parameters accordingly. However, if the 
strides of the planes are not equal, I guess it might cause problems 
on some platforms.

But I think it's usually simpler to allocate one buffer per plane, and 
perhaps even better as it doesn't require as large contiguous memory 
area.

The modetest in libdrm can be an good example, send it[1] to you as 
an reference.

Right, so modetest already does it successfully. So... What is the issue?

But then, the problem will become that it override the 'height' parameter.
What's the physical interpretation of the 'height' parameter when creating
an NV12 image with the dump API then?

I guess, solving complex problems with simple APIs may see the limitation,
sooner or later. But I not very sure and might be wrong. So other peoples
can override me words.

Everyone agrees that CREATE_DUMB is not the best ioctl to allocate 
buffers, and one can't consider it to work identically across the 
platforms. But it's what we have and what has been used for ages.

Yeah, your request are not unreasonable. It can be seen as a kind of rigid demand.
Since GEM DMA helpers doesn't export an more advanced interface to userspace so far.
As a result, drivers that employing GEM DMA has no other choice, but to abuse the
dumb buffer API to do allocation for the more complex format buffers.

The dumb buffer API doesn't support to specify buffer format, tile status and
placement etc. The more advance drivers has been exposed the xxx_create_gem()
to user-space. It seems that a few more experienced programmers hint us to
create an new ioctl at above thread, so that we can keep employing simple API
to do simple things and to suit complex needs with the more advanced APIs.

 Tomi

--
Best regards,
Sui