Hi,
On 2025/1/20 04:14, Laurent Pinchart wrote:
On Mon, Jan 20, 2025 at 12:26:30AM +0800, Sui Jingfeng wrote:
On 2025/1/19 23:22, Tomi Valkeinen wrote:
On 19/01/2025 16:59, Sui Jingfeng wrote:
But userspace must be able to continue allocating YUV buffers through
CREATE_DUMB.
I think allocating YUV buffers through the CREATE_DUMB interface is
just an *abuse* and *misuse* of this API for now.
Take the NV12 format as an example: NV12 is a YUV 4:2:0 semi-planar
format with two planes, the Y plane and the UV plane. The Y plane
appears first in memory as an array of unsigned char values, and is
followed immediately by the UV plane, which is also an array of
unsigned char values containing packed U (Cb) and V (Cr) samples.
But the 'drm_mode_create_dumb' structure is only intended to describe
*one* plane.
struct drm_mode_create_dumb {
	__u32 height;
	__u32 width;
	__u32 bpp;
	__u32 flags;
	__u32 handle;
	__u32 pitch;
	__u64 size;
};
A width x height NV12 image takes up width * height * (1 + 1/4 + 1/4)
bytes. So we can allocate an *equivalently* sized buffer to store the
NV12 raw data: either 'width * (height * 3/2)' where each pixel takes
up 8 bits, or just 'width * height' where each pixel takes up 12 bits.
However, both of these are just equivalent descriptions of the
original NV12 format; neither is a physically correct description.
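For illustration, a minimal (untested) sketch of this single-buffer
approach, assuming an already open DRM fd and a driver that supports
DRM_IOCTL_MODE_CREATE_DUMB, could look like this:

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>	/* libdrm: drmIoctl(), pulls in drm.h/drm_mode.h */

/*
 * Allocate one dumb buffer sized for a whole width x height NV12
 * image, folding the 1.5 bytes per pixel into the height (one of the
 * "equivalent" descriptions above). No error handling.
 */
static int create_nv12_dumb(int fd, uint32_t width, uint32_t height,
			    struct drm_mode_create_dumb *out)
{
	memset(out, 0, sizeof(*out));
	out->width  = width;
	out->height = height * 3 / 2;	/* Y rows plus interleaved UV rows */
	out->bpp    = 8;		/* one byte per sample */

	return drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, out);
}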
I don't see the problem. Allocating dumb buffers, if we don't have
any heuristics related to RGB behind it, is essentially just
allocating a specific amount of memory, defined by width, height and
bitsperpixel.
I think the problem is that 'width', 'height' and 'bpp' were
originally meant to describe one plane. Those three parameters have
perfectly well defined physical semantics.
But with multi-planar formats, taking an NV12 image as an example,
for a 2x2 square of pixels there are 4 Y samples but only 1 U sample
and 1 V sample. This format requires 4x8 + 1x8 + 1x8 = 48 bits to
store the 2x2 square, so its depth is 12 bits per pixel (48 / (2 * 2)).
So my problem is that the 12 bpp in this example only makes sense
mathematically; it doesn't have a good physical interpretation. Do
you agree with me on this technical point?
If I want to create an NV12 framebuffer, I allocate two dumb
buffers, one for Y and one for UV planes, and size them accordingly.
And then create the DRM framebuffer with those.
Then how do you fill in the 'width', 'height' and 'bpp' values of
each dumb buffer?
For 640x480-NV12:
plane 0: width = 640, height = 480, bpp = 8
plane 1: width = 640 / 2, height = 480 / 2, bpp = 16
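A rough, untested sketch of that per-plane approach (error handling
omitted, 'fd' is an open DRM fd; whether a given driver accepts
separate BOs per plane is driver dependent):

#include <stdint.h>
#include <xf86drm.h>		/* drmIoctl() */
#include <xf86drmMode.h>	/* drmModeAddFB2() */
#include <drm_fourcc.h>		/* DRM_FORMAT_NV12 */

/*
 * Build a 640x480 NV12 framebuffer from two dumb buffers, one per
 * plane, using the per-plane width/height/bpp values listed above.
 */
static int create_nv12_fb(int fd, uint32_t *fb_id)
{
	struct drm_mode_create_dumb y  = { .width = 640, .height = 480, .bpp = 8 };
	struct drm_mode_create_dumb uv = { .width = 320, .height = 240, .bpp = 16 };
	uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };

	drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &y);
	drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &uv);

	handles[0] = y.handle;
	pitches[0] = y.pitch;
	handles[1] = uv.handle;
	pitches[1] = uv.pitch;

	return drmModeAddFB2(fd, 640, 480, DRM_FORMAT_NV12,
			     handles, pitches, offsets, fb_id, 0);
}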
But I think this should be hardware dependent. The hardware I'm using
loads the NV12 raw data as a whole; I only need to feed the GPU VA of
the backing memory to the hardware register once.
I'm not familiar with your hardware, so I can't say more about this
software design. Perhaps someone who knows more could comment on this.
Layout of planes in memory is just one hardware constraint, the same way
we have constraints on alignment and strides. Some devices require the
planes to be contiguous (likely with some alignment constraints), some
can work with planes being in discontiguous pieces of memory, and
some even require them to be discontiguous and located in separate
DRAM banks.
Right.
Why not allocate storage for the whole image in one shot?
You can, if you adjust the parameters accordingly. However, if the
strides of the planes are not equal, I guess it might cause problems
on some platforms.
But I think it's usually simpler to allocate one buffer per plane,
and perhaps even better as it doesn't require as large a contiguous
memory area.
The modetest tool in libdrm can be a good example; I'm sending it[1]
to you as a reference.
Right, so modetest already does it successfully. So... What is the issue?
But then the problem becomes that it overrides the 'height'
parameter. What's the physical interpretation of the 'height'
parameter when creating an NV12 image with the dumb API then?
I wouldn't be too concerned about physical interpretations. Yes, the
height, width and bpp parameters were likely designed with RGB formats
in mind. Yes, using DUMB_CREATE for YUV formats is probably something
that the original authors didn't envision. And yes, from that point of
view, it could be seen by the original authors as an abuse of the API.
But I don't think that's a problem as such.
Sometimes a 2D GPU or image processing unit may get involved, and the
dimensions, clip window, pitch and other parameters need to be
programmed into the hardware.
The value of bpp affects the pitch, and a bigger pitch may cause the
hardware to load more bytes than it should for one line, while the
'height' may affect the size of the clip window, depending on the
calculation method.
But I have no strong opinion on this and agree with you overall.
An API is just an API. True, it would be nicer if the usage of the ioctl
parameters was more intuitive for YUV formats, but I believe we could
still standardize how the existing parameters map to linear scanout YUV
formats without causing the world to end. As has been said before, lots
of drivers are using DUMB_CREATE for this purpose, and we can't change
that.
This doesn't mean we shouldn't work on improving memory allocation, but
I see that as a separate issue.
I guess solving complex problems with simple APIs will hit
limitations sooner or later. But I'm not very sure and might be
wrong, so other people can overrule my words.
Everyone agrees that CREATE_DUMB is not the best ioctl to allocate
buffers, and one can't consider it to work identically across the
platforms. But it's what we have and what has been used for ages.
Yeah, your request is not unreasonable; it can be seen as a kind of
hard requirement. Since the GEM DMA helpers don't export a more
advanced interface to userspace so far, drivers employing GEM DMA
have no choice but to abuse the dumb buffer API to allocate buffers
for the more complex formats.
The dumb buffer API doesn't support specifying the buffer format,
tile status, placement, etc. The more advanced drivers already expose
xxx_create_gem() ioctls to userspace. It seems that a few more
experienced programmers hinted in the thread above that we should
create a new ioctl, so that we can keep using the simple API for
simple things and cover complex needs with more advanced APIs.
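Purely as a hypothetical illustration of what such a richer
allocation request might carry (none of these names exist in the
current UAPI; this is not a concrete proposal):

struct drm_mode_create_buffer {	/* hypothetical, does not exist */
	__u32 width;
	__u32 height;
	__u32 format;		/* DRM_FORMAT_* fourcc, e.g. NV12 */
	__u64 modifier;		/* DRM_FORMAT_MOD_*, i.e. tile status */
	__u32 placement;	/* e.g. VRAM vs. system memory hint */
	__u32 flags;
	__u32 handles[4];	/* out: one GEM handle per plane */
	__u32 pitches[4];	/* out */
	__u32 offsets[4];	/* out */
};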
I'd really like to explore adding new ioctls to expose memory
allocation constraints, and allocating the memory itself from DMA
heaps.
--
Best regards,
Sui