Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

Daniel Stone <daniel@xxxxxxxxxxxxx> · Tue, 14 Jan 2025 17:58:16 +0000

Hi,

On Tue, 14 Jan 2025 at 09:38, Marek Olšák <maraeo@xxxxxxxxx> wrote:
> I would keep the existing modifier interfaces, API extensions, and expectations the same as today for simplicity.

Well yes, not just for simplicity, but because everything stops
working if you don't.

> The new linear modifier definition (proposal) will have these fields:
>    5 bits for log2 pitch alignment in bytes
>    5 bits for log2 height alignment in rows
>    5 bits for log2 offset alignment in bytes
>    5 bits for log2 minimum pitch in bytes
>    5 bits for log2 minimum (2D) image size in bytes
>
> The pitch and the image size in bytes are no longer arbitrary values. They are fixed values computed from {width, height, bpp, modifier} as follows:
>    aligned_width = align(width * bpp / 8, 1 << log2_pitch_alignment);
>    aligned_height = align(height, 1 << log2_height_alignment);
>    pitch = max(1 << log2_minimum_pitch, aligned_width);
>    image_size = max(1 << log2_minimum_image_size, pitch * aligned_height);
>
>
> The modifier defines the layout exactly and non-ambiguously. Overaligning the pitch or height is not supported. Only the offset alignment has some freedom regarding placement. Drivers can expose whatever they want within that definition, even exposing only 1 linear modifier is OK. Then, you can look at modifiers of other drivers if you want to find commonalities.

I don't see how this squares with the first statement.

AMD hardware is the only hardware I know of which doesn't support
overaligning. Say (not hypothetically) we have a GPU and a display
controller which have a minimum pitch alignment of 32 bytes, no
minimum height alignment, minimum 32-byte offset alignment, minimum
pitch of 32 bytes, and minimum image size of 32 bytes.

To be maximally compatible, we'd have to expose 28 (pitch align) * 32
(height align) * 28 (offset align) * 28 (min pitch) * 28 (min size) ==
19668992 individual modifiers when queried, which is 150MB per format
just to store the list of modifiers.

> DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps from detecting whether 2 devices have 0 compatible memory layouts, which is a useful thing to know.

I get the point, but again, we have the exact same problem today with
placement, i.e. some devices require buffers to be in or not be in
VRAM or GTT or sysram for some uses, and some devices require physical
contiguity. Solving that problem would require an additional 4 bits,
which brings us to 2.3GB of modifiers per format with the current
scheme. Not super viable.

The design for the allocator - communicating constraints ('pitch must
be exactly aligned to the next 256-byte boundary', 'pitch must be a
multiple of 32 bytes', 'buffer must be physically contiguous', etc)
separately from the chosen layout - would seem to fit this much
better. And since there doesn't seem to be a tractable solution we can
jam into the single 'intersect multiple sets of uint64s' API, we might
as well go through typing that out.

Cheers,
Daniel