Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

Daniel Stone <daniel@xxxxxxxxxxxxx> · Wed, 15 Jan 2025 12:20:07 +0000

On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo@xxxxxxxxx> wrote:
> On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
>> AMD hardware is the only hardware I know of which doesn't support
>> overaligning. Say (not hypothetically) we have a GPU and a display
>> controller which have a minimum pitch alignment of 32 bytes, no
>> minimum height alignment, minimum 32-byte offset alignment, minimum
>> pitch of 32 bytes, and minimum image size of 32 bytes.
>>
>> To be maximally compatible, we'd have to expose 28 (pitch align) * 32
>> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min size) ==
>> 19668992 individual modifiers when queried, which is 150MB per format
>> just to store the list of modifiers.
>
> Maximum compatibility is not required nor expected.
>
> In your case, only 1 linear modifier would be added for that driver, which is: [5 / 0 / 5 / 5 / 5]
>
> Then if, and only if, compatibility with other devices is desired, the driver developer could look at drivers of those other devices and determine which other linear modifiers to add. Ideally it would be just 1, so there would be a total of 2.

Mali (actually two DRM drivers and sort of three Mesa drivers) can be
paired with any one of 11 KMS drivers (really 12 given that one is a
very independent subdriver), and something like 20 different codecs
(at least 12 different vendors; I didn't bother counting the actual
subdrivers which are all quite different). The VeriSilicon Hantro G2
codec driver is shipped by five (that we know of) vendors who all have
their own KMS drivers. One of those is in the Rockchip RK3588, which
(don't ask me why) ships six different codec blocks, with three
different drivers, from two different vendors - that's before you even
get to things like the ISP and NPU which really need to be sharing
buffers properly without copies.

So yeah, working widely without having to encode specific knowledge
everywhere isn't a nice-to-have, it's a hard baseline requirement.
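
For anyone who wants to check the figures quoted in this thread (19668992
modifiers, roughly 150MB per format, and roughly 2.3GB once four placement
bits are added), here is the arithmetic as a quick Python sketch. It assumes
each modifier is stored as a u64 (8 bytes) and reads MB/GB as binary units;
the variable names are mine, not anything from the kernel.

```python
# Number of distinct values per constraint, taken from the quoted example.
pitch_align = 28
height_align = 32
offset_align = 28
min_pitch = 28
min_size = 28

MODIFIER_BYTES = 8  # assumption: one u64 per advertised modifier

count = pitch_align * height_align * offset_align * min_pitch * min_size
list_bytes = count * MODIFIER_BYTES

# Encoding placement/contiguity takes another 4 bits, i.e. 16x as many entries.
with_placement_bytes = list_bytes * 2**4

print(count, "modifiers per format")
print(list_bytes / 2**20, "MiB per format")
print(with_placement_bytes / 2**30, "GiB per format with placement bits")
```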

>> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps from detecting whether 2 devices have 0 compatible memory layouts, which is a useful thing to know.
>>
>> I get the point, but again, we have the exact same problem today with
>> placement, i.e. some devices require buffers to be in or not be in
>> VRAM or GTT or sysram for some uses, and some devices require physical
>> contiguity. Solving that problem would require an additional 4 bits,
>> which brings us to 2.3GB of modifiers per format with the current
>> scheme. Not super viable.
>
> Userspace doesn't determine placement. The kernel memory management can move buffers between heaps to accommodate sharing between devices as needed. This is a problem in which userspace has no say.

It really does though!

None of these devices use TTM with placement moves, and doing that
isn't a fix either. Embedded systems have such low memory bandwidth that
the difference between choosing the wrong placement and moving it
later vs. having the right placement to begin with is the difference
between 'this does not work' and 'great, I can ship this'. Which is
great if you're a consultancy trying to get paid, but tbh I'd rather
work on more interesting things.

So yeah, userspace does very much choose the placement. On most
drivers, this is either by 'knowing' which device to allocate from, or
passing a flag to your allocation ioctl. For newer drivers though,
there's the dma-heap allocation mechanism which is now upstream and
the blessed path, for which userspace needs to explicitly know the
desired placement (and must, because fixing it up later is a
non-starter).

Given that we need to keep LINEAR ~forever for ABI reasons, and
because there's no reasonably workable alternative, let's abandon the
idea of abandoning LINEAR, and try to work with out-of-band signalling
instead.

One idea is to actually pursue the allocator idea and express this
properly through constraints. I'd be super in favour of this,
unsurprisingly, because it allows us to solve a whole pile of other
problems, rather than the extremely narrow AMD/Intel interop case.

Another idea for the out-of-band signalling would be to add
information-only modifiers, like
DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or
DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't really
work at all with how people actually use modifiers: as the doc
describes, userspace takes and intersects the declared modifier lists
and passes the result through. The intersection of LINEAR+EQ256 and
LINEAR+GE32 is LINEAR, so a userspace that follows the rules will just
drop the hints on the floor and pick whatever linear allocation it
feels like.
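
To make that concrete, here is a minimal Python sketch of the intersection
step. DRM_FORMAT_MOD_LINEAR is genuinely 0; the two hint values are made-up
placeholders (no such modifiers exist), and intersect_modifiers() is just a
stand-in for what a well-behaved userspace does, not any particular
compositor's code.

```python
DRM_FORMAT_MOD_LINEAR = 0

# Hypothetical information-only modifiers; the numeric values are placeholders.
LINEAR_PITCH_ALIGN_EQ_256 = 0x1000_0000_0000_0100
LINEAR_PITCH_ALIGN_GE_32 = 0x1000_0000_0000_0020


def intersect_modifiers(*device_lists):
    """Return the modifiers every device advertises, in the first list's order."""
    first, *rest = device_lists
    common = set(first).intersection(*rest)
    return [m for m in first if m in common]


gpu = [DRM_FORMAT_MOD_LINEAR, LINEAR_PITCH_ALIGN_EQ_256]
display = [DRM_FORMAT_MOD_LINEAR, LINEAR_PITCH_ALIGN_GE_32]

shared = intersect_modifiers(gpu, display)
print(shared)  # only LINEAR survives; both alignment hints are gone
```

Neither hint appears in both lists, so the allocator only ever sees plain
LINEAR and has no idea that one side needs an exact 256-byte pitch alignment.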

I think I've just talked myself into the position that passing
allocator constraints together with modifiers is the only way to
actually solve this problem, at least without creating the sort of
technical debt that meant we spent years fixing up implicit/explicit
modifier interactions when it really should've just been adding a
!)@*(#$ u64 next to the u32.
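
As a very rough illustration of what "constraints next to modifiers" could
buy us, here is a Python sketch that merges per-device pitch constraints
instead of intersecting opaque tokens. Everything in it (PitchConstraint,
choose_pitch, the exact/at-least split) is hypothetical and only covers
pitch; it is not an existing kernel or Mesa interface.

```python
from dataclasses import dataclass
from math import lcm  # Python 3.9+


@dataclass(frozen=True)
class PitchConstraint:
    """Hypothetical per-device pitch constraint, in bytes."""
    align: int
    exact: bool = False  # True: no over-alignment allowed beyond `align`


def align_up(value, align):
    return (value + align - 1) // align * align


def choose_pitch(min_pitch, constraints):
    """Return a pitch satisfying every constraint, or None if none exists."""
    # "At least" constraints combine into a single least common multiple.
    ge_align = lcm(*(c.align for c in constraints if not c.exact))
    # Each "exact" constraint pins the pitch to one specific value.
    pinned = {align_up(min_pitch, c.align) for c in constraints if c.exact}
    if not pinned:
        return align_up(min_pitch, ge_align)
    if len(pinned) > 1:
        return None  # two devices demand different exact pitches
    pitch = pinned.pop()
    return pitch if pitch % ge_align == 0 else None


# 1000 pixels at 4 bytes per pixel needs at least 4000 bytes per row.
eq256 = PitchConstraint(256, exact=True)
ge32 = PitchConstraint(32)
print(choose_pitch(4000, [ge32]))         # what a GE32-only allocation picks
print(choose_pitch(4000, [eq256, ge32]))  # what both devices can share
```

With only the GE32 device in the picture the allocation ends up with a
4000-byte pitch, which the exact-256 device cannot import; with both
constraints visible the allocator lands on 4096, which works for both.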

Cheers,
Daniel