On Mon, Jan 20, 2025 at 08:58:20AM +0100, Thomas Zimmermann wrote: > Hi > > > Am 18.01.25 um 03:37 schrieb Marek Olšák: > [...] > > > > 3) Implementing DRM_FORMAT_MOD_LINEAR as having 256B pitch and offset > > alignment. This is what we do today. Even if Intel and some AMD chips > > can do 64B or 128B alignment, they overalign to 256B. With so many > > AMD+NV laptops out there, NV is probably next, unless they already do > > this in the closed source driver. I don't think this works, or at least not any better than the current linear modifier. There's way too many users of that thing out there that I think you can realistically redefine it. Adding new linear modifiers and then preferring those above the old LINEAR (if there is one left after all the wittling down to a common set) is I think the only option that really works to fix something. > The dumb-buffer series currently being discussed on dri-devel also touches > handling of scanline pitches. THe actual value varies with each driver. > Should dumb buffers use a default pitch alignment of 256 on al hardware? If you go with new modifiers then there could be shared code that dtrt by just looking at the modifier list. -Sima > Best regards > Thomas > > > > > Marek > > > > On Fri, Jan 17, 2025 at 9:18 AM Simona Vetter <simona.vetter@xxxxxxxx> > > wrote: > > > > On Wed, Jan 15, 2025 at 12:20:07PM +0000, Daniel Stone wrote: > > > On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo@xxxxxxxxx> wrote: > > > > On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone > > <daniel@xxxxxxxxxxxxx> wrote: > > > >> AMD hardware is the only hardware I know of which doesn't support > > > >> overaligning. Say (not hypothetically) we have a GPU and a > > display > > > >> controller which have a minimum pitch alignment of 32 bytes, no > > > >> minimum height alignment, minimum 32-byte offset alignment, > > minimum > > > >> pitch of 32 bytes, and minimum image size of 32 bytes. > > > >> > > > >> To be maximally compatible, we'd have to expose 28 (pitch > > align) * 32 > > > >> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min > > size) == > > > >> 19668992 individual modifiers when queried, which is 150MB > > per format > > > >> just to store the list of modifiers. > > > > > > > > Maximum compatibility is not required nor expected. > > > > > > > > In your case, only 1 linear modifier would be added for that > > driver, which is: [5 / 0 / 5 / 5 / 5] > > > > > > > > Then if, and only if, compatibility with other devices is > > desired, the driver developer could look at drivers of those other > > devices and determine which other linear modifiers to add. Ideally > > it would be just 1, so there would be a total of 2. > > > > > > Mali (actually two DRM drivers and sort of three Mesa drivers) > > can be > > > paired with any one of 11 KMS drivers (really 12 given that one is a > > > very independent subdriver), and something like 20 different codecs > > > (at least 12 different vendors; I didn't bother counting the actual > > > subdrivers which are all quite different). The VeriSilicon Hantro G2 > > > codec driver is shipped by five (that we know of) vendors who > > all have > > > their own KMS drivers. One of those is in the Rockchip RK3588, which > > > (don't ask me why) ships six different codec blocks, with three > > > different drivers, from two different vendors - that's before > > you even > > > get to things like the ISP and NPU which really need to be sharing > > > buffers properly without copies. > > > > > > So yeah, working widely without having to encode specific knowledge > > > everywhere isn't a nice-to-have, it's a hard baseline requirement. > > > > > > >> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps > > from detecting whether 2 devices have 0 compatible memory layouts, > > which is a useful thing to know. > > > >> > > > >> I get the point, but again, we have the exact same problem > > today with > > > >> placement, i.e. some devices require buffers to be in or not > > be in > > > >> VRAM or GTT or sysram for some uses, and some devices require > > physical > > > >> contiguity. Solving that problem would require an additional > > 4 bits, > > > >> which brings us to 2.3GB of modifiers per format with the current > > > >> scheme. Not super viable. > > > > > > > > Userspace doesn't determine placement. The kernel memory > > management can move buffers between heaps to accommodate sharing > > between devices as needed. This is a problem in which userspace > > has no say. > > > > > > It really does though! > > > > > > None of these devices use TTM with placement moves, and doing that > > > isn't a fix either. Embedded systems have so low memory > > bandwidth that > > > the difference between choosing the wrong placement and moving it > > > later vs. having the right placement to begin with is the difference > > > between 'this does not work' and 'great, I can ship this'. Which is > > > great if you're a consultancy trying to get paid, but tbh I'd rather > > > work on more interesting things. > > > > > > So yeah, userspace does very much choose the placement. On most > > > drivers, this is either by 'knowing' which device to allocate > > from, or > > > passing a flag to your allocation ioctl. For newer drivers though, > > > there's the dma-heap allocation mechanism which is now upstream and > > > the blessed path, for which userspace needs to explicitly know the > > > desired placement (and must, because fixing it up later is a > > > non-starter). > > > > > > Given that we need to keep LINEAR ~forever for ABI reasons, and > > > because there's no reasonably workable alternative, let's > > abandon the > > > idea of abandoning LINEAR, and try to work with out-of-band > > signalling > > > instead. > > > > > > One idea is to actually pursue the allocator idea and express this > > > properly through constraints. I'd be super in favour of this, > > > unsurprisingly, because it allows us to solve a whole pile of other > > > problems, rather than the extremely narrow AMD/Intel interop case. > > > > > > Another idea for the out-of-band signalling would be to add > > > information-only modifiers, like > > > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or > > > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't > > really > > > work at all with how people actually use modifiers: as the doc > > > describes, userspace takes and intersects the declared modifier > > lists > > > and passes the result through. The intersection of LINEAR+EQ256 and > > > LINEAR+GE32 is LINEAR, so a userspace that follows the rules > > will just > > > drop the hints on the floor and pick whatever linear allocation it > > > feels like. > > > > Yeah I think latest when we also take into account logical image > > size (not > > just pitch) with stuff like it needs to be aligned to 2 pixels in both > > directions just using modifiers falls apart. > > > > And the problem with linear, unlike device modifiers is that we > > can't just > > throw up our hands and enumerate the handful of formats in actual > > use for > > interop. There's so many produces and consumers of linera buffers > > (Daniel's list above missed camera/image processors) that save > > assumption > > is that anything really can happen. > > > > > I think I've just talked myself into the position that passing > > > allocator constraints together with modifiers is the only way to > > > actually solve this problem, at least without creating the sort of > > > technical debt that meant we spent years fixing up implicit/explicit > > > modifier interactions when it really should've just been adding a > > > !)@*(#$ u64 next to the u32. > > > > Yeah probably. > > > > Otoh I know inertia, so I am tempted to go with the oddball > > LINEAR_VEDNOR_A_VENDOR_B_INTEROP thing and stretch the runway for > > a bit. > > And we just assign those as we go as a very special thing, and the > > drivers > > that support it would prefer it above just LINEAR if there's no other > > common format left. > > > > Also makes it really obvious what all userspace/kernel driver enabling > > would be needed to justify such a modifier. > > -Sima > > > > > > > > Cheers, > > > Daniel > > > > -- Simona Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch > > > > -- > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Frankenstrasse 146, 90461 Nuernberg, Germany > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman > HRB 36809 (AG Nuernberg) > -- Simona Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch