Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 20, 2025 at 08:58:20AM +0100, Thomas Zimmermann wrote:
> Hi
> 
> 
> Am 18.01.25 um 03:37 schrieb Marek Olšák:
> [...]
> > 
> > 3) Implementing DRM_FORMAT_MOD_LINEAR as having 256B pitch and offset
> > alignment. This is what we do today. Even if Intel and some AMD chips
> > can do 64B or 128B alignment, they overalign to 256B. With so many
> > AMD+NV laptops out there, NV is probably next, unless they already do
> > this in the closed source driver.

I don't think this works, or at least not any better than the current
linear modifier. There's way too many users of that thing out there that I
think you can realistically redefine it.

Adding new linear modifiers and then preferring those above the old LINEAR
(if there is one left after all the wittling down to a common set) is I
think the only option that really works to fix something.

> The dumb-buffer series currently being discussed on dri-devel also touches
> handling of scanline pitches. THe actual value varies with each driver. 
> Should dumb buffers use a default pitch alignment of 256 on al hardware?

If you go with new modifiers then there could be shared code that dtrt by
just looking at the modifier list.
-Sima

> Best regards
> Thomas
> 
> > 
> > Marek
> > 
> > On Fri, Jan 17, 2025 at 9:18 AM Simona Vetter <simona.vetter@xxxxxxxx>
> > wrote:
> > 
> >     On Wed, Jan 15, 2025 at 12:20:07PM +0000, Daniel Stone wrote:
> >     > On Wed, 15 Jan 2025 at 04:05, Marek Olšák <maraeo@xxxxxxxxx> wrote:
> >     > > On Tue, Jan 14, 2025 at 12:58 PM Daniel Stone
> >     <daniel@xxxxxxxxxxxxx> wrote:
> >     > >> AMD hardware is the only hardware I know of which doesn't support
> >     > >> overaligning. Say (not hypothetically) we have a GPU and a
> >     display
> >     > >> controller which have a minimum pitch alignment of 32 bytes, no
> >     > >> minimum height alignment, minimum 32-byte offset alignment,
> >     minimum
> >     > >> pitch of 32 bytes, and minimum image size of 32 bytes.
> >     > >>
> >     > >> To be maximally compatible, we'd have to expose 28 (pitch
> >     align) * 32
> >     > >> (height align) * 28 (offset align) * 28 (min pitch) * 28 (min
> >     size) ==
> >     > >> 19668992 individual modifiers when queried, which is 150MB
> >     per format
> >     > >> just to store the list of modifiers.
> >     > >
> >     > > Maximum compatibility is not required nor expected.
> >     > >
> >     > > In your case, only 1 linear modifier would be added for that
> >     driver, which is: [5 / 0 / 5 / 5 / 5]
> >     > >
> >     > > Then if, and only if, compatibility with other devices is
> >     desired, the driver developer could look at drivers of those other
> >     devices and determine which other linear modifiers to add. Ideally
> >     it would be just 1, so there would be a total of 2.
> >     >
> >     > Mali (actually two DRM drivers and sort of three Mesa drivers)
> >     can be
> >     > paired with any one of 11 KMS drivers (really 12 given that one is a
> >     > very independent subdriver), and something like 20 different codecs
> >     > (at least 12 different vendors; I didn't bother counting the actual
> >     > subdrivers which are all quite different). The VeriSilicon Hantro G2
> >     > codec driver is shipped by five (that we know of) vendors who
> >     all have
> >     > their own KMS drivers. One of those is in the Rockchip RK3588, which
> >     > (don't ask me why) ships six different codec blocks, with three
> >     > different drivers, from two different vendors - that's before
> >     you even
> >     > get to things like the ISP and NPU which really need to be sharing
> >     > buffers properly without copies.
> >     >
> >     > So yeah, working widely without having to encode specific knowledge
> >     > everywhere isn't a nice-to-have, it's a hard baseline requirement.
> >     >
> >     > >> > DRM_FORMAT_MOD_LINEAR needs to go because it prevents apps
> >     from detecting whether 2 devices have 0 compatible memory layouts,
> >     which is a useful thing to know.
> >     > >>
> >     > >> I get the point, but again, we have the exact same problem
> >     today with
> >     > >> placement, i.e. some devices require buffers to be in or not
> >     be in
> >     > >> VRAM or GTT or sysram for some uses, and some devices require
> >     physical
> >     > >> contiguity. Solving that problem would require an additional
> >     4 bits,
> >     > >> which brings us to 2.3GB of modifiers per format with the current
> >     > >> scheme. Not super viable.
> >     > >
> >     > > Userspace doesn't determine placement. The kernel memory
> >     management can move buffers between heaps to accommodate sharing
> >     between devices as needed. This is a problem in which userspace
> >     has no say.
> >     >
> >     > It really does though!
> >     >
> >     > None of these devices use TTM with placement moves, and doing that
> >     > isn't a fix either. Embedded systems have so low memory
> >     bandwidth that
> >     > the difference between choosing the wrong placement and moving it
> >     > later vs. having the right placement to begin with is the difference
> >     > between 'this does not work' and 'great, I can ship this'. Which is
> >     > great if you're a consultancy trying to get paid, but tbh I'd rather
> >     > work on more interesting things.
> >     >
> >     > So yeah, userspace does very much choose the placement. On most
> >     > drivers, this is either by 'knowing' which device to allocate
> >     from, or
> >     > passing a flag to your allocation ioctl. For newer drivers though,
> >     > there's the dma-heap allocation mechanism which is now upstream and
> >     > the blessed path, for which userspace needs to explicitly know the
> >     > desired placement (and must, because fixing it up later is a
> >     > non-starter).
> >     >
> >     > Given that we need to keep LINEAR ~forever for ABI reasons, and
> >     > because there's no reasonably workable alternative, let's
> >     abandon the
> >     > idea of abandoning LINEAR, and try to work with out-of-band
> >     signalling
> >     > instead.
> >     >
> >     > One idea is to actually pursue the allocator idea and express this
> >     > properly through constraints. I'd be super in favour of this,
> >     > unsurprisingly, because it allows us to solve a whole pile of other
> >     > problems, rather than the extremely narrow AMD/Intel interop case.
> >     >
> >     > Another idea for the out-of-band signalling would be to add
> >     > information-only modifiers, like
> >     > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_EQ(256), or
> >     > DRM_FORMAT_MOD_LINEAR_PITCH_ALIGN_GE(32). But then that doesn't
> >     really
> >     > work at all with how people actually use modifiers: as the doc
> >     > describes, userspace takes and intersects the declared modifier
> >     lists
> >     > and passes the result through. The intersection of LINEAR+EQ256 and
> >     > LINEAR+GE32 is LINEAR, so a userspace that follows the rules
> >     will just
> >     > drop the hints on the floor and pick whatever linear allocation it
> >     > feels like.
> > 
> >     Yeah I think latest when we also take into account logical image
> >     size (not
> >     just pitch) with stuff like it needs to be aligned to 2 pixels in both
> >     directions just using modifiers falls apart.
> > 
> >     And the problem with linear, unlike device modifiers is that we
> >     can't just
> >     throw up our hands and enumerate the handful of formats in actual
> >     use for
> >     interop. There's so many produces and consumers of linera buffers
> >     (Daniel's list above missed camera/image processors) that save
> >     assumption
> >     is that anything really can happen.
> > 
> >     > I think I've just talked myself into the position that passing
> >     > allocator constraints together with modifiers is the only way to
> >     > actually solve this problem, at least without creating the sort of
> >     > technical debt that meant we spent years fixing up implicit/explicit
> >     > modifier interactions when it really should've just been adding a
> >     > !)@*(#$ u64 next to the u32.
> > 
> >     Yeah probably.
> > 
> >     Otoh I know inertia, so I am tempted to go with the oddball
> >     LINEAR_VEDNOR_A_VENDOR_B_INTEROP thing and stretch the runway for
> >     a bit.
> >     And we just assign those as we go as a very special thing, and the
> >     drivers
> >     that support it would prefer it above just LINEAR if there's no other
> >     common format left.
> > 
> >     Also makes it really obvious what all userspace/kernel driver enabling
> >     would be needed to justify such a modifier.
> >     -Sima
> > 
> >     >
> >     > Cheers,
> >     > Daniel
> > 
> >     --     Simona Vetter
> >     Software Engineer, Intel Corporation
> >     http://blog.ffwll.ch
> > 
> 
> -- 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Frankenstrasse 146, 90461 Nuernberg, Germany
> GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> HRB 36809 (AG Nuernberg)
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux