Hi,

Thanks for writing this down.

On Thu, Nov 16, 2023 at 03:53:20PM +0000, Simon Ser wrote:
> On Thursday, November 9th, 2023 at 08:45, Simon Ser <contact@xxxxxxxxxxx> wrote:
>
> > User-space sometimes needs to allocate scanout-capable memory for
> > GPU rendering purposes. On a vc4/v3d split render/display SoC, this
> > is achieved via DRM dumb buffers: the v3d user-space driver opens
> > the primary vc4 node, allocates a DRM dumb buffer there, exports it
> > as a DMA-BUF, imports it into the v3d render node, and renders to it.
> >
> > However, DRM dumb buffers are only meant for CPU rendering, they are
> > not intended to be used for GPU rendering. Primary nodes should only
> > be used for mode-setting purposes, other programs should not attempt
> > to open it. Moreover, opening the primary node is already broken on
> > some setups: systemd grants permission to open primary nodes to
> > physically logged in users, but this breaks when the user is not
> > physically logged in (e.g. headless setup) and when the distribution
> > is using a different init (e.g. Alpine Linux uses openrc).
> >
> > We need an alternate way for v3d to allocate scanout-capable memory.
> > Leverage DMA heaps for this purpose: expose a CMA heap to user-space.
>
> So we've discussed about this patch on IRC [1] [2]. Some random notes:
>
> - We shouldn't create per-DRM-device heaps in general. Instead, we should try
>   using centralized heaps like the existing system and cma ones. That way other
>   drivers (video, render, etc) can also link to these heaps without depending
>   on the display driver.
> - We can't generically link to heaps in core DRM, however we probably provide
>   a default for shmem and cma helpers.
> - We're missing a bunch of heaps, e.g. sometimes there are multiple cma areas
>   but only a single cma heap is created right now.
> - Some hw needs the memory to be in a specific region for scanout (e.g. lower
>   256MB of RAM for Allwinner). We could create one heap per such region (but is
>   it fine to have overlapping heaps?).

Just for reference, it's not the scanout itself that has that
requirement on Allwinner SoCs, it's the HW codec. But if you want to
display the decoded frame directly using dma-buf, you'll still need to
either allocate a scanout buffer and hope it'll be in the lower 256MB,
or allocate a buffer from the codec in the lower 256MB and then hope
it's scanout-capable (which it is, so that's what we do, but there's no
guarantee about it).

I think the logicvc is a much better example for this, since it requires
framebuffers to be in a specific area, with each plane having a
dedicated area.

AFAIK that's the most extreme example we have upstream.

Maxime