Re: [RFC PATCH] drm/prime: introduce DRM_PRIME_FD_TO_HANDLE_NO_MOVE

Simon Ser <contact@xxxxxxxxxxx> · Tue, 15 Oct 2024 11:01:38 +0000

On Tuesday, October 15th, 2024 at 12:47, Michel Dänzer <michel.daenzer@xxxxxxxxxxx> wrote:

> On 2024-10-13 15:34, Simon Ser wrote:
> 
> > This is a flag to opt-out of the automagic buffer migration to
> > system memory when importing a DMA-BUF.
> > 
> > In multi-GPU scenarios, a Wayland client might allocate on any
> > device. The Wayland compositor receiving the DMA-BUF has no clue
> > where the buffer has been allocated from. The compositor will
> > typically try to import the buffer into its "primary" device,
> > although it would be capable of importing into any DRM device.
> > 
> > This causes issues in case buffer imports implicitly result in
> > the buffer being moved to system memory. For instance, on a
> > system with an Intel iGPU and an AMD dGPU, a client rendering
> > with the dGPU and whose window is displayed on a screen
> > connected to the dGPU would ideally not need any roundtrip
> > to the iGPU. However, any attempt at figuring out where the
> > DMA-BUF could be accessed from will move the buffer into system
> > memory, degrading performance for the rest of the lifetime of the
> > buffer.
> > 
> > Describing which device the buffer has been allocated on is
> > not enough: on some setups the buffer may have been allocated on
> > one device but may still be directly accessible without any move
> > on another device. For instance, on a split render/display system,
> > a buffer allocated on the display device can be directly rendered
> > to from the render device.
> > 
> > With this new flag, a compositor can try to import on all DRM
> > devices without any side effects. If it finds a device which can
> > access the buffer without a move, it can use that device to render
> > the buffer. If it doesn't, it can fall back to the previous
> > behavior: try to import without the flag to the "primary" device,
> > knowing this could result in a move to system memory.
> 
> One problem with this approach is that even if a buffer is originally created in / intended for local VRAM of a dGPU, it may get temporarily migrated to system RAM for other reasons, e.g. to make room for other buffers in VRAM. While it resides in system RAM, importing into another device with DRM_PRIME_FD_TO_HANDLE_NO_MOVE will work, but will result in pinning the buffer to system RAM, even though this isn't optimal for the intended buffer usage.

Indeed. Do you think we could have a flag which also prevents pinning?

Sounds like that would involve implementing dynamic DMA-BUF importers in
GEM? (Some drivers like xe already implement that.)

As a first step, a flag which checks whether the buffer comes from the
same device it's being imported into would be tremendously useful, even
if that wouldn't work with split render/display systems. Ideally this
would be a new uAPI which can be extended to support such systems in the
future.
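
For reference, here is a rough sketch of the compositor-side probing
described in the patch description: try every device without allowing a
move, then fall back to a regular import on the "primary" device. This is
only an illustration. The import_fn callback, struct import_result,
choose_import_device() and example_import() are all made-up names, and
the callback stands in for whatever ioctl wrapper would pass the proposed
flag; none of this is existing libdrm or kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: try to import dmabuf_fd into drm_fd.
 * With no_move set, it must fail instead of migrating the buffer.
 * Returns 0 on success, a negative value on failure. */
typedef int (*import_fn)(int drm_fd, int dmabuf_fd, bool no_move, void *user);

struct import_result {
	int device_index;    /* index into drm_fds, or -1 if every import failed */
	bool may_have_moved; /* true if we fell back to a regular import */
};

static struct import_result
choose_import_device(const int *drm_fds, size_t n_fds, size_t primary,
		     int dmabuf_fd, import_fn try_import, void *user)
{
	struct import_result res = { .device_index = -1, .may_have_moved = false };

	assert(drm_fds != NULL || n_fds == 0);

	/* Pass 1: probe every device, refusing any buffer move. */
	for (size_t i = 0; i < n_fds; i++) {
		if (try_import(drm_fds[i], dmabuf_fd, true, user) == 0) {
			res.device_index = (int)i;
			return res;
		}
	}

	/* Pass 2: previous behavior, primary device only, move allowed. */
	if (primary < n_fds &&
	    try_import(drm_fds[primary], dmabuf_fd, false, user) == 0) {
		res.device_index = (int)primary;
		res.may_have_moved = true;
	}
	return res;
}

/* Illustration-only callback: pretends that the device whose fd equals
 * *(int *)user is the only one able to access the buffer without a move. */
static int
example_import(int drm_fd, int dmabuf_fd, bool no_move, void *user)
{
	int direct_fd = *(int *)user;

	(void)dmabuf_fd;
	if (no_move)
		return drm_fd == direct_fd ? 0 : -1;
	return 0; /* a regular import always succeeds here, possibly moving */
}
```

Note that, per Michel's point above, a successful first pass only
reflects where the buffer resides at import time, so it says nothing
about where the client intended it to live.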

> In other words, the new flag only gives the compositor information about the current state, not about the intention of the client. Another mechanism like https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/268 is needed for the latter.
> 
> So while this flag might be useful to prevent unintended buffer migration in some cases, it can't solve all multi-GPU issues for compositors.

I'm still not willing to give up on the idea that this doesn't need
protocol changes in the long run, but maybe I'm being too optimistic
here. :)