Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

Christian König <christian.koenig@xxxxxxx> · Fri, 23 Feb 2024 10:34:49 +0100

Am 23.02.24 um 09:11 schrieb Michel Dänzer:
On 2024-02-23 08:06, Christian König wrote:
Am 22.02.24 um 18:28 schrieb Michel Dänzer:
From: Michel Dänzer <mdaenzer@xxxxxxxxxx>

Pinning the BO storage to VRAM for scanout would make it inaccessible
to non-P2P dma-buf importers.
Thinking more about it I don't think we can do this.

Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.

And if I'm not completely mistaken we actually have use cases for this at the moment,
Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?

Nope, we are basically talking about unit tests and examples for inter 
device operations.

Those render into a shared buffer and then display it to check if the 
content was rendered/transferred correctly.

I'm not sure if we still do those test cases, the last time I looked 
into it was before P2P was even supported, but I also can't rule it out.

(As discussed on the GitLab issue, AFAICT P2P could be made to work even without CONFIG_DMABUF_MOVE_NOTIFY, by pinning to VRAM instead of GTT for dma-buf sharing)

Longer story but that is something intentionally not done.

only as fallback but it would still break existing userspace and that is a no-go.
I'm obviously aware of this general rule. There are exceptions though, and this might be one.

So rejecting things during CS and atomic commit is the best thing we can do.
It's problematic for a Wayland compositor:

The CS ioctl failing is awkward. With GL, I'm pretty sure it means the compositor would have to destroy the context and create a new one. Not sure about Vulkan, but I suspect it's painful there as well.

Similarly for the KMS atomic commit ioctl. The compositor can't know why exactly it failed, all it gets is an error code.

Yeah, but that is not because the kernel is doing anything wrong.

Sharing, rendering and then doing an atomic commit is a perfectly valid 
use case.

You just can't do scanout and sharing at the same time.

And there's no other way for the compositor to detect when both things can actually work concurrently.

That I totally agree with. And IIRC we already have at least a query for 
the buffer placement. E.g. you can already check if the BO is in GTT or 
VRAM and shared.

What's missing is exposing if the device can scanout from GTT or not.

It's just that blocking a valid use case because a special combination 
doesn't work is not going to fly I think.

Together, this means the compositor always has to assume the worst case, and do workarounds such as using the scanout GPU to copy from the scanout buffer to another buffer for access from another GPU. Even when direct access from the other GPU would actually work fine.

Yeah, I think we can avoid that. I'm just not sure how to do it in a 
driver agnostic way.

Regards,
Christian.