Re: Making drm_gpuvm work across gpu devices

Christian König <christian.koenig@xxxxxxx> · Tue, 23 Jan 2024 12:13:12 +0100

Hi Oak,

Am 23.01.24 um 04:21 schrieb Zeng, Oak:
Hi Danilo and all,

During the work of Intel's SVM code, we came up the idea of making drm_gpuvm to work across multiple gpu devices. See some discussion here: https://lore.kernel.org/dri-devel/PH7PR11MB70049E7E6A2F40BF6282ECC292742@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

The reason we try to do this is, for a SVM (shared virtual memory across cpu program and all gpu program on all gpu devices) process, the address space has to be across all gpu devices. So if we make drm_gpuvm to work across devices, then our SVM code can leverage drm_gpuvm as well.

At a first look, it seems feasible because drm_gpuvm doesn't really use the drm_device *drm pointer a lot. This param is used only for printing/warning. So I think maybe we can delete this drm field from drm_gpuvm.

This way, on a multiple gpu device system, for one process, we can have only one drm_gpuvm instance, instead of multiple drm_gpuvm instances (one for each gpu device).

What do you think?

Well from the GPUVM side I don't think it would make much difference if 
we have the drm device or not.

But the experience we had with the KFD I think I should mention that we 
should absolutely *not* deal with multiple devices at the same time in 
the UAPI or VM objects inside the driver.

The background is that all the APIs inside the Linux kernel are build 
around the idea that they work with only one device at a time. This 
accounts for both low level APIs like the DMA API as well as pretty high 
level things like for example file system address space etc...

So when you have multiple GPUs you either have an inseparable cluster of 
them which case you would also only have one drm_device. Or you have 
separated drm_device which also results in separate drm render nodes and 
separate virtual address spaces and also eventually separate IOMMU 
domains which gives you separate dma_addresses for the same page and so 
separate GPUVM page tables....

It's up to you how to implement it, but I think it's pretty clear that 
you need separate drm_gpuvm objects to manage those.

That you map the same thing in all those virtual address spaces at the 
same address is a completely different optimization problem I think. 
What we could certainly do is to optimize hmm_range_fault by making 
hmm_range a reference counted object and using it for multiple devices 
at the same time if those devices request the same range of an mm_struct.

I think if you start using the same drm_gpuvm for multiple devices you 
will sooner or later start to run into the same mess we have seen with 
KFD, where we moved more and more functionality from the KFD to the DRM 
render node because we found that a lot of the stuff simply doesn't work 
correctly with a single object to maintain the state.

Just one more point to your original discussion on the xe list: I think 
it's perfectly valid for an application to map something at the same 
address you already have something else.

Cheers,
Christian.

Thanks,
Oak