Hi Oak, On 23.01.24 at 04:21, Zeng, Oak wrote:
Hi Danilo and all, During the work on Intel's SVM code, we came up with the idea of making drm_gpuvm work across multiple gpu devices. See some discussion here: https://lore.kernel.org/dri-devel/PH7PR11MB70049E7E6A2F40BF6282ECC292742@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ The reason we are trying to do this is that, for an SVM (shared virtual memory across the cpu program and all gpu programs on all gpu devices) process, the address space has to span all gpu devices. So if we make drm_gpuvm work across devices, our SVM code can leverage drm_gpuvm as well. At first look, it seems feasible because drm_gpuvm doesn't really use the drm_device *drm pointer a lot. This param is used only for printing/warning. So I think maybe we can delete this drm field from drm_gpuvm. This way, on a multi-gpu system, we can have only one drm_gpuvm instance per process, instead of multiple drm_gpuvm instances (one for each gpu device). What do you think?
Well from the GPUVM side I don't think it would make much difference if we have the drm device or not.
But given the experience we had with KFD, I think I should mention that we should absolutely *not* deal with multiple devices at the same time in the UAPI or in VM objects inside the driver.
The background is that all the APIs inside the Linux kernel are built around the idea that they work with only one device at a time. This holds for low level APIs like the DMA API as well as for pretty high level things like, for example, the file system address space etc...
So when you have multiple GPUs you either have an inseparable cluster of them, in which case you would also only have one drm_device. Or you have separate drm_devices, which also results in separate drm render nodes, separate virtual address spaces and eventually also separate IOMMU domains, which give you separate dma_addresses for the same page and therefore separate GPUVM page tables....
It's up to you how to implement it, but I think it's pretty clear that you need separate drm_gpuvm objects to manage those.
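Just to illustrate what I mean, here is a rough userspace sketch, not driver code. All names in it (struct dev_vm, struct svm_ctx, svm_add_device, svm_map_all) are made up for the example, and struct dev_vm only stands in for whatever wraps a drm_gpuvm in the driver. The point is that the per-process SVM context holds one VM per device and mirrors a range at the same virtual address into each of them, while the DMA address stays per-device state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVICES 4 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for one per-device VM (i.e. one drm_gpuvm each). */
struct dev_vm {
	int dev_id;
	uint64_t va;       /* GPU virtual address of the mirrored range */
	uint64_t len;
	uint64_t dma_addr; /* device-specific: differs per IOMMU domain */
};

/* Hypothetical per-process SVM context: a list of VMs, not one shared VM. */
struct svm_ctx {
	size_t nr_vms;
	struct dev_vm vms[MAX_DEVICES];
};

/* Register one more device; returns 0 on success, -1 if the table is full. */
int svm_add_device(struct svm_ctx *ctx, int dev_id)
{
	assert(ctx != NULL);
	if (ctx->nr_vms == MAX_DEVICES)
		return -1;
	ctx->vms[ctx->nr_vms] = (struct dev_vm){ .dev_id = dev_id };
	ctx->nr_vms++;
	return 0;
}

/*
 * Mirror one CPU range at the same virtual address in every per-device VM.
 * dma_per_dev[i] is the DMA address device i sees for the backing page.
 * Returns the number of VMs updated.
 */
size_t svm_map_all(struct svm_ctx *ctx, uint64_t va, uint64_t len,
		   const uint64_t *dma_per_dev)
{
	assert(ctx != NULL && dma_per_dev != NULL);
	for (size_t i = 0; i < ctx->nr_vms; i++) {
		ctx->vms[i].va = va;
		ctx->vms[i].len = len;
		ctx->vms[i].dma_addr = dma_per_dev[i];
	}
	return ctx->nr_vms;
}
```

The virtual address is identical in every VM, but the page table contents are not, which is why a single shared VM object doesn't buy you much here.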
Mapping the same thing at the same address in all those virtual address spaces is a completely different optimization problem, I think. What we could certainly do is to optimize hmm_range_fault by making hmm_range a reference counted object and using it for multiple devices at the same time if those devices request the same range of an mm_struct.
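As a rough idea of what such sharing could look like, here is a small userspace sketch. It is not the kernel's struct hmm_range and none of these names exist in the kernel: struct shared_range, range_get, range_put and fault_range are assumptions made up for the example, a plain counter stands in for a kref, and invalidation through MMU notifiers is ignored completely. The first device asking for a range pays for the fault, the second one requesting the same range just takes a reference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RANGES 8 /* arbitrary table size for this sketch */

/* Hypothetical shared range object; NOT the kernel's struct hmm_range. */
struct shared_range {
	uint64_t start, end;
	unsigned int refcount; /* plain counter; kernel code would use a kref */
	unsigned int faults;   /* how often the expensive fault path ran */
};

static struct shared_range *ranges[MAX_RANGES];

/* Stand-in for the expensive page walk done by hmm_range_fault(). */
static void fault_range(struct shared_range *r)
{
	r->faults++;
}

/* Look up an existing range or create and fault a new one. */
struct shared_range *range_get(uint64_t start, uint64_t end)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_RANGES; i++) {
		struct shared_range *r = ranges[i];

		if (!r) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (r->start == start && r->end == end) {
			r->refcount++; /* second device: no new fault */
			return r;
		}
	}
	if (free_slot < 0)
		return NULL;

	struct shared_range *r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	r->start = start;
	r->end = end;
	r->refcount = 1;
	fault_range(r); /* first device pays for the fault */
	ranges[free_slot] = r;
	return r;
}

/* Drop one reference; the last user frees the range. */
void range_put(struct shared_range *r)
{
	assert(r != NULL && r->refcount > 0);
	if (--r->refcount > 0)
		return;
	for (int i = 0; i < MAX_RANGES; i++)
		if (ranges[i] == r)
			ranges[i] = NULL;
	free(r);
}
```

Each device would still write the result into its own page tables, so this only shares the CPU side lookup, not the VM.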
I think if you start using the same drm_gpuvm for multiple devices you will sooner or later start to run into the same mess we have seen with KFD, where we moved more and more functionality from the KFD to the DRM render node because we found that a lot of the stuff simply doesn't work correctly with a single object to maintain the state.
Just one more point on your original discussion on the xe list: I think it's perfectly valid for an application to map something at an address where it already has something else mapped.
Cheers, Christian.
Thanks, Oak