On 2023-04-12 09:35, Philip Yang wrote:
On 2023-04-04 09:45, Felix Kuehling wrote:
[+Philip]
On 2023-04-04 08:47, Christian König wrote:
On 04.04.23 12:56, Xiao, Shane wrote:
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Tuesday, April 4, 2023 6:27 PM
To: Xiao, Shane <shane.xiao@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: Liu, Aaron <Aaron.Liu@xxxxxxx>; Guo, Shikai <Shikai.Guo@xxxxxxx>
Subject: Re: [PATCH 1/3] drm/amdgpu: Add userptr bo support for mGPUs
when iommu is on
On 04.04.23 11:56, Shane Xiao wrote:
For a userptr BO with IOMMU on, multiple GPUs use the same system memory
DMA mapping address when both bo_adev and adev are in identity mode or
in the same IOMMU group.
Hi Christian,
WTF? Userptr BOs are not allowed to be exported/imported between different GPUs.
So how can the same userptr BO be used on different GPUs?
If the GPUs are all in IOMMU identity mode, which means the DMA
addresses are the same as the physical addresses, all of the GPUs can
see the system memory directly.
In that case, should we export/import the BO and then create a new SG
BO for another GPU?
Yes, absolutely. Each userptr BO is only meant to be used on one GPU.
Even if you could use the same BO for multiple GPUs, it's not
necessarily a good idea, since you then have lifetime problems, for
example.
E.g. what happens when one GPU is hot-removed while the other one
that imported the BO is still using it?
Felix, can you comment on that? My recollection was that we would rather
improve the storage of DMA addresses instead of duplicating the BOs
over different GPUs.
For KFD we currently enable sharing of userptr BOs between multiple
GPUs by creating a userptr BO for the first GPU, and creating
additional SG BOs using the same page list for additional GPUs. That
way we don't have to call hmm_range_fault N times and set up N MMU
notifiers for the same address range on an N-GPU system. But we do
have to create N DMA mappings, which is facilitated by the SG BOs.
We have a further optimization to not even store separate DMA
addresses per-GPU if they are a direct mapping. In that case we just
increase the reference count on the original userptr BO. (I agree
that we should also look into more efficient storage of DMA
addresses. However, last time we talked about this, you basically
told us that scatter gather tables are being deprecated, but I
haven't seen the alternative yet.)
I think this patch fixes a potential issue with that optimization.
There is an implicit assumption that the DMA addresses stored in the
original userptr BO are a direct mapping, so we can reuse them on
other GPUs that also use a direct mapping. But we didn't actually
check that assumption. I think this patch fixes that for systems
where system memory is direct mapped on some but not all GPUs.
We will need a similar optimization in svm_range_dma_map to reuse
prange->dma_addr for GPUs in the same IOMMU group or with direct mapping.
Good point. Optimizing for direct mapping is easy: just skip the DMA
mapping. Reusing the same DMA mapping for multiple GPUs in the same IOMMU group is
a bit harder. First you need to identify all the GPUs that are in the
same group and then check if one of them already has a DMA mapping that
can be reused. I wonder how common this is, and whether it's worth the
trouble.
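As a rough illustration of that scan (the helper name and the
mapped_devs/mapped_addrs bookkeeping below are hypothetical, for
illustration only, not existing code): before DMA-mapping a range for a
new GPU, look through the GPUs that already mapped it and reuse their
addresses if one of them shares the new GPU's IOMMU group, or both
direct-map system memory.

/* Illustrative only: scan GPUs that already DMA-mapped the range and
 * return an address array that can be reused for "adev", or NULL if a
 * fresh mapping has to be created.
 */
static dma_addr_t *find_reusable_dma_addr(struct amdgpu_device *adev,
					  struct amdgpu_device **mapped_devs,
					  dma_addr_t **mapped_addrs,
					  int nmapped)
{
	int i;

	for (i = 0; i < nmapped; i++) {
		struct amdgpu_device *other = mapped_devs[i];

		/* Same DMA address space: direct-mapped on both devices,
		 * or both devices sit behind the same IOMMU group.
		 */
		if ((adev->ram_is_direct_mapped && other->ram_is_direct_mapped) ||
		    adev->dev->iommu_group == other->dev->iommu_group)
			return mapped_addrs[i];
	}

	return NULL;
}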
Regards,
Felix
Regards,
Philip
This scenario should probably be called out explicitly in the patch
description. The condition is also getting pretty hard to read and
understand. Maybe move the both-direct-map-or-same-iommu-group
conditions into a helper function, say
"amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && reuse_dmamap(adev,
bo_adev)".
Regards,
Felix
Regards,
Christian.
Best Regards,
Shane
Regards,
Christian.
Signed-off-by: Shane Xiao <shane.xiao@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e7403f8e4eba..33cda358cb9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -804,11 +804,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 			 va + bo_size, vm);
 
 		if ((adev == bo_adev && !(mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) ||
-		    (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && adev->ram_is_direct_mapped) ||
-		    same_hive) {
+		    (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && ((adev->ram_is_direct_mapped && bo_adev->ram_is_direct_mapped) ||
+		    adev->dev->iommu_group == bo_adev->dev->iommu_group)) ||
+		    same_hive) {
 			/* Mappings on the local GPU, or VRAM mappings in the
-			 * local hive, or userptr mapping IOMMU direct map mode
-			 * share the original BO
+			 * local hive, or userptr mapping in the same dma
+			 * address space share the original BO
 			 */
 			attachment[i]->type = KFD_MEM_ATT_SHARED;
 			bo[i] = mem->bo;