[+Philip]
Am 2023-04-04 um 08:47 schrieb Christian König:
Am 04.04.23 um 12:56 schrieb Xiao, Shane:
[AMD Official Use Only - General]
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Tuesday, April 4, 2023 6:27 PM
To: Xiao, Shane <shane.xiao@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: Liu, Aaron <Aaron.Liu@xxxxxxx>; Guo, Shikai <Shikai.Guo@xxxxxxx>
Subject: Re: [PATCH 1/3] drm/amdgpu: Add userptr bo support for mGPUs
when iommu is on
Am 04.04.23 um 11:56 schrieb Shane Xiao:
For userptr bo with iommu on, multiple GPUs use same system memory dma
mapping address when both bo_adev and adev in identity mode or in the
same iommu group.
Hi Christian,
WTF? Userptr BOs are not allowed to be exported/imported between
different
GPUs.
So how can the same userptr BO be used on different GPUs?
If GPUs are all in iommu identity mode which means dma address are
the same as physical address, all of the GPUs can see the system
memory directly.
In such case, should we export/import the BO, then create a new SG
BO for another GPU?
Yes, absolutely. Each userptr BO is only meant to be used on one GPU.
Even if you could use the same BO for multiple GPUs it's not necessary
a good idea since you then have live time problems for example.
E.g. what happens when one GPU is hot removed while the other one who
imported the BO is still in use?
Felix can you comment on that? My recollection was that we rather
improve the storage of DMA addresses instead of duplicating the BOs
over different GPUs.
For KFD we currently enable sharing of userptr BOs between multiple GPUs
by creating a userptr BO for the first GPU, and creating additional SG
BOs using the same page list for additional GPUs. That way we don't have
to call hmm_range_fault N times and setup N MMU notifiers for the same
address range on an N GPU system. But we do have to create N DMA
mappings, which is facilitated by the SG BOs.
We have a further optimization to not even store separate DMA addresses
per-GPU if they are a direct mapping. In that case we just increase the
reference count on the original userptr BO. (I agree that we should also
look into more efficient storage of DMA addresses. However, last time we
talked about this, you basically told us that scatter gather tables are
being deprecated, but I haven't seen the alternative yet.)
I think this patch fixes a potential issue with that optimization. There
is an implicit assumption, that the DMA addresses stored in the original
userptr BO are a direct mapping, so we can reuse them on other GPUs that
also use a direct mapping. But, we didn't actually check that
assumption. I think this patch fixes that for systems where system
memory is direct mapped on some but not all GPUs.
This scenario should probably be called out explicitly in the patch
description. The condition is also getting pretty hard to read and
understand. Maybe move the both-direct-map-or-same-iommu-group
conditions into a helper function, say
"amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && reuse_dmamap(adev, bo_adev)".
Regards,
Felix
Regards,
Christian.
Best Regards,
Shane
Regards,
Christian.
Signed-off-by: Shane Xiao <shane.xiao@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e7403f8e4eba..33cda358cb9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -804,11 +804,11 @@ static int kfd_mem_attach(struct amdgpu_device
*adev, struct kgd_mem *mem,
va + bo_size, vm);
if ((adev == bo_adev && !(mem->alloc_flags &
KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) ||
- (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) &&
adev->ram_is_direct_mapped) ||
- same_hive) {
+ (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) &&
((adev->ram_is_direct_mapped && bo_adev->ram_is_direct_mapped) ||
+ adev->dev->iommu_group == bo_adev->dev-
iommu_group)) ||
+same_hive){
/* Mappings on the local GPU, or VRAM mappings in
the
- * local hive, or userptr mapping IOMMU direct map
mode
- * share the original BO
+ * local hive, or userptr mapping in the same dma
+ * address space share the original BO
*/
attachment[i]->type = KFD_MEM_ATT_SHARED;
bo[i] = mem->bo;