Re: [PATCH 1/2] drm/amdgpu: Acquire ttm locks for dmaunmap

On 2023-10-11 14:22, David Francis wrote:
dmaunmap can call ttm_bo_validate, which expects the
ttm dma_resv to be held.

Acquire the locks in amdgpu_amdkfd_gpuvm_dmaunmap_mem.

Signed-off-by: David Francis <David.Francis@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 +++++++++++-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c         |  7 ++++++-
  3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 3ad8dc523b42..dba4f6b7a2f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -302,7 +302,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev,
  					  struct kgd_mem *mem, void *drm_priv);
  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
  		struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv);
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
  int amdgpu_amdkfd_gpuvm_sync_memory(
  		struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
  int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a15e59abe70a..808deec8aa58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2094,21 +2094,31 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
  	return ret;
  }
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
  {
  	struct kfd_mem_attachment *entry;
  	struct amdgpu_vm *vm;
+	struct bo_vm_reservation_context ctx;
+	int ret;
  	vm = drm_priv_to_vm(drm_priv);

  	mutex_lock(&mem->lock);

+	ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);

Looks like you copied this from somewhere else. Do you really need to reserve the VM here, or just the BO? If you only need to reserve the BO, just use amdgpu_bo_reserve.
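
For illustration, a BO-only version could look roughly like the untested sketch below. It assumes mem->bo is the BO whose reservation ttm_bo_validate needs, uses an interruptible amdgpu_bo_reserve, and keeps the error handling shape from your patch:

int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
{
	struct kfd_mem_attachment *entry;
	struct amdgpu_vm *vm = drm_priv_to_vm(drm_priv);
	int ret;

	mutex_lock(&mem->lock);

	/* Interruptible reservation of just the BO's dma_resv */
	ret = amdgpu_bo_reserve(mem->bo, false);
	if (ret)
		goto out;

	list_for_each_entry(entry, &mem->attachments, list) {
		if (entry->bo_va->base.vm == vm)
			kfd_mem_dmaunmap_attachment(mem, entry);
	}

	amdgpu_bo_unreserve(mem->bo);

out:
	mutex_unlock(&mem->lock);
	return ret;
}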


+	if (ret)
+		goto out;
+
  	list_for_each_entry(entry, &mem->attachments, list) {
  		if (entry->bo_va->base.vm == vm)
  			kfd_mem_dmaunmap_attachment(mem, entry);
  	}
+	unreserve_bo_and_vms(&ctx, false, false);
+
+out:
  	mutex_unlock(&mem->lock);
+	return ret;
  }
  int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 06988cf1db51..21d4e7d46238 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1442,7 +1442,11 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
  			kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
  		/* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */
-		amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+		err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+		if (err) {

This is probably not correct. The only failure case I know of is that the reservation gets interrupted by a signal and returns -ERESTARTSYS. When that happens, the ioctl is automatically restarted after the signal is handled, so we need to make sure we handle the restart correctly: by then all the unmapping is already done from the first time the function ran, and the only thing left to repeat is the dmaunmap.

We already handled this scenario for amdgpu_amdkfd_gpuvm_sync_memory. Make sure it also works correctly for this new dmaunmap case. Basically, we don't want to accidentally dmaunmap the same thing twice. And we need to make sure that amdgpu_amdkfd_gpuvm_sync_memory is harmless if it's called multiple times due to restarts.
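
One way to make the restart safe would be for kfd_mem_dmaunmap_attachment to remember whether the attachment still holds a DMA mapping and bail out early on the second pass. Rough sketch only; the dma_mapped flag is an illustrative field, not something that exists in struct kfd_mem_attachment today:

/* Illustrative idempotency guard: dma_mapped would be set by the dmamap
 * path and cleared here, so a restarted ioctl cannot dmaunmap the same
 * attachment twice.
 */
static void kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
					struct kfd_mem_attachment *entry)
{
	if (!entry->dma_mapped)
		return;		/* already unmapped on a previous pass */

	/* ... existing per-attachment-type dma unmap ... */

	entry->dma_mapped = false;
}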

There is a test in KFDTest specifically to try to recreate this scenario where amdgpu_amdkfd_gpuvm_sync_memory gets interrupted by a signal: KFDMemoryTest.SignalHandling. It probably won't be able to trigger a signal interrupting an amdgpu_bo_reserve, though, because that's very fast if there is no lock contention. So we may just have to rely on code review. Or you can try to recreate the scenario manually by returning -ERESTARTSYS under some conditions.
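
If you want to poke at it manually, a throwaway hack along these lines in amdgpu_amdkfd_gpuvm_dmaunmap_mem would force the restart path once. Pair it with a test that actually delivers a signal (like KFDMemoryTest.SignalHandling) so the syscall really restarts instead of leaking -ERESTARTSYS to user space:

	/* Throwaway debugging hack, not for submission: pretend the
	 * reservation was interrupted the first time through so the
	 * ioctl takes the -ERESTARTSYS/restart path.
	 */
	ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);
	if (!ret) {
		static bool injected;	/* fire only once */

		if (!injected) {
			injected = true;
			unreserve_bo_and_vms(&ctx, false, false);
			ret = -ERESTARTSYS;
		}
	}
	if (ret)
		goto out;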

Regards,
  Felix


+			pr_debug("DMA unmapping failed\n");
+			goto dmaunmap_failed;
+		}
  	}
  	mutex_unlock(&p->mutex);
@@ -1455,6 +1459,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
  get_mem_obj_from_handle_failed:
  unmap_memory_from_gpu_failed:
  sync_memory_failed:
+dmaunmap_failed:
  	mutex_unlock(&p->mutex);
  copy_from_user_failed:
  	kfree(devices_arr);


