Re: [PATCH V2] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

Hi Jesse,

On 28.02.24 at 09:43, jesse.zhang@xxxxxxx wrote:
From: "Jesse.Zhang" <Jesse.Zhang@xxxxxxx>

Fix the issue:
"amdgpu: Failed to create process VM object".

[Why]
When amdgpu is initialized, seq64 does its mapping and updates the BO mapping in the VM page table.
But when clinfo runs, it also initializes a VM for a process device through kfd_process_device_init_vm
and ensures the root PD is clean through amdgpu_vm_pt_is_root_clean.
The two conflict, so clinfo always fails.

[HOW]
Skip the seq64 entry check in vm page table.

Signed-off-by: Jesse Zhang <Jesse.Zhang@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 13 +++++++++++++
  1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index a160265ddc07..bdae5381887e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -746,8 +746,21 @@ bool amdgpu_vm_pt_is_root_clean(struct amdgpu_device *adev,
  	enum amdgpu_vm_level root = adev->vm_manager.root_level;
  	unsigned int entries = amdgpu_vm_pt_num_entries(adev, root);
  	unsigned int i = 0;
+	u64 seq64_addr = (adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT) - AMDGPU_VA_RESERVED_TOP;
+
+	seq64_addr /= AMDGPU_GPU_PAGE_SIZE;
+	mask = amdgpu_vm_pt_entries_mask(adev, adev->vm_manager.root_level);
+	shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level);
+	seq64_entry = (seq64_addr >> shift) & mask;
  	for (i = 0; i < entries; i++) {
+		/* seq64 reserves 2M of memory at the top of the address space,
+		 * then maps it and updates the VM page table when amdgpu initializes.
+		 * So skip this known entry.
+		 */
+
+		if (i == seq64_entry)
+			continue;

Once more it is intentional that this fails!

Renoir shouldn't be using the ATS setting any more because that functionality was removed.

But it looks like the setting is somehow still active and because of this you run into this issue here.

Regards,
Christian.

  		if (to_amdgpu_bo_vm(vm->root.bo)->entries[i].bo)
  			return false;
  	}
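
For readers following the hunk above, here is a minimal standalone sketch of the index arithmetic the patch proposes: turn the reserved address at the top of the VA space into a root page-directory slot, then ignore that slot while scanning. This is not kernel code. Every sketch_ name is hypothetical, and the shift and mask are plain parameters standing in for whatever amdgpu_vm_pt_level_shift() and amdgpu_vm_pt_entries_mask() return on a given ASIC. As noted in the reply, the check is meant to fail in this situation, so the sketch only illustrates the mechanism and is not a recommended fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4 KiB GPU pages, i.e. a page shift of 12. */
#define SKETCH_GPU_PAGE_SHIFT 12u

/*
 * Root page-directory slot covering GPU virtual address `va`.
 * `root_shift` and `root_mask` are assumptions supplied by the caller;
 * in the patch they come from the driver's level helpers.
 */
static unsigned int sketch_root_entry_for_va(uint64_t va,
					     unsigned int root_shift,
					     uint64_t root_mask)
{
	uint64_t pfn = va >> SKETCH_GPU_PAGE_SHIFT;

	assert(root_shift < 64);	/* shifting by >= 64 is undefined in C */
	return (unsigned int)((pfn >> root_shift) & root_mask);
}

/* Returns true when no slot other than `skip` is marked as used. */
static bool sketch_root_is_clean(const bool *entry_used,
				 unsigned int entries,
				 unsigned int skip)
{
	unsigned int i;

	for (i = 0; i < entries; i++) {
		if (i == skip)
			continue;	/* known mapping, deliberately ignored */
		if (entry_used[i])
			return false;
	}
	return true;
}
```

Under assumed values (a 48-bit address space, a 2 MiB reservation at the top, a root shift of 27 and a 9-bit mask), the address (1 << 48) - 2 MiB falls into slot 511, the last of 512 root entries.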



