Re: [PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute"

On 23.10.23 at 15:06, Daniel Tang wrote:
That commit causes the screen to freeze a few moments after running
clinfo on v6.6-rc7 and ROCm 5.6. Sometimes the rest of the computer,
including ssh, also freezes. On v6.5-rc1, it only results in a NULL
pointer dereference message in dmesg and the process becoming an
unkillable zombie, which prevents shutdown without REISUB. llama.cpp and
hashcat, which worked with v6.2 and ROCm 5.6, have since broken and are
not fixed by this revert, but pytorch-rocm now runs stably, without the
whole-computer freezes caused by accidentally running clinfo.

This reverts commit 1d7776cc148b9f2f3ebaf1181662ba695a29f639.

That result doesn't make much sense. Felix, please correct me, but AFAIK the ATS stuff has been completely removed by now.

Are you sure that this is pure v6.6-rc7 with no other patches applied? If so, then we must have missed something.

Regards,
Christian.


Closes: https://github.com/RadeonOpenCompute/ROCm/issues/2596
Signed-off-by: Daniel Tang <danielzgtg.opensource@xxxxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 82f25996ff5e..602f311ab766 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2243,16 +2243,16 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
  	if (r)
  		return r;
+	/* Sanity checks */
+	if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
+		r = -EINVAL;
+		goto unreserve_bo;
+	}
+
  	/* Check if PD needs to be reinitialized and do it before
  	 * changing any other state, in case it fails.
  	 */
  	if (pte_support_ats != vm->pte_support_ats) {
-		/* Sanity checks */
-		if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-			r = -EINVAL;
-			goto unreserve_bo;
-		}
-
  		vm->pte_support_ats = pte_support_ats;
  		r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
  				       false);
--
2.40.1
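
For readers following the hunk above, here is a minimal user-space sketch of what moving the sanity check changes. It is not the driver code: `struct toy_vm`, its `root_clean` field (standing in for whatever `amdgpu_vm_pt_is_root_clean()` would report), and the two `make_compute_check_*` functions are hypothetical names invented for illustration, and the reservation, `amdgpu_vm_pt_clear()` call, and `unreserve_bo` cleanup path are left out. It only models where the check sits relative to the `pte_support_ats` comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_vm; only what the sketch needs. */
struct toy_vm {
	bool pte_support_ats;
	bool root_clean; /* models the result of amdgpu_vm_pt_is_root_clean() */
};

/* Placement removed by the patch: check only when the ATS setting changes. */
static int make_compute_check_inside(struct toy_vm *vm, bool pte_support_ats)
{
	if (pte_support_ats != vm->pte_support_ats) {
		if (!vm->root_clean)
			return -EINVAL;
		vm->pte_support_ats = pte_support_ats;
	}
	return 0;
}

/* Placement added by the patch: check unconditionally, before anything else. */
static int make_compute_check_always(struct toy_vm *vm, bool pte_support_ats)
{
	if (!vm->root_clean)
		return -EINVAL;
	if (pte_support_ats != vm->pte_support_ats)
		vm->pte_support_ats = pte_support_ats;
	return 0;
}

/* A VM whose root is not clean and whose ATS setting does not change. */
static void toy_demo(void)
{
	struct toy_vm a = { .pte_support_ats = false, .root_clean = false };
	struct toy_vm b = a;

	/* Conditional placement never reaches the check. */
	assert(make_compute_check_inside(&a, false) == 0);
	/* Unconditional placement rejects the VM. */
	assert(make_compute_check_always(&b, false) == -EINVAL);
}
```

In this model the two placements differ only when the ATS setting is unchanged and the root is not clean: the conditional variant proceeds, while the unconditional variant returns `-EINVAL`. Whether that difference explains the reported freezes is exactly what the thread is trying to establish.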






