Re: [PATCH v3] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

On 24.07.24 at 05:00, ZhenGuo Yin wrote:
[Why]
The page tables of a compute VM in VRAM are lost after a GPU reset.
Their VRAM contents won't be restored since a compute VM has no shadows.

[How]
Use the upper 32 bits of vm->generation to record the vram_lost_counter.
Reset the VM state machine when vm->generation is not equal to
the new generation token.

v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.
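
The scheme described under [How] can be illustrated with a small userspace sketch. It is not the driver code: struct sketch_dev, struct sketch_vm, the entity_error flag and both functions are hypothetical stand-ins, and clearing the error flag on reset is an assumption standing in for the entity re-initialisation. The token carries the VRAM-lost count in its upper 32 bits and the per-VM counter in its lower 32 bits, so either a VRAM loss or a scheduler entity error makes the stored and freshly computed tokens differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct sketch_dev {
	uint32_t vram_lost_counter;	/* bumped on each reset that loses VRAM */
};

struct sketch_vm {
	uint64_t generation;		/* last token the VM was validated against */
	bool entity_error;		/* stands in for drm_sched_entity_error() */
};

/* Build a token: upper 32 bits = VRAM-lost count, lower 32 bits = VM counter. */
static uint64_t sketch_vm_generation(const struct sketch_dev *dev,
				     const struct sketch_vm *vm)
{
	uint64_t result = (uint64_t)dev->vram_lost_counter << 32;

	if (!vm)
		return result;

	/* Only the lower half of the stored token is the per-VM counter. */
	result += vm->generation & 0xFFFFFFFFULL;
	if (vm->entity_error)
		++result;
	return result;
}

/* Returns true when the caller would have to reset the VM state machine. */
static bool sketch_vm_validate(const struct sketch_dev *dev, struct sketch_vm *vm)
{
	uint64_t new_generation = sketch_vm_generation(dev, vm);

	if (vm->generation == new_generation)
		return false;

	vm->generation = new_generation;
	/* Assumption: re-creating the entities clears the error condition. */
	vm->entity_error = false;
	return true;
}
```

In this sketch a VM initialised with sketch_vm_generation(dev, NULL) validates cleanly until vram_lost_counter changes or entity_error is set; after one reset the tokens agree again.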

Signed-off-by: ZhenGuo Yin <zhenguo.yin@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3abfa66d72a2..6c6170f0f318 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -434,7 +434,7 @@ uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm *vm)
  	if (!vm)
  		return result;
-	result += vm->generation;
+	result += (vm->generation & 0xFFFFFFFFULL);

Please use the lower_32_bits() macro here.

With that fixed the patch is Reviewed-by: Christian König <christian.koenig@xxxxxxx>

Thanks, and sorry that I didn't initially get what the actual problem here is,
Christian.
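
For readers following the review comment above, here is a minimal userspace sketch of what the requested change amounts to. The lower_32_bits() definition below is a stand-in written for this example and modelled on the kernel helper; sketch_token() is a hypothetical function, not part of the patch. Without the truncation, the upper half of the stored token would be added on top of the shifted VRAM-lost count.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's lower_32_bits() helper. Defining it
 * here is an assumption of this sketch; kernel code gets it from the
 * kernel headers.
 */
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))

/* Hypothetical helper: combine the VRAM-lost count with the VM's counter. */
static uint64_t sketch_token(uint32_t vram_lost_counter, uint64_t vm_generation)
{
	uint64_t result = (uint64_t)vram_lost_counter << 32;

	/* Same effect as "vm_generation & 0xFFFFFFFFULL", but self-describing. */
	result += lower_32_bits(vm_generation);
	return result;
}
```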

  	/* Add one if the page tables will be re-generated on next CS */
  	if (drm_sched_entity_error(&vm->delayed))
  		++result;
@@ -463,13 +463,14 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  		       int (*validate)(void *p, struct amdgpu_bo *bo),
  		       void *param)
  {
+	uint64_t new_vm_generation = amdgpu_vm_generation(adev, vm);
  	struct amdgpu_vm_bo_base *bo_base;
  	struct amdgpu_bo *shadow;
  	struct amdgpu_bo *bo;
  	int r;
-	if (drm_sched_entity_error(&vm->delayed)) {
-		++vm->generation;
+	if (vm->generation != new_vm_generation) {
+		vm->generation = new_vm_generation;
  		amdgpu_vm_bo_reset_state_machine(vm);
  		amdgpu_vm_fini_entities(vm);
  		r = amdgpu_vm_init_entities(adev, vm);
@@ -2439,7 +2440,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  	vm->last_update = dma_fence_get_stub();
  	vm->last_unlocked = dma_fence_get_stub();
  	vm->last_tlb_flush = dma_fence_get_stub();
-	vm->generation = 0;
+	vm->generation = amdgpu_vm_generation(adev, NULL);
  	mutex_init(&vm->eviction_lock);
  	vm->evicting = false;



