On 04.02.22 at 19:12, Bhardwaj, Rajneesh wrote:
[Sorry for top posting]
Hi Christian,
I think you forgot the hunk below, without which the issue is not
completely fixed on a multi-GPU system.
No, that is perfectly intentional. When a bo_va structure is removed,
there can still be mappings attached to it (for example because the
application crashed).
Because of this, locking the VM before the removal is mandatory. We can
only avoid the lock when adding a bo_va structure.
Regards,
Christian.
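To make the rule above concrete, here is a minimal userspace sketch of it. This is not the amdgpu code: the structs, the `held` flag and the helper names (`can_add`, `can_remove`, `locking_rule_demo`) are all hypothetical stand-ins, and the real driver uses dma_resv_assert_held() rather than boolean checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; these are not the real amdgpu/TTM types. */
struct resv { bool held; };              /* stands in for a dma_resv lock */
struct vm { struct resv *root_resv; };   /* models vm->root.bo->tbo.base.resv */
struct bo { struct resv *resv; };        /* models bo->tbo.base.resv */
struct bo_va { struct vm *vm; struct bo *bo; unsigned int mappings; };

/* Adding: the VM lock is only required for per-VM BOs, i.e. BOs that
 * share the VM root's reservation object. A newly added bo_va has no
 * mappings yet, so other BOs can be added without the VM lock. */
static bool can_add(const struct vm *vm, const struct bo *bo)
{
    if (bo->resv != vm->root_resv)
        return true;
    return vm->root_resv->held;
}

/* Removing: mappings may still be attached (e.g. after an application
 * crash), so the VM lock is always required. */
static bool can_remove(const struct bo_va *bo_va)
{
    return bo_va->vm->root_resv->held;
}

/* Walks through the three cases with the VM unlocked, then locked. */
void locking_rule_demo(void)
{
    struct resv vm_resv = { false }, own_resv = { false };
    struct vm vm = { &vm_resv };
    struct bo per_vm_bo = { &vm_resv }, other_bo = { &own_resv };
    struct bo_va va = { &vm, &other_bo, 1 };

    assert(can_add(&vm, &other_bo));    /* no VM lock needed */
    assert(!can_add(&vm, &per_vm_bo));  /* per-VM BO: lock required */
    assert(!can_remove(&va));           /* removal: lock required */

    vm_resv.held = true;
    assert(can_add(&vm, &per_vm_bo));
    assert(can_remove(&va));
}
```

Calling locking_rule_demo() exercises each case. The asymmetry is the point: the check in the add path depends on whether the BO shares the VM's reservation object, while the remove path checks unconditionally.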
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dcc80d6e099e..6f68fc9da56a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2670,8 +2670,6 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
struct amdgpu_vm *vm = bo_va->base.vm;
struct amdgpu_vm_bo_base **base;
- dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
if (bo) {
dma_resv_assert_held(bo->tbo.base.resv);
if (bo->tbo.base.resv == vm->root.bo->tbo.base.resv)
If you choose to include the above hunk, please feel free to add
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@xxxxxxx>
On 2/4/2022 11:27 AM, Felix Kuehling wrote:
On 2022-02-04 at 03:52, Christian König wrote:
Since newly added BOs don't have any mappings, it's OK to add them
without holding the VM lock. The lock is mandatory only when we add
per-VM BOs.
Signed-off-by: Christian König <christian.koenig@xxxxxxx>
Reported-by: Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@xxxxxxx>
Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdc6a1fd74af..dcc80d6e099e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -375,6 +375,8 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
return;
+ dma_resv_assert_held(vm->root.bo->tbo.base.resv);
+
vm->bulk_moveable = false;
if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)
amdgpu_vm_bo_relocated(base);
@@ -2260,8 +2262,6 @@ struct amdgpu_bo_va *amdgpu_vm_bo_add(struct amdgpu_device *adev,
{
struct amdgpu_bo_va *bo_va;
- dma_resv_assert_held(vm->root.bo->tbo.base.resv);
-
bo_va = kzalloc(sizeof(struct amdgpu_bo_va), GFP_KERNEL);
if (bo_va == NULL) {
return NULL;