Re: [PATCH v2] amd/amdgpu: Fix resv shared fence overflow

Philip already stumbled over this issue as well, but this is the wrong place to fix it.

dma_resv_reserve_shared() needs to be called in amdgpu_vm_handle_fault(), after we have reserved the page tables and before we do the update.

Reserved slots are freed (in a debug build) as soon as we release the reservation, so a slot reserved in amdgpu_vm_init() is no longer there when the fault handler adds its fence.
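
To illustrate the point, here is a minimal userspace sketch. It is only a toy model, not the kernel code: struct toy_resv and all toy_* helpers are made up for this mail, and the "drop unused slots on unlock" step only mimics what a debug build does. A false return from toy_add_shared() stands in for the BUG in dma-resv.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reservation object (hypothetical, NOT struct dma_resv). */
struct toy_resv {
	bool locked;
	unsigned int shared_count;	/* shared fences currently stored */
	unsigned int shared_max;	/* slots available, used ones included */
};

static void toy_lock(struct toy_resv *r)
{
	assert(!r->locked);
	r->locked = true;
}

/* Models the debug-build behaviour: unused slots are dropped on unlock. */
static void toy_unlock(struct toy_resv *r)
{
	assert(r->locked);
	r->shared_max = r->shared_count;
	r->locked = false;
}

/* Make room for n more shared fences; the toy version cannot fail. */
static int toy_reserve_shared(struct toy_resv *r, unsigned int n)
{
	assert(r->locked);
	if (r->shared_count + n > r->shared_max)
		r->shared_max = r->shared_count + n;
	return 0;
}

/* Returns false where the real code would hit its BUG_ON. */
static bool toy_add_shared(struct toy_resv *r)
{
	assert(r->locked);
	if (r->shared_count >= r->shared_max)
		return false;
	r->shared_count++;
	return true;
}

/* Reserve at init time, add the fence in a later locked section. */
static bool reserve_at_init_then_fault(void)
{
	struct toy_resv r = { 0 };
	bool ok;

	toy_lock(&r);
	toy_reserve_shared(&r, 2);	/* what the patch does at init */
	toy_unlock(&r);			/* the unused slots are gone here */

	toy_lock(&r);
	ok = toy_add_shared(&r);	/* no slot left */
	toy_unlock(&r);
	return ok;
}

/* Reserve and add within the same locked section. */
static bool reserve_in_fault_path(void)
{
	struct toy_resv r = { 0 };
	bool ok;

	toy_lock(&r);
	toy_reserve_shared(&r, 1);
	ok = toy_add_shared(&r);
	toy_unlock(&r);
	return ok;
}
```

In this model reserve_at_init_then_fault() fails regardless of how many slots are reserved at init, while reserve_in_fault_path() succeeds, which is why bumping the count in amdgpu_vm_init() does not help.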

Christian.

On 29.09.20 at 07:57, xinhui pan wrote:
[  179.556745] kernel BUG at drivers/dma-buf/dma-resv.c:282!
[snip]
[  179.702910] Call Trace:
[  179.705696]  amdgpu_bo_fence+0x21/0x50 [amdgpu]
[  179.710707]  amdgpu_vm_sdma_commit+0x299/0x430 [amdgpu]
[  179.716497]  amdgpu_vm_bo_update_mapping.constprop.0+0x29f/0x390 [amdgpu]
[  179.723927]  ? find_held_lock+0x38/0x90
[  179.728183]  amdgpu_vm_handle_fault+0x1af/0x420 [amdgpu]
[  179.734063]  gmc_v9_0_process_interrupt+0x245/0x2e0 [amdgpu]
[  179.740347]  ? kgd2kfd_interrupt+0xb8/0x1e0 [amdgpu]
[  179.745808]  amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
[  179.751380]  ? amdgpu_irq_dispatch+0x10a/0x3c0 [amdgpu]
[  179.757159]  amdgpu_ih_process+0xbb/0x1a0 [amdgpu]
[  179.762466]  amdgpu_irq_handle_ih1+0x27/0x40 [amdgpu]
[  179.767997]  process_one_work+0x23c/0x580
[  179.772371]  worker_thread+0x50/0x3b0
[  179.776356]  ? process_one_work+0x580/0x580
[  179.780939]  kthread+0x128/0x160
[  179.784462]  ? kthread_park+0x90/0x90
[  179.788466]  ret_from_fork+0x1f/0x30

We have two scheduler entities, immediate and delayed, so there are
two kinds of scheduler finished fences.
We might add both fences to the root BO's resv at the same time while
only one slot is reserved.

Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 37221b99ca96..9e0116c7f8d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2869,7 +2869,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  	if (r)
  		goto error_free_root;
-	r = dma_resv_reserve_shared(root->tbo.base.resv, 1);
+	r = dma_resv_reserve_shared(root->tbo.base.resv, 2);
  	if (r)
  		goto error_unreserve;

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx