[AMD Official Use Only - Internal Distribution Only] >-----Original Message----- >From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> >Sent: Friday, March 5, 2021 4:52 PM >To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues > > > >Am 05.03.21 um 09:43 schrieb Deng, Emily: >> [AMD Official Use Only - Internal Distribution Only] >> >>> -----Original Message----- >>> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> >>> Sent: Friday, March 5, 2021 3:55 PM >>> To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues >>> >>> Am 05.03.21 um 02:20 schrieb Emily Deng: >>>> When unloading driver after killing some applications, it will hit >>>> sdma flush tlb job timeout which is called by ttm_bo_delay_delete. >>>> So to avoid the job submit after fence driver fini, call >>>> ttm_bo_lock_delayed_workqueue before fence driver fini. And also put >>> drm_sched_fini before waiting fence. >>> >>> Good catch, Reviewed-by: Christian König <christian.koenig@xxxxxxx> >>> for this part. >>> >>>> Set adev->gart.ptr to null to fix null pointer when calling >>>> amdgpu_gart_unbind in amdgpu_bo_fini which is after gart_fini. >>> For that one I'm wondering if we shouldn't rather rework or double >>> check the tear down order. >>> >>> On the other hand the hardware should be idle by now (this comes >>> after the fence_driver_fini, doesn't it?) and it looks cleaner on it's own. >> Yes, you are right, without consider memory leak, with above patch, the bo >are all cleaned, then won't have issue. But if have memory leak, maybe it will >have issue in ttm_bo_force_list_clean-> ttm_mem_evict_first? > >Yeah, that is a good argument and part of what I mean with it looks cleaner on >its own. > >Maybe write that into the commit message like this. With that done the full >patch has my rb. > >Regards, >Christian. Ok, thanks. > >> >>> Regards, >>> Christian. >>> >>>> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 5 +++-- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 1 + >>>> 3 files changed, 5 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index a11760ec3924..de0597d34588 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device >>> *adev) >>>> { >>>> dev_info(adev->dev, "amdgpu: finishing device.\n"); >>>> flush_delayed_work(&adev->delayed_init_work); >>>> +ttm_bo_lock_delayed_workqueue(&adev->mman.bdev); >>>> adev->shutdown = true; >>>> >>>> kfree(adev->pci_state); >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >>>> index 143a14f4866f..6d16f58ac91e 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >>>> @@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct >>> amdgpu_device >>>> *adev) >>>> >>>> if (!ring || !ring->fence_drv.initialized) >>>> continue; >>>> +if (!ring->no_scheduler) >>>> +drm_sched_fini(&ring->sched); >>>> r = amdgpu_fence_wait_empty(ring); >>>> if (r) { >>>> /* no need to trigger GPU reset as we are unloading */ >>> @@ -539,8 >>>> +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev) >>>> if (ring->fence_drv.irq_src) >>>> amdgpu_irq_put(adev, ring->fence_drv.irq_src, >>>> ring->fence_drv.irq_type); -if (!ring->no_scheduler) >>>> -drm_sched_fini(&ring->sched); >>>> + >>>> del_timer_sync(&ring->fence_drv.fallback_timer); >>>> for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j) >>>> dma_fence_put(ring->fence_drv.fences[j]); >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c >>>> index 23823a57374f..f1ede4b43d07 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c >>>> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct >>> amdgpu_device *adev) >>>> return; >>>> } >>>> amdgpu_bo_unref(&adev->gart.bo); >>>> +adev->gart.ptr = NULL; >>>> } >>>> >>>> /* _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx