That warning is a bit more than a little annoying.
Before we stop the delayed delete worker we *must* make absolutely sure
that nothing is still running on the hardware. Otherwise we could
easily run into use-after-free issues.
There should be an amdgpu_fence_wait_empty() somewhere before the
flush_delayed_work() call. If that isn't there, we have a problem
elsewhere.
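To illustrate the ordering, here is a minimal user-space sketch. It is a toy model and not amdgpu or TTM code: every toy_* name is hypothetical, and "waiting for fences" is reduced to marking each BO idle. It only shows that draining the hardware before running the delayed delete pass leaves no busy BO behind.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only, NOT the amdgpu/TTM structures. */
struct toy_bo {
	bool fence_signaled;	/* hardware has finished with this BO */
	bool freed;		/* BO has been destroyed */
};

struct toy_dev {
	struct toy_bo *bos;
	size_t count;
};

/* Models waiting until every fence has signaled: afterwards no BO is busy. */
static void toy_fence_wait_empty(struct toy_dev *dev)
{
	for (size_t i = 0; i < dev->count; i++)
		dev->bos[i].fence_signaled = true;
}

/* One pass of the delayed delete worker: frees idle BOs only and
 * returns how many BOs are still busy. */
static size_t toy_delayed_delete(struct toy_dev *dev)
{
	size_t busy = 0;

	for (size_t i = 0; i < dev->count; i++) {
		if (dev->bos[i].freed)
			continue;
		if (dev->bos[i].fence_signaled)
			dev->bos[i].freed = true;
		else
			busy++;
	}
	return busy;
}

/* Teardown in the order described above: drain the hardware first,
 * then flush the worker. Returns the number of BOs left over. */
static size_t toy_teardown(struct toy_dev *dev)
{
	toy_fence_wait_empty(dev);
	return toy_delayed_delete(dev);
}
```

Running toy_delayed_delete() alone on a device with busy BOs leaves them behind, while toy_teardown() leaves none.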
Thanks for investigating this,
Christian.
On 13.04.22 at 09:47, Pan, Xinhui wrote:
[AMD Official Use Only]
The log from the tester says it is the DRM framebuffer BO that is busy.
I suspect its fence simply has not had enough time to signal,
since adding a delay also works in my test.
But the warning is a little annoying.
________________________________________
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: 13 April 2022 15:30
To: Pan, Xinhui; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander
Subject: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
We don't need that.
TTM only reschedules when the BOs are still busy.
And if the BOs are still busy when you unload the driver we have much bigger problems than this TTM worker :)
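A small user-space sketch of that rescheduling behaviour, under loud assumptions: this is a toy model, not the TTM implementation, all toy_* names are made up for illustration, and fence signaling is modelled as a simple countdown. It shows that a flush leaves nothing queued when every BO is idle, and that the worker only requeues itself while some BO is still busy.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_BOS 2

/* Hypothetical stand-in for the delayed delete worker, NOT TTM code. */
struct toy_worker {
	unsigned int ticks_left[TOY_NUM_BOS];	/* time until each BO's fence signals */
	bool destroyed[TOY_NUM_BOS];
	bool scheduled;				/* worker is queued to run again */
};

/* One worker pass: destroy idle BOs, requeue only if some BO is still busy. */
static void toy_worker_run(struct toy_worker *w)
{
	bool busy = false;

	for (size_t i = 0; i < TOY_NUM_BOS; i++) {
		if (w->destroyed[i])
			continue;
		if (w->ticks_left[i] == 0)
			w->destroyed[i] = true;
		else
			busy = true;
	}
	w->scheduled = busy;
}

/* Models a flush: run the queued pass now and report whether the
 * worker requeued itself. */
static bool toy_flush(struct toy_worker *w)
{
	if (w->scheduled)
		toy_worker_run(w);
	return w->scheduled;
}

/* Let time pass so that pending fences get closer to signaling. */
static void toy_advance_time(struct toy_worker *w)
{
	for (size_t i = 0; i < TOY_NUM_BOS; i++)
		if (w->ticks_left[i] > 0)
			w->ticks_left[i]--;
}
```

In this model, a worker that is still queued after a flush means a BO was busy at flush time, which is the situation that should not occur at driver unload.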
Regards,
Christian
________________________________
From: Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Sent: Wednesday, 13 April 2022 05:08
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
ttm_device_delayed_workqueue reschedules itself if there are pending
BOs to be destroyed, so a single flush + cancel_sync is not enough. We
still see the lru_list not empty warning.
Fix it by waiting for all BOs to be destroyed.
Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+	int pending = 1;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
-	if (adev->mman.initialized) {
+	while (adev->mman.initialized && pending) {
 		flush_delayed_work(&adev->mman.bdev.wq);
-		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		if (pending) {
+			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+			msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
+		}
 	}
 	adev->shutdown = true;
--
2.25.1