We don't need that.
TTM only reschedules when the BOs are still busy.
And if the BOs are still busy when you unload the driver, we have much bigger problems than this TTM worker :)
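
For context, the delayed work in ttm_device.c looks roughly like this (a paraphrased sketch from memory; exact names and details may differ between kernel versions):

static void ttm_device_delayed_workqueue(struct work_struct *work)
{
	struct ttm_device *bdev =
		container_of(work, struct ttm_device, wq.work);

	/* Try to release the pending BOs; if some are still busy ... */
	if (!ttm_bo_delayed_delete(bdev, false))
		/* ... requeue the work and retry roughly 10ms later. */
		schedule_delayed_work(&bdev->wq,
				      ((HZ / 100) < 1) ? 1 : HZ / 100);
}

So as soon as the BOs are idle the work stops rescheduling itself on its own.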
Regards,
Christian
From: Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Sent: Wednesday, April 13, 2022 05:08
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
ttm_device_delayed_workqueue reschedules itself if there are still pending
BOs to be destroyed, so a single flush + cancel_sync is not enough. We
still see the lru_list not empty warning.
Fix it by waiting for all BOs to be destroyed.
Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+	int pending = 1;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
-	if (adev->mman.initialized) {
+	while (adev->mman.initialized && pending) {
 		flush_delayed_work(&adev->mman.bdev.wq);
-		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		if (pending) {
+			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+			msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
+		}
 	}
 
 	adev->shutdown = true;
--
2.25.1