Dear xinhui,
Am 13.04.22 um 08:46 schrieb xinhui pan:
ttm_device_delayed_workqueue would reschedule itself if there is pending
BO to be destroyed. So just one flush + cancel_sync is not enough. We
still see lru_list not empty warnging.
warning
(`scripts/checkpatch.pl --codespell` should have found this.)
Fix it by waiting all BO to be destroyed.
Please explain the waiting algorithm. Why at least one millisecond?
Do you have a reproducer for this issue?
Acked-by: Guchun Chen <guchun.chen@xxxxxxx>
Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..56dcf8c3a3cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
*/
void amdgpu_device_fini_hw(struct amdgpu_device *adev)
{
+ int pending = 1;
Could this be bool?
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
flush_delayed_work(&adev->delayed_init_work);
- if (adev->mman.initialized) {
+ while (adev->mman.initialized && pending) {
flush_delayed_work(&adev->mman.bdev.wq);
- ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+ pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+ if (pending) {
+ ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
Does this call affect `adev->mman.initialized`? If not a do-while loop
might be better suited, so pending is only checked once.
if (adev->mman.initialized) {
do {
flush_delayed_work(&adev->mman.bdev.wq);
ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
} while (ttm_bo_lock_delayed_workqueue(&adev->mman.bdev));
}
The logic is slightly different, as
`ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);` and the sleep
are run at least once, so the suggestion might be unsuitable.
+ msleep(((HZ / 100) < 1) ? 1 : HZ / 100);
Maybe add a comment for that formula? (No idea, if common knowledge, but
why is a delay needed, and the loop cannot be run as fast as possible?)
+ }
}
adev->shutdown = true;
Kind regards,
Paul