This is a debugfs interface and adding another writer contention in
debugfs over an actual reset is lazy fix. This shouldn't be executed in
the first place and should not take precedence over any reset.
Thanks,
Lijo
On 9/13/2021 11:52 AM, Christian König wrote:
NAK, this is not the lazy way to fix it at all.
The reset semaphore protects the scheduler and ring objects from
concurrent modification, so taking the write side of it is perfectly
valid here.
Christian.
Am 13.09.21 um 06:42 schrieb Pan, Xinhui:
[AMD Official Use Only]
yep, that is a lazy way to fix it.
I am thinking of adding one amdgpu_ring.direct_access_mutex before we
issue test_ib on each ring.
________________________________________
发件人: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
发送时间: 2021年9月13日 12:00
收件人: Pan, Xinhui; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
抄送: Deucher, Alexander; Koenig, Christian
主题: Re: [PATCH v2] drm/amdgpu: Fix a race of IB test
On 9/13/2021 5:18 AM, xinhui pan wrote:
Direct IB submission should be exclusive. So use write lock.
Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..be5d12ed3db1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,7 +1358,7 @@ static int amdgpu_debugfs_test_ib_show(struct
seq_file *m, void *unused)
}
/* Avoid accidently unparking the sched thread during GPU
reset */
- r = down_read_killable(&adev->reset_sem);
+ r = down_write_killable(&adev->reset_sem);
There are many ioctls and debugfs calls which takes this lock and as you
know the purpose is to avoid them while there is a reset. The purpose is
*not to* fix any concurrency issues those calls themselves have
otherwise and fixing those concurrency issues this way is just lazy and
not acceptable.
This will take away any fairness given to the writer in this rw lock and
that is supposed to be the reset thread.
Thanks,
Lijo
if (r)
return r;
@@ -1387,7 +1387,7 @@ static int amdgpu_debugfs_test_ib_show(struct
seq_file *m, void *unused)
kthread_unpark(ring->sched.thread);
}
- up_read(&adev->reset_sem);
+ up_write(&adev->reset_sem);
pm_runtime_mark_last_busy(dev->dev);
pm_runtime_put_autosuspend(dev->dev);