Re: 回复: 回复: [PATCH] drm/amdgpu: Fix a race of IB test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yeah, because we avoid need to allocate an entity otherwise.

Ok, all that comes swapped back into my head once more.

As far as I can see that should work, but I would ask Andrey as well since he now takes care of GPU reset.

Christian.

Am 13.09.21 um 08:55 schrieb Pan, Xinhui:
[AMD Official Use Only]

These IB tests are all using direct IB submission including the delayed init work.
________________________________________
发件人: Koenig, Christian <Christian.Koenig@xxxxxxx>
发送时间: 2021年9月13日 14:19
收件人: Pan, Xinhui; Christian König; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
抄送: Deucher, Alexander
主题: Re: 回复: [PATCH] drm/amdgpu: Fix a race of IB test

Well is the delayed init work using direct submission or submission
through the scheduler?

If the later we have the down_write of the reset semaphore pulled in
through the scheduler dependency.

Anyway just having the sync before taking the lock should work.

Christian.

Am 11.09.21 um 12:18 schrieb Pan, Xinhui:
[AMD Official Use Only]

For the possible deadlock, we can just move flush_delayed_work above down_write. not a big thing.
But I am not aware why delayed init work would try to lock reset_sem.

delayed init work is enqueued when device resume. It calls amdgpu_ib_ring_tests directly. We need one sync method.
But I see device resume itself woud flush it. So there is no race between them as userspace is still freezed.

I will drop this flush in V2.
________________________________________
发件人: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
发送时间: 2021年9月11日 15:45
收件人: Pan, Xinhui; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
抄送: Deucher, Alexander; Koenig, Christian
主题: Re: [PATCH] drm/amdgpu: Fix a race of IB test



Am 11.09.21 um 03:55 schrieb xinhui pan:
Direct IB submission should be exclusive. So use write lock.

Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 9 +++++++--
    1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 19323b4cce7b..acbe02928791 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1358,10 +1358,15 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
        }

        /* Avoid accidently unparking the sched thread during GPU reset */
-     r = down_read_killable(&adev->reset_sem);
+     r = down_write_killable(&adev->reset_sem);
        if (r)
                return r;

+     /* Avoid concurrently IB test but not cancel it as I don't know whether we
+      * would add more code in the delayed init work.
+      */
+     flush_delayed_work(&adev->delayed_init_work);
+
That won't work. It's at least theoretical possible that the delayed
init work waits for the reset_sem which we are holding here.

Very unlikely to happen, but lockdep might be able to point that out
with a nice backtrace in the logs.

On the other hand delayed init work and direct IB test through this
interface should work at the same time, so I would just drop it.

Christian.

        /* hold on the scheduler */
        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                struct amdgpu_ring *ring = adev->rings[i];
@@ -1387,7 +1392,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
                kthread_unpark(ring->sched.thread);
        }

-     up_read(&adev->reset_sem);
+     up_write(&adev->reset_sem);

        pm_runtime_mark_last_busy(dev->dev);
        pm_runtime_put_autosuspend(dev->dev);




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux