Although the radeon driver fences and waits for the GPU to finish processing the current batch of rings, there is still a corner case where the radeon lockup work queue may not be fully flushed while radeon_suspend_kms() has already called pci_set_power_state() to put the device into the D3hot state.
Per PCI spec rev 4.0, section 5.3.1.4.1 "D3hot State":

> Configuration and Message requests are the only TLPs accepted by a Function in
> the D3hot state. All other received Requests must be handled as Unsupported Requests,
> and all received Completions may optionally be handled as Unexpected Completions.
Well, first of all, this is completely the wrong place for this. The flush belongs in the fence code, not here.
And I don't think this is a good idea anyway, since it might cause deadlocks.
Christian.
This issue shows up in the following oops log:
Unable to handle kernel paging request at virtual address 00008800e0008010
CPU 0 kworker/0:3(131): Oops 0
pc = [<ffffffff811bea5c>]  ra = [<ffffffff81240844>]  ps = 0000  Tainted: G W
pc is at si_gpu_check_soft_reset+0x3c/0x240
ra is at si_dma_is_lockup+0x34/0xd0
v0 = 0000000000000000  t0 = fff08800e0008010  t1 = 0000000000010000
t2 = 0000000000008010  t3 = fff00007e3c00000  t4 = fff00007e3c00258
t5 = 000000000000ffff  t6 = 0000000000000001  t7 = fff00007ef078000
s0 = fff00007e3c016e8  s1 = fff00007e3c00000  s2 = fff00007e3c00018
s3 = fff00007e3c00000  s4 = fff00007fff59d80  s5 = 0000000000000000
s6 = fff00007ef07bd98
a0 = fff00007e3c00000  a1 = fff00007e3c016e8  a2 = 0000000000000008
a3 = 0000000000000001  a4 = 8f5c28f5c28f5c29  a5 = ffffffff810f4338
t8 = 0000000000000275  t9 = ffffffff809b66f8  t10 = ff6769c5d964b800
t11= 000000000000b886  pv = ffffffff811bea20  at = 0000000000000000
gp = ffffffff81d89690  sp = 00000000aa814126
Disabling lock debugging due to kernel taint
Trace:
[<ffffffff81240844>] si_dma_is_lockup+0x34/0xd0
[<ffffffff81119610>] radeon_fence_check_lockup+0xd0/0x290
[<ffffffff80977010>] process_one_work+0x280/0x550
[<ffffffff80977350>] worker_thread+0x70/0x7c0
[<ffffffff80977410>] worker_thread+0x130/0x7c0
[<ffffffff80982040>] kthread+0x200/0x210
[<ffffffff809772e0>] worker_thread+0x0/0x7c0
[<ffffffff80981f8c>] kthread+0x14c/0x210
[<ffffffff80911658>] ret_from_kernel_thread+0x18/0x20
[<ffffffff80981e40>] kthread+0x0/0x210
Code: ad3e0008 43f0074a ad7e0018 ad9e0020 8c3001e8 40230101<88210000> 4821ed21
So force a flush of the lockup work queue before the device is powered down to fix this problem.
Reviewed-by: Su Weiqiang <suweiqiang@xxxxxxxxx>
Reviewed-by: Zhou Xuemei <zhouxuemei@xxxxxxxxx>
Signed-off-by: Xu Chenjiao <xuchenjiao@xxxxxxxxx>
---
 drivers/gpu/drm/radeon/radeon_device.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 59c8a6647ff2..cc1c07963116 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1625,6 +1625,9 @@ int radeon_suspend_kms(struct drm_device *dev, bool suspend,
 		if (r) {
 			/* delay GPU reset to resume */
 			radeon_fence_driver_force_completion(rdev, i);
+		} else {
+			/* finish executing delayed work */
+			flush_delayed_work(&rdev->fence_drv[i].lockup_work);
 		}
 	}
-- 
2.17.1