RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only]

 

Checked the log paste below with Curry. The way to add this fix in vcn_v1_0_stop is not workable.

As it will induce a circle calling(below) and lead to dead lock.

VCN ring begin use -> amdgpu_dpm_enable_uvd -> acquire the smu_lock -> smu10_powergate_vcn -> amdgpu_device_ip_set_powergating_state -> vcn_v1_0_stop -> amdgpu_dpm_enable_uvd -> try to acquire the smu_lock again -> dead lock

 

BR

Evan

From: Gong, Curry <Curry.Gong@xxxxxxx>
Sent: Monday, December 13, 2021 4:56 PM
To: Zhu, James <James.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Liu, Leo <Leo.Liu@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

 

[AMD Official Use Only]

 

Hi James:

 

With the following patch, an error will be reported when the driver is loaded

+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)

        else

                r = vcn_v1_0_stop_spg_mode(adev);

 

+       if (adev->pm.dpm_enabled)

+               amdgpu_dpm_enable_uvd(adev, false);

+

        return r;

}

 

 

$ dmesg

[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.

[  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu

[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000

[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]

[  363.181612] Call Trace:

[  363.181618]  __schedule+0x44c/0x8a0

[  363.181627]  schedule+0x4f/0xc0

[  363.181631]  schedule_preempt_disabled+0xe/0x10

[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0

[  363.181643]  __mutex_lock_slowpath+0x13/0x20

[  363.181648]  mutex_lock+0x32/0x40

[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]

[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]

[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]

[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]

[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]

[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]

[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]

[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]

[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]

[  363.184667]  process_one_work+0x220/0x3c0

[  363.184674]  worker_thread+0x4d/0x3f0

[  363.184677]  ? process_one_work+0x3c0/0x3c0

[  363.184680]  kthread+0x12b/0x150

[  363.184685]  ? set_kthread_struct+0x40/0x40

[  363.184690]  ret_from_fork+0x22/0x30

[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.

[  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu

[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000

[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]

[  363.185085] Call Trace:

[  363.185087]  __schedule+0x44c/0x8a0

[  363.185092]  schedule+0x4f/0xc0

[  363.185095]  schedule_timeout+0x202/0x290

[  363.185099]  ? sched_clock_cpu+0x11/0xb0

[  363.185105]  wait_for_completion+0x94/0x100

[  363.185110]  __flush_work+0x12a/0x1e0

[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0

[  363.185119]  __cancel_work_timer+0x10e/0x190

[  363.185123]  cancel_delayed_work_sync+0x13/0x20

[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]

[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]

[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]

[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]

[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]

[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]

[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]

[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]

[  363.187206]  process_one_work+0x220/0x3c0

[  363.187210]  worker_thread+0x4d/0x3f0

[  363.187214]  ? process_one_work+0x3c0/0x3c0

[  363.187217]  kthread+0x12b/0x150

[  363.187221]  ? set_kthread_struct+0x40/0x40

[  363.187226]  ret_from_fork+0x22/0x30

 

 

BR

Curry Gong

From: Zhu, James <James.Zhu@xxxxxxx>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry <Curry.Gong@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Liu, Leo <Leo.Liu@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

 

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:
 
amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
 
[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.
 
When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.
 
When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.
 
[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.
 
Signed-off-by: chen gong <curry.gong@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
 1 file changed, 7 insertions(+)
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
 {
  int r;
  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ bool cancel_success;
+
+ cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.

 
+ if (cancel_success) {
+        if (adev->pm.dpm_enabled)
+                amdgpu_dpm_enable_uvd(adev, false);
+ }
 
  r = vcn_v1_0_hw_fini(adev);
  if (r)

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux