RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only]

 

 

 

From: Zhu, James <James.Zhu@xxxxxxx>
Sent: Monday, December 13, 2021 9:39 PM
To: Gong, Curry <Curry.Gong@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Liu, Leo <Leo.Liu@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

 

Hi Curry, Evan,

It seems vcn1.0 power gate sequence are special.

if the original solution is thread safe, then the following solution will be thread safe also.

static int vcn_v1_0_hw_fini(void *handle)
{
    struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
    cancel_delayed_work_sync(&adev->vcn.idle_work);
 
    if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
+        if (adev->pm.dpm_enabled)
+            amdgpu_dpm_enable_uvd(adev, false);
+        else
+            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);

[Quan, Evan] Considering adev->pm.dpm_enabled is always true, “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will become dead code.

Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the suspend/resume discussed here). So, it seems quite risky compared with Curry’s original patch.

Before we can come up with a better idea, I would suggest to land Curry’s original patch as a quick fix.

 

BR

Evan


    }

Best Regards!

James

On 2021-12-13 3:55 a.m., Gong, Curry wrote:

[AMD Official Use Only]

 

Hi James:

 

With the following patch, an error will be reported when the driver is loaded

+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)

        else

                r = vcn_v1_0_stop_spg_mode(adev);

 

c

        return r;

}

 

 

$ dmesg

[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.

[  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu

[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000

[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]

[  363.181612] Call Trace:

[  363.181618]  __schedule+0x44c/0x8a0

[  363.181627]  schedule+0x4f/0xc0

[  363.181631]  schedule_preempt_disabled+0xe/0x10

[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0

[  363.181643]  __mutex_lock_slowpath+0x13/0x20

[  363.181648]  mutex_lock+0x32/0x40

[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]

[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]

[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]

[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]

[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]

[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]

[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]

[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]

[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]

[  363.184667]  process_one_work+0x220/0x3c0

[  363.184674]  worker_thread+0x4d/0x3f0

[  363.184677]  ? process_one_work+0x3c0/0x3c0

[  363.184680]  kthread+0x12b/0x150

[  363.184685]  ? set_kthread_struct+0x40/0x40

[  363.184690]  ret_from_fork+0x22/0x30

[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.

[  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu

[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000

[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]

[  363.185085] Call Trace:

[  363.185087]  __schedule+0x44c/0x8a0

[  363.185092]  schedule+0x4f/0xc0

[  363.185095]  schedule_timeout+0x202/0x290

[  363.185099]  ? sched_clock_cpu+0x11/0xb0

[  363.185105]  wait_for_completion+0x94/0x100

[  363.185110]  __flush_work+0x12a/0x1e0

[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0

[  363.185119]  __cancel_work_timer+0x10e/0x190

[  363.185123]  cancel_delayed_work_sync+0x13/0x20

[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]

[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]

[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]

[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]

[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]

[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]

[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]

[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]

[  363.187206]  process_one_work+0x220/0x3c0

[  363.187210]  worker_thread+0x4d/0x3f0

[  363.187214]  ? process_one_work+0x3c0/0x3c0

[  363.187217]  kthread+0x12b/0x150

[  363.187221]  ? set_kthread_struct+0x40/0x40

[  363.187226]  ret_from_fork+0x22/0x30

 

 

BR

Curry Gong

From: Zhu, James <James.Zhu@xxxxxxx>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry <Curry.Gong@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Liu, Leo <Leo.Liu@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

 

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:
 
amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
 
[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.
 
When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.
 
When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.
 
[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.
 
Signed-off-by: chen gong <curry.gong@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
 1 file changed, 7 insertions(+)
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
 {
  int r;
  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ bool cancel_success;
+
+ cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.

 
+ if (cancel_success) {
+        if (adev->pm.dpm_enabled)
+                amdgpu_dpm_enable_uvd(adev, false);
+ }
 
  r = vcn_v1_0_hw_fini(adev);
  if (r)

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux