On 8/18/2021 2:15 PM, Quan, Evan wrote:
[AMD Official Use Only]
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Tuesday, August 17, 2021 6:09 PM
To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Liu,
Leo <Leo.Liu@xxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun
<Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12 UVD/VCE
on suspend
On 8/17/2021 3:13 PM, Quan, Evan wrote:
[AMD Official Use Only]
+Leo to share his insights
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Sent: Tuesday, August 17, 2021 3:28 PM
To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Chen, Guchun
<Guchun.Chen@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: properly powergate Polaris12
UVD/VCE
on suspend
On 8/17/2021 12:10 PM, Evan Quan wrote:
If the powergating of UVD/VCE is in process, wait for its completion
before proceeding(suspending). This can fix some hangs observed on
suspending when UVD/VCE still using(e.g. issue "pm-suspend" when
video is still playing).
Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed
Signed-off-by: Evan Quan <evan.quan@xxxxxxx>
Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 5 +++++
drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 4eebf973a065..2fdce572baeb 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -554,6 +554,11 @@ static int uvd_v6_0_suspend(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ /*
+ * If the powergating is in process, wait for its completion.
+ */
+ flush_delayed_work(&adev->uvd.idle_work);
+
If running idle is a prerequisite before going to suspend, then
something else is missing here.
Otherwise, the hang looks more like a pending work launched after
hardware is suspended and trying to access hardware. As the hardware
is going to be suspended anyway, doesn't it work with
cancel_delayed_work_sync - making sure that nothing is going to be
launched later to access hardware?
[Quan, Evan] The reason we chose flush_delayed_work instead of
cancel_delayed_work_sync is we think those operations performed in
idle_work(dpm disablement, powergating) seems needed considering the
action is 'suspend'. So, instead of "cancel", maybe waiting for them
completion is more proper.
But it will do so only if the work is scheduled - so it doesn't seem to be a
prerequisite for suspend. If it was a prerequisite, then the existing code is
missing that (so that it gets done for all cases).
[Quan, Evan] Just confirmed that cancel_delayed_work_sync() alone cannot work.
In fact, our current driver already get cancel_delayed_work_sync() called(e.g. in amdgpu_uvd_suspend() on the path of uvd_v6_0_suspend).
Thanks Evan for checking this further.
If cancel_delayed_work_sync() is called in amdgpu_uvd_suspend(), then it
means that originally executing idle_work was not considered as a
prerequisite for suspending.
_uvd_suspend() is called "after" uvd_v6_0_hw_fini(adev), that maybe a
little too late.
To get the issue fixed, it has to be either:
1. flush_delayed_work()
Or
2. cancel_delayed_work_sync + amdgpu_dpm_enable_uvd/vce(adev, false) (the job performed in idle work)
At minimum, it proves that disabling dpm is required for suspend.
Btw, I do not think flush_delayed_work() is an incomplete fix. Since the UVD/VCE idle work is appended on ring->funcs->end_use.
So, as long as the UVD/VCE ring used and ended(it will since at least there is ring/ib test), there will be a chance to get the work waited and completed.
I agree that it fixes the issue, only thing is someone should look
further to provide the right sequence of uvd suspend.
It doesn't give a root cause/right sequence - it only tells that forcing
to execute idle_work fixes the problem probably due to an extra delay
added by increased execution time or it disables DPM or something else.
Someone should confirm that all of idle_work or a part of it (as dpm
disable) is required for proper suspend sequence.
That said, I don't have any objections to this fix either. If there are
other things, probably fix it this way and move on.
Thanks,
Lijo
BR
Evan
Then this may be a potential issue for other suspend calls also where
work is pending to be launched when hardware is suspended.
[Quan, Evan] Do you mean we need to check whether there is similar issue
for other IPs?
Yes, if there are cases where other IPs may schedule a delayed work and call
hw_fini without cancelling the work.
Thanks,
Lijo
BR
Evan
Thanks,
Lijo
r = uvd_v6_0_hw_fini(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 6d9108fa22e0..f0adecd5ec0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -503,6 +503,11 @@ static int vce_v3_0_suspend(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ /*
+ * If the powergating is in process, wait for its completion.
+ */
+ flush_delayed_work(&adev->vce.idle_work);
+
r = vce_v3_0_hw_fini(adev);
if (r)
return r;