RE: [PATCH v2] drm/amd/display: explicitly update clocks when DC is set to D3 in s0i3

"Limonciello, Mario" <Mario.Limonciello@xxxxxxx> · Wed, 5 Jan 2022 21:39:54 +0000

[Public]

> -----Original Message-----
> From: Wentland, Harry <Harry.Wentland@xxxxxxx>
> Sent: Wednesday, January 5, 2022 15:26
> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; amd-
> gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhuo, Qingqing (Lillian) <Qingqing.Zhuo@xxxxxxx>; Scott Bruce
> <smbruce@xxxxxxxxx>; Chris Hixon <linux-kernel-bugs@xxxxxxxxxxxxx>;
> spasswolf@xxxxxx
> Subject: Re: [PATCH v2] drm/amd/display: explicitly update clocks when DC is set
> to D3 in s0i3
> 
> On 2022-01-05 12:06, Mario Limonciello wrote:
> > The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for
> > hard hang on HPD") causes a regression in s0ix where the system will
> > fail to resume properly.  This may be because an HPD was active the last
> > time clocks were updated but clocks didn't get updated again during s0ix.
> >
> > So add an extra call to update clocks as part of the suspend routine:
> > dm_suspend->dc_set_power_state->clk_mgr_optimize_pwr_state
> >
> > In case HPD is set during this time, also check if the call happened during
> > suspend to allow overriding the WA.
> >
> > Cc: Qingqing Zhuo <qingqing.zhuo@xxxxxxx>
> > Reported-by: Scott Bruce <smbruce@xxxxxxxxx>
> > Reported-by: Chris Hixon <linux-kernel-bugs@xxxxxxxxxxxxx>
> > Reported-by: spasswolf@xxxxxx
> > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215436
> > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1821
> > BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1852
> > Fixes: 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD")
> > Fixes: 1bd3bc745e7f ("drm/amd/display: Extend w/a for hard hang on HPD to
> dcn20")
> > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > ---
> > changes from v1->v2:
> >  * Add fallthrough statement
> >  * Extend case to check if call was explicitly in s0ix since #1852 showed
> hpd_state
> >    can be set at this time too
> >  * Adjust commit message and title
> >  * Add extra commit and bug fixed to metadata
> >  drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 5 ++++-
> >  drivers/gpu/drm/amd/display/dc/core/dc.c                  | 3 +++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
> b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
> > index fbda42313bfe..fa5efe10b779 100644
> > --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
> > +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
> > @@ -23,6 +23,8 @@
> >   *
> >   */
> >
> > +#include "amdgpu.h"
> > +
> >  #include "dccg.h"
> >  #include "clk_mgr_internal.h"
> >
> > @@ -126,6 +128,7 @@ static void rn_update_clocks(struct clk_mgr
> *clk_mgr_base,
> >  			bool safe_to_lower)
> >  {
> >  	struct clk_mgr_internal *clk_mgr =
> TO_CLK_MGR_INTERNAL(clk_mgr_base);
> > +	struct amdgpu_device *adev = clk_mgr_base->ctx->driver_context;
> 
> DC code is shared on other OSes. driver_context won't be an instance
> of amdgpu_device in those cases.
> 
> If we need adev->in_s0ix we need to find a way to plumb it through
> DC structs and interfaces.
> 

OK.  I'll see what that takes to do.  I was hoping the original version would
have sufficed but Chris (on CC) had reported that hpd_state was still active
When this was called again during the suspend context.

<snip>
Jan 05 15:39:08.562580 kernel: kauditd_printk_skb: 35 callbacks suppressed
Jan 05 15:43:54.575923 kernel: wlo1: deauthenticating from ZZ:ZZ:ZZ:ZZ:ZZ:ZZ by local choice (Reason: 3=DEAUTH_LEAVING)
Jan 05 15:43:54.699223 kernel: amdgpu 0000:04:00.0: amdgpu: [rn_update_clocks] display count: 0, in_s0ix: 0, hpd_state: 0
Jan 05 15:43:55.922674 kernel: amdgpu 0000:04:00.0: amdgpu: [rn_update_clocks] display count: 1, in_s0ix: 0, hpd_state: 1
Jan 05 15:43:55.979221 kernel: rfkill: input handler enabled
Jan 05 15:43:57.712689 kernel: PM: suspend entry (s2idle)
Jan 05 15:44:07.302069 kernel: Filesystems sync: 0.008 seconds
Jan 05 15:44:07.302256 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Jan 05 15:44:07.302306 kernel: OOM killer disabled.
Jan 05 15:44:07.302344 kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Jan 05 15:44:07.302378 kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jan 05 15:44:07.302416 kernel: amdgpu 0000:04:00.0: amdgpu: [rn_update_clocks] display count: 0, in_s0ix: 1, hpd_state: 1
</snip>

So I suppose the other question to ask though - why is that hpd interrupt still active?  Shouldn't it have been suspended by then?
@Zhuo, Qingqing (Lillian) Maybe some other code is needed to suspend it on the way down?

It was for spasswolf@xxxxxx (which is why I had the simpler version of the patch for v1).

> Harry
> 
> >  	struct dc_clocks *new_clocks = &context->bw_ctx.bw.dcn.clk;
> >  	struct dc *dc = clk_mgr_base->ctx->dc;
> >  	int display_count;
> > @@ -157,7 +160,7 @@ static void rn_update_clocks(struct clk_mgr
> *clk_mgr_base,
> >  			}
> >
> >  			/* if we can go lower, go lower */
> > -			if (display_count == 0 && !hpd_state) {
> > +			if (display_count == 0 && (adev->in_s0ix || !hpd_state))
> {
> >
> 	rn_vbios_smu_set_dcn_low_power_state(clk_mgr,
> DCN_PWR_STATE_LOW_POWER);
> >  				/* update power state */
> >  				clk_mgr_base->clks.pwr_state =
> DCN_PWR_STATE_LOW_POWER;
> > diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > index 91c4874473d6..8e0c820f6892 100644
> > --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> > +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> > @@ -3299,6 +3299,9 @@ void dc_set_power_state(
> >  		}
> >
> >  		break;
> > +	case DC_ACPI_CM_POWER_STATE_D3:
> > +		clk_mgr_optimize_pwr_state(dc, dc->clk_mgr);
> > +		fallthrough;
> >  	default:
> >  		ASSERT(dc->current_state->stream_count == 0);
> >  		/* Zero out the current context so that on resume we start with