[AMD Official Use Only] Hi Michel, The patch seems reasonable to me(especially the cancel_delayed_work_sync() part). However, can you explain more about the code below? What's the race issue here exactly? + /* mutex_lock could deadlock with cancel_delayed_work_sync in amdgpu_gfx_off_ctrl. */ + if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) { + /* If there's a bug which causes amdgpu_gfx_off_ctrl to be called with enable=true + * when adev->gfx.gfx_off_req_count is already 0, we might race with that. + * Re-schedule to make sure gfx off will be re-enabled in the HW eventually. + */ + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, AMDGPU_GFX_OFF_DELAY_ENABLE); + return; + } BR Evan > -----Original Message----- > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of > Michel Dänzer > Sent: Friday, August 13, 2021 6:29 PM > To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian > <Christian.Koenig@xxxxxxx> > Cc: Liu, Leo <Leo.Liu@xxxxxxx>; Zhu, James <James.Zhu@xxxxxxx>; amd- > gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx > Subject: [PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is > disabled > > From: Michel Dänzer <mdaenzer@xxxxxxxxxx> > > schedule_delayed_work does not push back the work if it was already > scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms > after the first time GFXOFF was disabled and re-enabled, even if GFXOFF > was disabled and re-enabled again during those 100 ms. > > This resulted in frame drops / stutter with the upcoming mutter 41 > release on Navi 14, due to constantly enabling GFXOFF in the HW and > disabling it again (for getting the GPU clock counter). > > To fix this, call cancel_delayed_work_sync when GFXOFF transitions from > enabled to disabled. This makes sure the delayed work will be scheduled > as intended in the reverse case. > > In order to avoid a deadlock, amdgpu_device_delay_enable_gfx_off needs > to use mutex_trylock instead of mutex_lock. > > v2: > * Use cancel_delayed_work_sync & mutex_trylock instead of > mod_delayed_work. > > Signed-off-by: Michel Dänzer <mdaenzer@xxxxxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 +++++++------ > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 3 +++ > 3 files changed, 20 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index f3fd5ec710b6..8b025f70706c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2777,7 +2777,16 @@ static void > amdgpu_device_delay_enable_gfx_off(struct work_struct *work) > struct amdgpu_device *adev = > container_of(work, struct amdgpu_device, > gfx.gfx_off_delay_work.work); > > - mutex_lock(&adev->gfx.gfx_off_mutex); > + /* mutex_lock could deadlock with cancel_delayed_work_sync in > amdgpu_gfx_off_ctrl. */ > + if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) { > + /* If there's a bug which causes amdgpu_gfx_off_ctrl to be > called with enable=true > + * when adev->gfx.gfx_off_req_count is already 0, we might > race with that. > + * Re-schedule to make sure gfx off will be re-enabled in the > HW eventually. > + */ > + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, > AMDGPU_GFX_OFF_DELAY_ENABLE); > + return; > + } > + > if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { > if (!amdgpu_dpm_set_powergating_by_smu(adev, > AMD_IP_BLOCK_TYPE_GFX, true)) > adev->gfx.gfx_off_state = true; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c > index a0be0772c8b3..da4c46db3093 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c > @@ -28,9 +28,6 @@ > #include "amdgpu_rlc.h" > #include "amdgpu_ras.h" > > -/* delay 0.1 second to enable gfx off feature */ > -#define GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100) > - > /* > * GPU GFX IP block helpers function. > */ > @@ -569,9 +566,13 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device > *adev, bool enable) > adev->gfx.gfx_off_req_count--; > > if (enable && !adev->gfx.gfx_off_state && !adev- > >gfx.gfx_off_req_count) { > - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, > GFX_OFF_DELAY_ENABLE); > - } else if (!enable && adev->gfx.gfx_off_state) { > - if (!amdgpu_dpm_set_powergating_by_smu(adev, > AMD_IP_BLOCK_TYPE_GFX, false)) { > + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, > AMDGPU_GFX_OFF_DELAY_ENABLE); > + } else if (!enable) { > + if (adev->gfx.gfx_off_req_count == 1 && !adev- > >gfx.gfx_off_state) > + cancel_delayed_work_sync(&adev- > >gfx.gfx_off_delay_work); > + > + if (adev->gfx.gfx_off_state && > + !amdgpu_dpm_set_powergating_by_smu(adev, > AMD_IP_BLOCK_TYPE_GFX, false)) { > adev->gfx.gfx_off_state = false; > > if (adev->gfx.funcs->init_spm_golden) { > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > index d43fe2ed8116..dcdb505bb7f4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > @@ -32,6 +32,9 @@ > #include "amdgpu_rlc.h" > #include "soc15.h" > > +/* delay 0.1 second to enable gfx off feature */ > +#define AMDGPU_GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100) > + > /* GFX current status */ > #define AMDGPU_GFX_NORMAL_MODE 0x00000000L > #define AMDGPU_GFX_SAFE_MODE 0x00000001L > -- > 2.32.0